Removing titles from names using R



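I want to remove the titles (MR, MS, DR and so on) from the names in the first column, so that the output looks like the Clean_name column. Any suggestions?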

> df
           NAMEFIRST     Clean_name
1         BHASOTI MS        BHASOTI
2          BHABESHMR        BHABESH
3             RINAMS           RINA
4        SUSHMITAMRS       SUSHMITA
5         ARKADIY MR        ARKADIY
6  PRAMOD TRIMBAK DR PRAMOD TRIMBAK
7          ANDREW MR         ANDREW
8      MICHELLE MISS       MICHELLE
9         DINESHA MR        DINESHA
10        SREEDHARMR       SREEDHAR
11        PANKAJMSTR         PANKAJ
12   SUSHIL KUMAR MR   SUSHIL KUMAR
13          FAZLURMR         FAZLUR
df <- data.frame(name = c("RAMOREYDR","SAMUEL MR","MR KOOL","HANDSOMEDR","GELLER DR","SONIA MS"))
df
#         name
# 1  RAMOREYDR
# 2  SAMUEL MR
# 3    MR KOOL
# 4 HANDSOMEDR
# 5  GELLER DR
# 6   SONIA MS
# Remove MR, MS and DR, together with an adjacent space if there is one
df$Clean_Name <- gsub(" MR|MR|MR | MS|MS|MS | DR|DR|DR ", "", df$name)
df
#         name Clean_Name
# 1  RAMOREYDR    RAMOREY
# 2  SAMUEL MR     SAMUEL
# 3    MR KOOL       KOOL
# 4 HANDSOMEDR   HANDSOME
# 5  GELLER DR     GELLER
# 6   SONIA MS      SONIA
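
One caveat worth checking before using this on the data from the question: none of the alternatives is anchored, so `gsub` deletes the letters wherever they occur, including inside a name. The sketch below illustrates this with `ANDREW MR` and compares it with an anchored variant. The anchored pattern and the variable names are my own assumptions, not part of the answer above.

```r
x <- c("ANDREW MR", "MR KOOL", "HANDSOMEDR")

# Unanchored pattern from the answer: the "DR" inside "ANDREW" is deleted too,
# so the first element comes back as "ANEW".
unanchored <- gsub(" MR|MR|MR | MS|MS|MS | DR|DR|DR ", "", x)
unanchored

# Anchored variant (an assumption): a title only counts at the very start
# (followed by whitespace) or at the very end (optionally preceded by whitespace).
pattern <- "^(MR|MS|DR)\\s+|\\s*(MR|MS|DR)$"
anchored <- gsub(pattern, "", x)
anchored
# [1] "ANDREW"   "KOOL"     "HANDSOME"
```

The anchored form still cannot tell an attached title from a name that happens to end in the same letters, so the results are worth a manual check.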

You did not provide any usable data. It can be solved like this:

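column <- c("MICHELLE MISS","PRAMOD TRIMBAK DR")
# Strip a trailing title and any whitespace before it.
# The backslash must be doubled inside an R string literal.
sub("\\s*(MRS|MSTR|MISS|MR|MS|DR)$", "", column)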

Output:

[1] "MICHELLE"       "PRAMOD TRIMBAK"

This regular expression does the job:

df
                name     Clean_name
1         BHASOTI MS        BHASOTI
2          BHABESHMR        BHABESH
3             RINAMS           RINA
4        SUSHMITAMRS       SUSHMITA
5         ARKADIY MR        ARKADIY
6  PRAMOD TRIMBAK DR PRAMOD TRIMBAK
7          ANDREW MR         ANDREW
8      MICHELLE MISS       MICHELLE
9         DINESHA MR        DINESHA
10        SREEDHARMR       SREEDHAR
11        PANKAJMSTR         PANKAJ
12   SUSHIL KUMAR MR   SUSHIL KUMAR
13          FAZLURMR         FAZLUR
# Group the alternatives so that the optional leading spaces and the
# end-of-string anchor ($) apply to every title, not just the first and last
df$name_cleaned <- sub(" *(MRS|MSTR|MISS|MR|MS|DR)$", "", df$name)
df
                name     Clean_name   name_cleaned
1         BHASOTI MS        BHASOTI        BHASOTI
2          BHABESHMR        BHABESH        BHABESH
3             RINAMS           RINA           RINA
4        SUSHMITAMRS       SUSHMITA       SUSHMITA
5         ARKADIY MR        ARKADIY        ARKADIY
6  PRAMOD TRIMBAK DR PRAMOD TRIMBAK PRAMOD TRIMBAK
7          ANDREW MR         ANDREW         ANDREW
8      MICHELLE MISS       MICHELLE       MICHELLE
9         DINESHA MR        DINESHA        DINESHA
10        SREEDHARMR       SREEDHAR       SREEDHAR
11        PANKAJMSTR         PANKAJ         PANKAJ
12   SUSHIL KUMAR MR   SUSHIL KUMAR   SUSHIL KUMAR
13          FAZLURMR         FAZLUR         FAZLUR

You can add more titles to remove by separating them with | inside the pattern.
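
If the list of titles keeps growing, it may be easier to keep them in a vector and build the alternation with `paste`. This is only a sketch: the `titles` vector, the variable names, and the extra `PROF` entry are hypothetical and not taken from the question.

```r
# Titles to strip; extend this vector as new ones turn up in the data.
# "PROF" is a made-up addition for illustration.
titles <- c("MRS", "MSTR", "MISS", "MR", "MS", "DR", "PROF")

# Collapse the vector into one alternation, anchored to the end of the string
# and allowing optional whitespace before the title.
pattern <- paste0("\\s*(", paste(titles, collapse = "|"), ")$")
pattern
# [1] "\\s*(MRS|MSTR|MISS|MR|MS|DR|PROF)$"

cleaned <- sub(pattern, "", c("SUSHMITAMRS", "PANKAJMSTR", "ALAN PROF"))
cleaned
# [1] "SUSHMITA" "PANKAJ"   "ALAN"
```

Because titles may be attached directly to the name, any name that genuinely ends in one of these letter sequences (for example `ADAMS`) would be shortened as well, so the result should be reviewed.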
