R stringr: using str_detect to match one word while excluding another



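I have a sample project in which I need to search strings with the stringr package. To rule out differences in capitalization, I start with str_to_lower(example$remarks), which converts the remarks to lowercase. The remarks column describes residential properties.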

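I need to search for the word "shop". However, the word "shopping" also appears in the remarks column, and I do not want to match that word.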

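Some observations: a) contain only the word "shop"; b) contain only the word "shopping"; c) contain neither "shop" nor "shopping"; d) contain both "shop" and "shopping".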

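When I use str_detect(), I want it to return TRUE when it detects the word "shop", but I do not want it to return TRUE for the string "shop" inside the word "shopping". Currently, if I run str_detect(example$remarks, "shop"), I get TRUE for both "shop" and "shopping". Essentially, I want TRUE only for the 4-character string "shop". If the characters "shop" are followed by any other characters, as in shop(ping), I want the code to skip that match and not report TRUE.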

In addition, if the remarks contain both the words "shop" and "shopping", I want the result to be TRUE because of "shop", not because of "shopping".

Ultimately, I am hoping for one line of code using str_detect() that gives me the following results:

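  1. If the remarks observation contains only the word "shop": TRUE
  2. If the remarks observation contains only the word "shopping": FALSE
  3. If the remarks observation contains neither "shop" nor "shopping": FALSE
  4. If the remarks observation contains both "shop" and "shopping": TRUE, because the 4-character string "shop" is detected on its own, not because of the word "shopping"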

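I need every observation to stay in the dataset, so I cannot filter any out: I have to create a new column, which I have named shop_YN, that gives "Yes" for observations containing the standalone 4-character string "shop". Once I have the right str_detect() code, I plan to wrap the result in mutate() and if_else() as shown below (except that I do not know what pattern to pass to str_detect() to get the result I need):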

shop_YN <- example %>% mutate(shop_YN = if_else(str_detect(example$remarks, ), "Yes", "No"))

Here is a sample of the data, produced with dput():

structure(list(price = c(195000, 213000, 215000, 240000, 241000, 
250000, 255000, 256500, 260000, 263500, 265000, 277000, 280000, 
280000, 150000), remarks = c("large home with a 1200 sf shop. great location close to shopping.", 
"updated home close to shopping & schools.", "nice location. 2br home with updating.", 
"huge shop on property!", "close to shopping.", "updated, clean, great location, garage.", 
"close to shopping and massive shop on property.", "updated home near shopping, schools, restaurants.", 
"large home with updated interior.", "close to schools, updated, stick-built shop 1500sf.", 
"home and shop.", "near schools, shopping, restaurants. partially updated home.", 
"located close to shopping. high quality home with shop in backyard.", 
"brick 2-story. lots of shopping near by. detached garage and large shop in backyard.", 
"fixer! needs work.")), row.names = c(NA, -15L), class = c("tbl_df", 
                                           "tbl", "data.frame"))

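You are probably looking for word boundaries (\b) here. Wrapping the desired pattern between two word boundaries matches only the whole word, not parts of longer words. In an R string literal the backslash itself has to be escaped, so the pattern is written '\\bshop\\b'.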

library(dplyr)
library(stringr)
# df is the tibble built from the dput() output above
df %>% mutate(shop_YN = str_detect(remarks, '\\bshop\\b'))
# A tibble: 15 × 3
price remarks                                                                          shop_YN
<dbl> <chr>                                                                            <lgl>  
1 195000 large home with a 1200 sf shop. great location close to shopping.                TRUE   
2 213000 updated home close to shopping & schools.                                        FALSE  
3 215000 nice location. 2br home with updating.                                           FALSE  
4 240000 huge shop on property!                                                           TRUE   
5 241000 close to shopping.                                                               FALSE  
6 250000 updated, clean, great location, garage.                                          FALSE  
7 255000 close to shopping and massive shop on property.                                  TRUE   
8 256500 updated home near shopping, schools, restaurants.                                FALSE  
9 260000 large home with updated interior.                                                FALSE  
10 263500 close to schools, updated, stick-built shop 1500sf.                              TRUE   
11 265000 home and shop.                                                                   TRUE   
12 277000 near schools, shopping, restaurants. partially updated home.                     FALSE  
13 280000 located close to shopping. high quality home with shop in backyard.              TRUE   
14 280000 brick 2-story. lots of shopping near by. detached garage and large shop in back… TRUE   
15 150000 fixer! needs work.                                                               FALSE
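
As a quick sanity check, the four cases listed in the question can be tried on a small vector. This is only a sketch: the vector `remarks` below is a hypothetical stand-in built from four of the sample remarks, and the plain "shop" pattern is included only to show the difference.

```r
library(stringr)

# Hypothetical mini-vector covering the four cases from the question
remarks <- c(
  "huge shop on property!",                          # only "shop"
  "close to shopping.",                              # only "shopping"
  "fixer! needs work.",                              # neither
  "close to shopping and massive shop on property."  # both
)

# The backslash is doubled in R source so the regex engine receives \b
whole_word <- str_detect(remarks, "\\bshop\\b")

# Plain substring match, for comparison: it also fires on "shopping"
substring_match <- str_detect(remarks, "shop")

whole_word       # TRUE FALSE FALSE TRUE
substring_match  # TRUE TRUE FALSE TRUE
```

Only the second case differs between the two patterns, which is exactly the "shopping"-only row the question wants to be FALSE.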

If you want Yes/No instead of a logical shop_YN, just pipe the output of str_detect into ifelse:

df %>% mutate(shop_YN = str_detect(remarks, '\\bshop\\b') %>% ifelse('Yes', 'No'))

We can also use grepl instead of str_detect:

df %>% 
  mutate(check = grepl("\\bshop\\b", remarks))
price remarks                                                                              check
<dbl> <chr>                                                                                <lgl>
1 195000 large home with a 1200 sf shop. great location close to shopping.                    TRUE 
2 213000 updated home close to shopping & schools.                                            FALSE
3 215000 nice location. 2br home with updating.                                               FALSE
4 240000 huge shop on property!                                                               TRUE 
5 241000 close to shopping.                                                                   FALSE
6 250000 updated, clean, great location, garage.                                              FALSE
7 255000 close to shopping and massive shop on property.                                      TRUE 
8 256500 updated home near shopping, schools, restaurants.                                    FALSE
9 260000 large home with updated interior.                                                    FALSE
10 263500 close to schools, updated, stick-built shop 1500sf.                                  TRUE 
11 265000 home and shop.                                                                       TRUE 
12 277000 near schools, shopping, restaurants. partially updated home.                         FALSE
13 280000 located close to shopping. high quality home with shop in backyard.                  TRUE 
14 280000 brick 2-story. lots of shopping near by. detached garage and large shop in backyard. TRUE 
15 150000 fixer! needs work.                                                                   FALSE
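
The question lowercases the remarks with str_to_lower() before matching. As a possible alternative, both functions can ignore case directly, which leaves the original text untouched. The sketch below uses a hypothetical mixed-case vector that is not part of the sample data.

```r
library(stringr)

# Hypothetical mixed-case remarks, not taken from the original data
remarks <- c("Huge SHOP on property!", "Close to Shopping.")

# stringr: wrap the pattern in regex() and set ignore_case = TRUE
via_stringr <- str_detect(remarks, regex("\\bshop\\b", ignore_case = TRUE))

# base R: grepl() takes ignore.case = TRUE
via_base <- grepl("\\bshop\\b", remarks, ignore.case = TRUE)

via_stringr  # TRUE FALSE
via_base     # TRUE FALSE
```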
