R - Create a new variable based on word patterns



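I want to create a new variable, unsure, that contains the word "unsure" if any of the following phrases are found in the freetext column: "too soon", "to tell". The freetext column should stay unchanged, and the new column should be NA when freetext does not contain these phrases. The current data looks like this: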

id               freetext date
1   1           its too soon    1
2   2           I'm not sure    2
3   3                   pink   12
4   4                 yellow   15
5   5       too soon to tell   20
6   6 I think it is too soon    2
7   7                 5 days    6
8   8                    red    7
9   9        its been 2 days    3
10 10       too soon to tell   11

Data:

df <- structure(list(id = c("1","2","3","4","5","6","7","8","9","10"), 
freetext = c("its too soon", "I'm not sure",
"pink","yellow","too soon to tell","I think it is too soon","5 days","red",
"its been 2 days","too soon to tell"), 
date = c("1","2","12","15","20","2","6","7","3","11")), class = "data.frame", row.names = c(NA,-10L))

I would like it to look like this:

id               freetext unsure date
1   1           its too soon unsure    1
2   2           I'm not sure   <NA>    2
3   3                   pink   <NA>   12
4   4                 yellow   <NA>   15
5   5       too soon to tell unsure   20
6   6 I think it is too soon unsure    2
7   7                 5 days   <NA>    6
8   8                    red   <NA>    7
9   9        its been 2 days   <NA>    3
10 10       too soon to tell unsure   11

You can use if_else with str_detect for pattern matching -

library(tidyverse)
df %>% mutate(unsure = if_else(str_detect(freetext, 'too soon|to tell'), 'unsure', NA_character_))
#   id               freetext date unsure
#1   1           its too soon    1 unsure
#2   2           I'm not sure    2   <NA>
#3   3                   pink   12   <NA>
#4   4                 yellow   15   <NA>
#5   5       too soon to tell   20 unsure
#6   6 I think it is too soon    2 unsure
#7   7                 5 days    6   <NA>
#8   8                    red    7   <NA>
#9   9        its been 2 days    3   <NA>
#10 10       too soon to tell   11 unsure
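
Note that the result above appends unsure as the last column, whereas the desired output places it right after freetext. A minimal sketch of one way to get that order, assuming dplyr 1.0.0 or later (where mutate() accepts the .after argument); the data frame is rebuilt here from the 10 rows shown in the question so the example runs on its own, and the name result is only illustrative:

```r
library(dplyr)
library(stringr)

# Sample data from the question (10 rows), rebuilt so the example is self-contained
df <- data.frame(
  id = as.character(1:10),
  freetext = c("its too soon", "I'm not sure", "pink", "yellow",
               "too soon to tell", "I think it is too soon", "5 days",
               "red", "its been 2 days", "too soon to tell"),
  date = c("1", "2", "12", "15", "20", "2", "6", "7", "3", "11"),
  stringsAsFactors = FALSE
)

# Same pattern match as above; .after = freetext controls where the new column goes
result <- df %>%
  mutate(
    unsure = if_else(str_detect(freetext, "too soon|to tell"), "unsure", NA_character_),
    .after = freetext
  )

names(result)
```

With an older dplyr, selecting the columns in the wanted order after mutate() should give the same layout.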

In base R -

transform(df, unsure = ifelse(grepl('too soon|to tell', freetext), 'unsure', NA))
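
Both approaches match case-sensitively, so an entry such as "Too soon" would not be flagged. If that matters for your data, a possible variation in base R is to pass ignore.case = TRUE to grepl. The input vector below is hypothetical and not part of the original data:

```r
# Hypothetical inputs with mixed capitalization (not in the original data)
freetext <- c("Too soon", "TOO SOON TO TELL", "pink", "its too soon")

# ignore.case = TRUE makes the pattern match regardless of capitalization
unsure <- ifelse(grepl("too soon|to tell", freetext, ignore.case = TRUE),
                 "unsure", NA_character_)
unsure
```

Here only "pink" ends up as NA; the other three entries are flagged as "unsure".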
