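I want to create a new variable, unsure, that contains the word "unsure" whenever the freetext column contains either of the phrases "too soon" or "to tell". The freetext column itself should stay unchanged, and the new column should be NA when freetext contains neither phrase. The current data looks like this: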
   id               freetext date
1   1           its too soon    1
2   2           I'm not sure    2
3   3                   pink   12
4   4                 yellow   15
5   5       too soon to tell   20
6   6 I think it is too soon    2
7   7                 5 days    6
8   8                    red    7
9   9        its been 2 days    3
10 10       too soon to tell   11
Data:
# freetext is trimmed to 10 values so that every column has the same length
# as the 10 row names; the two extra entries ("scans", "went on holiday")
# had no matching id or date.
df <- structure(list(id = c("1","2","3","4","5","6","7","8","9","10"),
freetext = c("its too soon", "I'm not sure",
"pink","yellow","too soon to tell","I think it is too soon","5 days","red",
"its been 2 days","too soon to tell"),
date = c("1","2","12","15","20","2","6","7","3","11")), class = "data.frame", row.names = c(NA,-10L))
I would like it to look like this:
   id               freetext unsure date
1   1           its too soon unsure    1
2   2           I'm not sure   <NA>    2
3   3                   pink   <NA>   12
4   4                 yellow   <NA>   15
5   5       too soon to tell unsure   20
6   6 I think it is too soon unsure    2
7   7                 5 days   <NA>    6
8   8                    red   <NA>    7
9   9        its been 2 days   <NA>    3
10 10       too soon to tell unsure   11
You can use if_else and str_detect for pattern matching -
library(tidyverse)
df %>% mutate(unsure = if_else(str_detect(freetext, 'too soon|to tell'), 'unsure', NA_character_))
#   id               freetext date unsure
#1   1           its too soon    1 unsure
#2   2           I'm not sure    2   <NA>
#3   3                   pink   12   <NA>
#4   4                 yellow   15   <NA>
#5   5       too soon to tell   20 unsure
#6   6 I think it is too soon    2 unsure
#7   7                 5 days    6   <NA>
#8   8                    red    7   <NA>
#9   9        its been 2 days    3   <NA>
#10 10       too soon to tell   11 unsure
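The expected output in the question has unsure immediately after freetext, whereas the call above appends it as the last column. A minimal sketch of one way to get that order, assuming dplyr 1.0.0 or later (where mutate() accepts the .after argument); the data frame is rebuilt here with 10 rows only so that the example runs on its own:

```r
library(dplyr)
library(stringr)

# Rebuild the question's data (10 rows) so the example is self-contained.
df <- data.frame(
  id = as.character(1:10),
  freetext = c("its too soon", "I'm not sure", "pink", "yellow",
               "too soon to tell", "I think it is too soon", "5 days",
               "red", "its been 2 days", "too soon to tell"),
  date = c("1", "2", "12", "15", "20", "2", "6", "7", "3", "11")
)

result <- df %>%
  mutate(
    # "unsure" where either phrase occurs, NA otherwise
    unsure = if_else(str_detect(freetext, "too soon|to tell"),
                     "unsure", NA_character_),
    # assumption: dplyr >= 1.0.0; places the new column right after freetext
    .after = freetext
  )

names(result)
```

On older dplyr versions, select(id, freetext, unsure, date) after the mutate() call should give the same column order.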
In base R -
transform(df, unsure = ifelse(grepl('too soon|to tell', freetext), 'unsure', NA))
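Both patterns above are case-sensitive, so free text such as "Too Soon" would not match. If that matters for your data, here is a hedged base R sketch that uses grepl's ignore.case argument and then reorders the columns to match the layout in the question. The toy data frame and its mixed-case row are hypothetical, added only for illustration:

```r
# Hypothetical 4-row sample; the third row uses mixed case on purpose.
toy <- data.frame(
  id = c("1", "2", "3", "4"),
  freetext = c("its too soon", "pink", "Too Soon To Tell", "red"),
  date = c("1", "12", "20", "7")
)

# ignore.case = TRUE is an addition beyond the original question.
toy$unsure <- ifelse(
  grepl("too soon|to tell", toy$freetext, ignore.case = TRUE),
  "unsure", NA_character_
)

# Put unsure between freetext and date.
toy <- toy[, c("id", "freetext", "unsure", "date")]
toy
```

Using NA_character_ rather than a bare NA keeps the column character even if no row matches.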