r - Identify matching strings within groups and create a new column indicating whether there was a change



Suppose I have the following dataset:

dat <- data.frame(ID     = c("A","A","A","A","A","A","B","B","B","B"),
                  test   = rep(c("pre","post"), 5),
                  item   = c(rep("item1",2), rep("item2",2), rep("item3",2), rep("item1",2), rep("item2",2)),
                  answer = c("science","science","science","","","science",
                             "some multi word string that is not science", "history", "", "social science"))

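Within each ID/item group, I want to detect a specific value in answer. For example, I need to identify instances of science but exclude entries such as social science: although social science contains the word science, I am only interested in answers that are exactly science.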
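To make the distinction concrete, here is a small base R sketch (the vector is just a subset of the answer column above) comparing a substring search with an exact comparison. Only the exact comparison, or an equivalent anchored regular expression, leaves out social science.

```r
# A subset of the answers, including entries that merely contain "science"
answer <- c("science", "", "social science",
            "some multi word string that is not science", "history")

# Substring search: also matches "social science" and the long string
substring_hit <- grepl("science", answer, fixed = TRUE)

# Exact comparison: matches only entries that are exactly "science"
exact_hit <- answer == "science"

# Anchored regex: equivalent to the exact comparison for this data
anchored_hit <- grepl("^science$", answer)

print(data.frame(answer, substring_hit, exact_hit, anchored_hit))
```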

I want to create a new column, change_type, with the following levels:

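  • both indicates that science is present at both levels of test,
  • pre indicates that science is present only where test equals pre, and
  • post indicates that science is present only where test equals post.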
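The three levels above amount to a rule applied per group. A minimal base R sketch of that rule, using a hypothetical helper named classify_change() that is not part of the question; it looks rows up by their test label rather than by position, and returns NA when science appears at neither level:

```r
# Hypothetical helper (not from the question): classify one ID/item group.
# `test` holds the "pre"/"post" labels and `answer` the responses for that group.
classify_change <- function(test, answer) {
  is_science <- answer %in% "science"   # exact match; NA answers count as FALSE
  in_pre  <- any(is_science & test %in% "pre")
  in_post <- any(is_science & test %in% "post")
  if (in_pre && in_post) {
    "both"
  } else if (in_pre) {
    "pre"
  } else if (in_post) {
    "post"
  } else {
    NA_character_
  }
}

classify_change(c("pre", "post"), c("science", "science"))   # "both"
classify_change(c("pre", "post"), c("science", ""))          # "pre"
classify_change(c("pre", "post"), c("", "science"))          # "post"
classify_change(c("post", "pre"), c("", "science"))          # "pre" (row order does not matter)
classify_change(c("pre", "post"), c("", "social science"))   # NA
```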

The desired output is:

res <- data.frame(ID          = c("A","A","A","B","B"),
                  item        = c("item1","item2","item3","item1","item2"),
                  change_type = c("both","pre","post",NA,NA))

We can use case_when:

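library(dplyr)

dat %>%
  group_by(ID, item) %>%
  # Assumes each group has its "pre" row first and its "post" row last;
  # == is an exact comparison, so "social science" does not count
  mutate(change_type = case_when(
    first(answer) == "science" & last(answer) == "science" ~ "both",
    first(answer) == "science" & first(test) == "pre"      ~ "pre",
    last(answer)  == "science" & last(test)  == "post"     ~ "post"
  )) %>%
  # Collapse to one row per ID/item/change_type combination
  group_by(ID, item, change_type) %>%
  summarise()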
  ID    item  change_type
  <chr> <chr> <chr>
1 A     item1 both
2 A     item2 pre
3 A     item3 post
4 B     item1 NA
5 B     item2 NA
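The case_when answer relies on each group being ordered with pre first and post last. If that ordering cannot be guaranteed, one possible alternative (a base R sketch, not the answer's method) flags exact science answers per test level and collapses each ID/item group with aggregate():

```r
# Data from the question
dat <- data.frame(
  ID     = c("A","A","A","A","A","A","B","B","B","B"),
  test   = rep(c("pre","post"), 5),
  item   = c(rep("item1",2), rep("item2",2), rep("item3",2), rep("item1",2), rep("item2",2)),
  answer = c("science","science","science","","","science",
             "some multi word string that is not science","history","","social science")
)

# Row-level flags: exact "science" answer at the pre / post level
dat$in_pre  <- dat$answer %in% "science" & dat$test %in% "pre"
dat$in_post <- dat$answer %in% "science" & dat$test %in% "post"

# One row per ID/item: was science seen at pre? at post?
res <- aggregate(dat[c("in_pre", "in_post")], by = dat[c("ID", "item")], FUN = any)

# Map the two flags onto the requested levels; no science at all gives NA
res$change_type <- ifelse(res$in_pre & res$in_post, "both",
                   ifelse(res$in_pre, "pre",
                   ifelse(res$in_post, "post", NA_character_)))

# Reorder by ID, then item, to match the desired output
res <- res[order(res$ID, res$item), c("ID", "item", "change_type")]
rownames(res) <- NULL
print(res)
```

Unlike the dplyr version, this returns a plain, ungrouped data.frame, and groups without any science answer get a real NA rather than the string "NA".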
