Suppose I have the following dataset:
dat <- data.frame(
  ID = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B"),
  test = rep(c("pre", "post"), 5),
  item = c(rep("item1", 2), rep("item2", 2), rep("item3", 2), rep("item1", 2), rep("item2", 2)),
  answer = c("science", "science", "science", "", "", "science",
             "some multi word string that is not science", "history", "", "social science")
)
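For each combination of ID and item, I want to identify one specific element among the strings in answer. I need to find instances of "science" while excluding entries such as "social science". Although "social science" contains the word "science", I am only interested in answers that are exactly "science".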
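The result should go in a new column, change_type, with these levels:
- both: "science" is present at both levels of test
- pre: "science" is present only where test equals pre
- post: "science" is present only where test equals post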
The output should look like this:
res <- data.frame(
  ID = c("A", "A", "A", "B", "B"),
  item = c("item1", "item2", "item3", "item1", "item2"),
  change_type = c("both", "pre", "post", "NA", "NA")
)
We can use case_when:
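library(dplyr)

dat %>%
  # evaluate each ID/item pair separately
  group_by(ID, item) %>%
  # assumes the "pre" row comes before the "post" row within each group
  mutate(change_type = case_when(
    first(answer) == "science" & last(answer) == "science" ~ "both",
    first(answer) == "science" & first(test) == "pre" ~ "pre",
    last(answer) == "science" & last(test) == "post" ~ "post"
    # no condition matched: case_when() returns NA
  )) %>%
  # collapse to one row per ID/item/change_type
  group_by(ID, item, change_type) %>%
  summarise()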
  ID    item  change_type
  <chr> <chr> <chr>
1 A     item1 both
2 A     item2 pre
3 A     item3 post
4 B     item1 NA
5 B     item2 NA
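
The approach above reads the first and last row of each group, so it relies on pre appearing before post. If the row order is not guaranteed, one possible variant is to test the value of test directly. This is only a sketch: the helper columns in_pre and in_post are names made up for illustration, and the `.groups` argument assumes dplyr 1.0.0 or later. The comparison `answer == "science"` is an exact match, so "social science" does not count.

```r
library(dplyr)

dat <- data.frame(
  ID = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B"),
  test = rep(c("pre", "post"), 5),
  item = c(rep("item1", 2), rep("item2", 2), rep("item3", 2), rep("item1", 2), rep("item2", 2)),
  answer = c("science", "science", "science", "", "", "science",
             "some multi word string that is not science", "history", "", "social science")
)

res <- dat %>%
  group_by(ID, item) %>%
  summarise(
    # in_pre / in_post are hypothetical helper columns, not part of the original data
    in_pre  = any(answer == "science" & test == "pre"),   # exact match in a pre row
    in_post = any(answer == "science" & test == "post"),  # exact match in a post row
    .groups = "drop"
  ) %>%
  mutate(change_type = case_when(
    in_pre & in_post ~ "both",
    in_pre           ~ "pre",
    in_post          ~ "post"
    # neither flag set: case_when() returns NA
  )) %>%
  select(ID, item, change_type)

res
```

Because each flag is computed with any(), this version should also tolerate groups that have more than two rows or rows in a different order.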