I have the following data frame:
df =
id date medication related_medication
1 2017-02-18 A yes
1 2017-02-07 D yes
2 2017-02-18 S yes
2 2017-02-18 F no
3 2017-02-18 A yes
3 2017-02-01 F yes
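For each id, I want to keep only the row with the earliest date among the rows where a related medication appears. In the example above, only individuals 1 and 3 have two related medications (since this variable takes the value yes for both of their rows). For those cases I want to keep just the earliest date on which a related medication appears. The resulting table should look like this: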
df =
id date medication related_medication
1 2017-02-07 D yes
2 2017-02-18 S yes
2 2017-02-18 F no
3 2017-02-01 F yes
So far I have tried:
df_final <- df %>%
slice(which.min(date))
But I have not found a way to do this only when a specific condition is met, i.e. related_medication == "yes".
You can use slice_min:
library(dplyr)
df %>%
group_by(id, related_medication) %>%
slice_min(date)
Output
id date medication related_medication
1 1 2017-02-07 D yes
2 2 2017-02-18 F no
3 2 2017-02-18 S yes
4 3 2017-02-01 F yes
If you want to keep all observations where related_medication == "no":
df %>%
  group_by(id) %>%
  # keep every "no" row; among "yes" rows keep only those on the earliest "yes" date
  filter(related_medication == "no" |
           (related_medication == "yes" &
              date == min(date[related_medication == "yes"])))
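A self-contained sketch of this conditional filter on the question's data may help to check the result against the desired table. It assumes that `date` is stored as a `Date` column (the question does not say) and that every id has at least one "yes" row; the name `res` is only illustrative.

```r
library(dplyr)

# Example data from the question; storing date as Date is an assumption
df <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  date = as.Date(c("2017-02-18", "2017-02-07", "2017-02-18",
                   "2017-02-18", "2017-02-18", "2017-02-01")),
  medication = c("A", "D", "S", "F", "A", "F"),
  related_medication = c("yes", "yes", "yes", "no", "yes", "yes")
)

# Per id: keep all "no" rows, and only the "yes" rows on the earliest "yes" date
res <- df %>%
  group_by(id) %>%
  filter(related_medication == "no" |
           (related_medication == "yes" &
              date == min(date[related_medication == "yes"]))) %>%
  ungroup()

print(res)
```

The condition is evaluated row by row within each id, so the comparison has the same length as the group regardless of how many "yes" rows it contains.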
Using data.table:
library(data.table)
setDT(df)[, .SD[which.min(date)], .(id, related_medication)]
Output
id related_medication date medication
<int> <char> <Date> <char>
1: 1 yes 2017-02-07 D
2: 2 yes 2017-02-18 S
3: 2 no 2017-02-18 F
4: 3 yes 2017-02-01 F
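One detail worth knowing: `which.min()` returns a single index, so if several rows in a group share the earliest date, only the first is kept. The sketch below shows a variant that keeps all tied rows. It assumes `date` is a `Date` column, and the names `dt`, `first_only` and `all_ties` are illustrative.

```r
library(data.table)

# Example data from the question; storing date as Date is an assumption
dt <- data.table(
  id = c(1L, 1L, 2L, 2L, 3L, 3L),
  date = as.Date(c("2017-02-18", "2017-02-07", "2017-02-18",
                   "2017-02-18", "2017-02-18", "2017-02-01")),
  medication = c("A", "D", "S", "F", "A", "F"),
  related_medication = c("yes", "yes", "yes", "no", "yes", "yes")
)

# which.min() picks one row per group, the first with the earliest date
first_only <- dt[, .SD[which.min(date)], by = .(id, related_medication)]

# Comparing against min(date) keeps every row that shares the earliest date
all_ties <- dt[, .SD[date == min(date)], by = .(id, related_medication)]

print(first_only)
print(all_ties)
```

On the question's data no group has tied dates, so both variants return the same four rows.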