R: Slice the rows with the minimum date, subject to an additional condition on another column



I have the following data frame:

df =
id date           medication related_medication
1 2017-02-18      A          yes
1 2017-02-07      D          yes
2 2017-02-18      S          yes
2 2017-02-18      F          no
3 2017-02-18      A          yes
3 2017-02-01      F          yes

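For each id, I want to keep only the earliest date on which a related medication appears. In the example above, only individuals 1 and 3 have two related medications (since this variable takes the value yes). For those, I would like to keep only the row with the earliest date. The resulting table should look like this: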

df =
id date           medication related_medication
1 2017-02-07      D          yes
2 2017-02-18      S          yes
2 2017-02-18      F          no
3 2017-02-01      F          yes

So far I have tried:

df_final <- df %>%
slice(which.min(date))

But I have not found a way to do this only when a specific condition is met, namely related_medication == "yes".

You can use slice_min:

library(dplyr)
df %>% 
group_by(id, related_medication) %>% 
slice_min(date)

Output

     id date       medication related_medication
1     1 2017-02-07 D          yes               
2     2 2017-02-18 F          no                
3     2 2017-02-18 S          yes               
4     3 2017-02-01 F          yes               
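For a reproducible run, here is a minimal sketch that rebuilds the example data and applies the same grouping. The construction of `df` is an assumption based on the table in the question (in particular, `date` is assumed to be stored as a `Date`), and `res_min` is just an illustrative name. Note that `slice_min()` keeps ties by default (`with_ties = TRUE`), so if several rows in a group share the earliest date, all of them are returned.

```r
library(dplyr)

# Example data rebuilt from the question; column types are assumed
df <- tibble(
  id = c(1L, 1L, 2L, 2L, 3L, 3L),
  date = as.Date(c("2017-02-18", "2017-02-07", "2017-02-18",
                   "2017-02-18", "2017-02-18", "2017-02-01")),
  medication = c("A", "D", "S", "F", "A", "F"),
  related_medication = c("yes", "yes", "yes", "no", "yes", "yes")
)

# Earliest row(s) within each id / related_medication combination
res_min <- df %>%
  group_by(id, related_medication) %>%
  slice_min(date, with_ties = TRUE) %>%
  ungroup()

print(res_min)
```

Setting `with_ties = FALSE` would instead return exactly one row per group.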

If you want to keep all observations where related_medication == "no":

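df %>% 
  group_by(id) %>% 
  # keep every "no" row, plus the "yes" row(s) on the earliest "yes" date;
  # the comparison is row-wise, so it stays aligned with the group's rows
  filter(related_medication == "no" |
           (related_medication == "yes" &
              date == min(date[related_medication == "yes"]))) %>% 
  ungroup()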
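Another way to express the same idea, sketched here under the assumption that `df` looks like the table in the question, is to treat the two cases separately: reduce the "yes" rows to the earliest date per id, leave the "no" rows untouched, and bind the pieces back together. The names `yes_rows`, `no_rows` and `res_keep_no` are illustrative only. This variant also avoids calling `min()` on an empty vector when an id has no "yes" rows.

```r
library(dplyr)

# Example data rebuilt from the question; column types are assumed
df <- tibble(
  id = c(1L, 1L, 2L, 2L, 3L, 3L),
  date = as.Date(c("2017-02-18", "2017-02-07", "2017-02-18",
                   "2017-02-18", "2017-02-18", "2017-02-01")),
  medication = c("A", "D", "S", "F", "A", "F"),
  related_medication = c("yes", "yes", "yes", "no", "yes", "yes")
)

# "yes" rows: keep only the earliest date within each id
yes_rows <- df %>%
  filter(related_medication == "yes") %>%
  group_by(id) %>%
  slice_min(date) %>%
  ungroup()

# "no" rows: keep all of them unchanged
no_rows <- df %>%
  filter(related_medication == "no")

# Recombine and restore a readable order
res_keep_no <- bind_rows(yes_rows, no_rows) %>%
  arrange(id, date)

print(res_keep_no)
```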

Using data.table:

library(data.table)
setDT(df)[, .SD[which.min(date)], .(id, related_medication)]

Output

id related_medication       date medication
<int>             <char>     <Date>     <char>
1:     1                yes 2017-02-07          D
2:     2                yes 2017-02-18          S
3:     2                 no 2017-02-18          F
4:     3                yes 2017-02-01          F
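Like the `slice_min` approach, the grouping by `id` and `related_medication` also reduces the "no" rows to one per id, and `which.min()` returns only the first row when dates are tied. If every "no" row should be kept, one possible data.table sketch is to handle the "yes" rows separately and bind the result to the untouched "no" rows. The data construction and the names `dt`, `yes_dt`, `no_dt` and `res_dt` are assumptions for illustration.

```r
library(data.table)

# Example data rebuilt from the question; column types are assumed
dt <- data.table(
  id = c(1L, 1L, 2L, 2L, 3L, 3L),
  date = as.Date(c("2017-02-18", "2017-02-07", "2017-02-18",
                   "2017-02-18", "2017-02-18", "2017-02-01")),
  medication = c("A", "D", "S", "F", "A", "F"),
  related_medication = c("yes", "yes", "yes", "no", "yes", "yes")
)

# Earliest "yes" row per id (the first one if several share that date)
yes_dt <- dt[related_medication == "yes", .SD[which.min(date)], by = id]

# All "no" rows, unchanged
no_dt <- dt[related_medication == "no"]

# Combine by column name and sort in place
res_dt <- rbind(yes_dt, no_dt, use.names = TRUE)
setorder(res_dt, id, date)

print(res_dt)
```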
