R: Slice the row with the minimum date, subject to a condition on another column

I have the following data frame:

df =
id date           medication related_medication
1 2017-02-18      A          yes
1 2017-02-07      D          yes
2 2017-02-18      S          yes
2 2017-02-18      F          no
3 2017-02-18      A          yes
3 2017-02-01      F          yes
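To make the example reproducible, the data frame can be rebuilt as below. This is a sketch that assumes date should be a Date column (the data.table output further down shows <Date>). which.min() only works on values it can coerce to numbers, so a date column read in as character should be converted with as.Date() first.

```r
# Rebuild the question's data; as.Date() makes date a real Date column,
# so which.min() and slice_min() compare dates rather than strings.
df <- data.frame(
  id = c(1L, 1L, 2L, 2L, 3L, 3L),
  date = as.Date(c("2017-02-18", "2017-02-07", "2017-02-18",
                   "2017-02-18", "2017-02-18", "2017-02-01")),
  medication = c("A", "D", "S", "F", "A", "F"),
  related_medication = c("yes", "yes", "yes", "no", "yes", "yes"),
  stringsAsFactors = FALSE
)
str(df)
```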

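For each id, I only want to keep the row with the minimum date among the rows where related_medication is "yes". In the example above, only individuals 1 and 3 have two related medications (since this variable takes the value yes in both of their rows), so for them I want to keep only the row with the earliest date. Rows where related_medication is "no" should stay as they are. The resulting table should look like this: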

df =
id date           medication related_medication
1 2017-02-07      D          yes
2 2017-02-18      S          yes
2 2017-02-18      F          no
3 2017-02-01      F          yes

So far I have tried:

df_final <- df %>%
slice(which.min(date))

But I haven't found a way to do this only when a specific condition is met, namely related_medication == "yes".

You can use slice_min:

library(dplyr)
df %>% 
group_by(id, related_medication) %>% 
slice_min(date)

Output

     id date       medication related_medication
1     1 2017-02-07 D          yes
2     2 2017-02-18 F          no
3     2 2017-02-18 S          yes
4     3 2017-02-01 F          yes
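Because slice_min() here runs once per (id, related_medication) group, it also keeps only the earliest "no" row of each id. That makes no difference for this data, where every id has at most one "no" row. The sketch below uses a made-up id 4 with two "no" rows (hypothetical data, for illustration only) to show the difference, and contrasts it with filtering inside the same two-column groups while exempting the "no" groups.

```r
library(dplyr)

# Hypothetical data: id 4 is invented and has two "no" rows.
df2 <- data.frame(
  id = c(1L, 1L, 4L, 4L),
  date = as.Date(c("2017-02-18", "2017-02-07", "2017-03-01", "2017-03-05")),
  medication = c("A", "D", "B", "C"),
  related_medication = c("yes", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# slice_min() per (id, related_medication): the "no" group of id 4
# is reduced to its earliest row, so C is dropped.
by_both <- df2 %>%
  group_by(id, related_medication) %>%
  slice_min(date) %>%
  ungroup()

# Same groups, but "no" groups are kept whole; "yes" groups keep only
# the row(s) on their earliest date (ties are all kept).
keep_no <- df2 %>%
  group_by(id, related_medication) %>%
  filter(related_medication == "no" | date == min(date)) %>%
  ungroup()

by_both
keep_no
```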

If you want to keep all observations where related_medication == "no":

df %>% 
group_by(id) %>% 
# keep every "no" row, plus the "yes" row(s) on the id's earliest "yes" date
filter(related_medication == "no" |
date == min(date[related_medication == "yes"])) %>% 
ungroup()
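Another way to express the same condition is to handle the two kinds of rows separately: take the earliest "yes" row per id, pass the "no" rows through untouched, and bind the two pieces back together. This split-and-bind version is an alternative sketch, not part of the answer above; it rebuilds the question's data inline and assumes date is a Date column.

```r
library(dplyr)

df <- data.frame(
  id = c(1L, 1L, 2L, 2L, 3L, 3L),
  date = as.Date(c("2017-02-18", "2017-02-07", "2017-02-18",
                   "2017-02-18", "2017-02-18", "2017-02-01")),
  medication = c("A", "D", "S", "F", "A", "F"),
  related_medication = c("yes", "yes", "yes", "no", "yes", "yes"),
  stringsAsFactors = FALSE
)

# Earliest "yes" row(s) per id; slice_min() keeps ties by default.
first_yes <- df %>%
  filter(related_medication == "yes") %>%
  group_by(id) %>%
  slice_min(date, n = 1) %>%
  ungroup()

# All "no" rows pass through unchanged.
no_rows <- df %>%
  filter(related_medication == "no")

# Recombine; within an id, "yes" sorts before "no" to match the expected table.
result <- bind_rows(first_yes, no_rows) %>%
  arrange(id, desc(related_medication))
result
```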

Using data.table:

library(data.table)
setDT(df)[, .SD[which.min(date)], .(id, related_medication)]

Output

id related_medication       date medication
<int>             <char>     <Date>     <char>
1:     1                yes 2017-02-07          D
2:     2                yes 2017-02-18          S
3:     2                 no 2017-02-18          F
4:     3                yes 2017-02-01          F
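Like the slice_min answer, the .SD[which.min(date)] call keeps one row per (id, related_medication) group, so several "no" rows for one id would be collapsed; which.min() also returns only the first row on a tie. A data.table sketch that keeps every "no" row instead collects row numbers with .I per group and subsets once; the data is rebuilt inline and date is assumed to be a Date column.

```r
library(data.table)

dt <- data.table(
  id = c(1L, 1L, 2L, 2L, 3L, 3L),
  date = as.Date(c("2017-02-18", "2017-02-07", "2017-02-18",
                   "2017-02-18", "2017-02-18", "2017-02-01")),
  medication = c("A", "D", "S", "F", "A", "F"),
  related_medication = c("yes", "yes", "yes", "no", "yes", "yes")
)

# Per (id, related_medication) group, .I gives the original row numbers:
# keep all rows of a "no" group, otherwise the rows on the earliest date.
idx <- dt[, .I[related_medication == "no" | date == min(date)],
          by = .(id, related_medication)]$V1

# Subset once, restoring the original row order.
result <- dt[sort(idx)]
result
```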
