I have some data like this:
Team Date Text
1 13/12/01 mytext1
2 12/01/13 mytext2
2 05/02/13 mytext3
3 14/02/14 mytext4
3 16/02/14 mytext5
3 01/03/14 mytext6
4 29/03/12 mytext7
Within each Team, I want to merge rows whose Date values are less than 29 days apart. The result should be:
Team Date Text
1 13/12/01 mytext1
2,2 12/01/13,05/02/13 mytext2,mytext3
3,3,3 14/02/14,16/02/14,01/03/14 mytext4,mytext5,mytext6
4 29/03/12 mytext7
I'm stuck on how to merge rows based on this condition. So far I have:
df%>%
mutate(diff_days = difftime(Date , lag(Date ),units="days"))%>%
group_by(Team,diff_days)%>%
summarize_all(paste,collapse=", ")
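But how do I apply the condition for merging rows?
Within each Team, start a new group whenever the difference between two consecutive dates is greater than 29 days. Then, for each group, collapse the Date and Text columns into a single comma-separated string.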
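library(dplyr)
df %>%
  # parse the dd/mm/yy strings into Date objects
  mutate(Date = lubridate::dmy(Date)) %>%
  group_by(Team) %>%
  # within each Team, start a new group when the gap to the previous date exceeds 29 days
  group_by(grp = cumsum(Date -
    lag(Date, default = first(Date)) > 29), .add = TRUE) %>%
  # collapse each group's Date and Text into one comma-separated string
  summarise(across(c(Date, Text), toString)) %>%
  select(-grp) %>%
  ungroup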
# Team Date Text
# <int> <chr> <chr>
#1 1 2001-12-13 mytext1
#2 2 2013-01-12, 2013-02-05 mytext2, mytext3
#3 3 2014-02-14, 2014-02-16, 2014-03-01 mytext4, mytext5, mytext6
#4 4 2012-03-29 mytext7
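We can also parse the dates with base R's as.Date instead of lubridate. With .groups = 'drop', summarise returns an ungrouped result, so the final ungroup is no longer needed:
library(dplyr)
df %>%
  # '%d/%m/%y' matches the dd/mm/yy format of the input
  mutate(Date = as.Date(Date, '%d/%m/%y')) %>%
  group_by(Team) %>%
  group_by(grp = cumsum(Date -
    lag(Date, default = first(Date)) > 29), .add = TRUE) %>%
  summarise(across(c(Date, Text), toString), .groups = 'drop') %>%
  select(-grp)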
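For a fully reproducible check that needs neither dplyr nor lubridate, here is a base-R sketch of the same approach. It swaps in ave(), split() and interaction() for the dplyr verbs, builds df from the sample data in the question, and, like the lag()-based pipelines, assumes rows are already sorted by Date within each Team.

```r
# Sample data from the question; Date holds dd/mm/yy strings
df <- data.frame(
  Team = c(1, 2, 2, 3, 3, 3, 4),
  Date = c("13/12/01", "12/01/13", "05/02/13", "14/02/14",
           "16/02/14", "01/03/14", "29/03/12"),
  Text = paste0("mytext", 1:7),
  stringsAsFactors = FALSE
)
df$Date <- as.Date(df$Date, format = "%d/%m/%y")

# Per Team: flag gaps > 29 days from the previous row and cumsum the flags
# (assumes rows are already sorted by Date within each Team)
df$grp <- ave(as.numeric(df$Date), df$Team,
              FUN = function(d) cumsum(c(0, diff(d)) > 29))

# One output row per (Team, grp), with Date and Text collapsed by toString()
keys <- interaction(df$Team, df$grp, drop = TRUE, lex.order = TRUE)
out <- do.call(rbind, lapply(split(df, keys), function(g) data.frame(
  Team = g$Team[1],
  Date = toString(format(g$Date)),
  Text = toString(g$Text),
  stringsAsFactors = FALSE
)))
rownames(out) <- NULL
print(out)
```

The Date and Text columns of out should contain the same strings as the dplyr output shown above.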