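I found this situation tricky to put into words, so I apologize if it is a duplicate. I searched for solutions based on my best understanding of the desired output.
Suppose I have a data frame like this: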
df <- data.frame(ID = c(555, 555, 555, 555, 555, 555, 555, 555, 555, 555, 555, 555),
A = c('2020-04-13 19:10:00', '2020-04-14 20:00:00', '2020-04-13 19:10:00', '2020-04-14 20:00:00',
'2020-04-22 08:13:00', '2020-04-23 19:00:00', '2020-04-13 19:10:00', '2020-04-14 20:00:00',
'2020-04-22 08:13:00', '2020-04-23 19:00:00', '2020-04-27 13:41:10', '2020-05-01 22:01:00'),
B = c('2020-04-15 12:00:00', '2020-04-15 12:00:00', '2020-04-24 11:00:00', '2020-04-24 11:00:00',
'2020-04-24 11:00:00', '2020-04-24 11:00:00', '2020-05-07 10:30:00', '2020-05-07 10:30:00',
'2020-05-07 10:30:00', '2020-05-07 10:30:00', '2020-05-07 10:30:00', '2020-05-07 10:30:00')
)
df$A <- as.POSIXct(df$A)
df$B <- as.POSIXct(df$B)
> df
ID A B
1 555 2020-04-13 19:10:00 2020-04-15 12:00:00
2 555 2020-04-14 20:00:00 2020-04-15 12:00:00
3 555 2020-04-13 19:10:00 2020-04-24 11:00:00
4 555 2020-04-14 20:00:00 2020-04-24 11:00:00
5 555 2020-04-22 08:13:00 2020-04-24 11:00:00
6 555 2020-04-23 19:00:00 2020-04-24 11:00:00
7 555 2020-04-13 19:10:00 2020-05-07 10:30:00
8 555 2020-04-14 20:00:00 2020-05-07 10:30:00
9 555 2020-04-22 08:13:00 2020-05-07 10:30:00
10 555 2020-04-23 19:00:00 2020-05-07 10:30:00
11 555 2020-04-27 13:41:10 2020-05-07 10:30:00
12 555 2020-05-01 22:01:00 2020-05-07 10:30:00
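I want to reduce each distinct value of B to a single row holding the minimum A for that value, with the requirement that A be greater than the preceding B value (if there is one). The desired output would therefore be: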
> df
ID A B
1 555 2020-04-13 19:10:00 2020-04-15 12:00:00
2 555 2020-04-22 08:13:00 2020-04-24 11:00:00
3 555 2020-04-27 13:41:10 2020-05-07 10:30:00
In the real data there will be many more IDs, so ID and B would be the grouping variables. Is this possible with dplyr?
You can try the following code:
library(dplyr)
df %>%
group_by(ID) %>%
mutate(previous_B = lag(B, default = as.POSIXct(0, origin = '1970-01-01'))) %>%
group_by(B, .add = TRUE) %>%
filter(A > first(previous_B)) %>%
summarise(A = min(A)) %>%
ungroup() %>%
select(ID, A, B)
# ID A B
# <dbl> <dttm> <dttm>
#1 555 2020-04-13 19:10:00 2020-04-15 12:00:00
#2 555 2020-04-22 08:13:00 2020-04-24 11:00:00
#3 555 2020-04-27 13:41:10 2020-05-07 10:30:00
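This keeps the rows where A is greater than the last B value of the previous group. After that, it only has to keep the minimum A value within each group.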
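The two steps described above can also be sketched so that they do not depend on the row order of the input: first build a lookup of each B value and the B value before it within the same ID, then join it back and filter. This is only an illustrative variant, not the answer's own code. The names cutoffs, A_txt and B_txt are made up here, and tz = "UTC" is an assumption added to make the timestamps reproducible.

```r
library(dplyr)

# Rebuild the sample data from the question in compact form
A_txt <- c('2020-04-13 19:10:00', '2020-04-14 20:00:00', '2020-04-22 08:13:00',
           '2020-04-23 19:00:00', '2020-04-27 13:41:10', '2020-05-01 22:01:00')
B_txt <- c('2020-04-15 12:00:00', '2020-04-24 11:00:00', '2020-05-07 10:30:00')
df <- data.frame(
  ID = 555,
  A = as.POSIXct(A_txt[c(1:2, 1:4, 1:6)], tz = "UTC"),
  B = as.POSIXct(rep(B_txt, times = c(2, 4, 6)), tz = "UTC")
)

# One row per (ID, B), with the preceding B of the same ID (NA for the first)
cutoffs <- df %>%
  distinct(ID, B) %>%
  arrange(ID, B) %>%
  group_by(ID) %>%
  mutate(previous_B = lag(B)) %>%
  ungroup()

# Keep rows whose A is later than the preceding B, then take the earliest A
result <- df %>%
  inner_join(cutoffs, by = c("ID", "B")) %>%
  filter(is.na(previous_B) | A > previous_B) %>%
  group_by(ID, B) %>%
  summarise(A = min(A), .groups = "drop") %>%
  select(ID, A, B)

print(result)
```

Because the lookup is sorted by ID and B before lag() is applied, this version should give the same rows even if the data frame is shuffled, at the cost of an extra join.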
I think I found a solution for you:
my_df <- data.frame(ID = c(555, 555, 555, 555, 555, 555, 555, 555, 555, 555, 555, 555),
A = c('2020-04-13 19:10:00', '2020-04-14 20:00:00', '2020-04-13 19:10:00', '2020-04-14 20:00:00',
'2020-04-22 08:13:00', '2020-04-23 19:00:00', '2020-04-13 19:10:00', '2020-04-14 20:00:00',
'2020-04-22 08:13:00', '2020-04-23 19:00:00', '2020-04-27 13:41:10', '2020-05-01 22:01:00'),
B = c('2020-04-15 12:00:00', '2020-04-15 12:00:00', '2020-04-24 11:00:00', '2020-04-24 11:00:00',
'2020-04-24 11:00:00', '2020-04-24 11:00:00', '2020-05-07 10:30:00', '2020-05-07 10:30:00',
'2020-05-07 10:30:00', '2020-05-07 10:30:00', '2020-05-07 10:30:00', '2020-05-07 10:30:00')
)
my_df$A <- as.POSIXct(my_df$A)
my_df$B <- as.POSIXct(my_df$B)
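library(dplyr)
library(stringr)
# Extract the time-of-day part ("HH:MM:SS") of A and B as strings
my_df$Time <- str_split(string = my_df$A, pattern = " ", n = 2, simplify = TRUE)[, 2]
my_df$Time2 <- str_split(string = my_df$B, pattern = " ", n = 2, simplify = TRUE)[, 2]
# Note: this compares clock times only, so the date part of A and B is ignored
my_result <- my_df %>% group_by(ID, B) %>% filter(Time > Time2) %>% filter(Time == min(Time))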
Try it on your full data and let me know whether it gives you the expected result.
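When testing, it may help to keep in mind that the filter in the code above works on clock-time strings, so dates play no part in the comparison. A minimal base R sketch of the difference, using the A and B values of one row from the sample data (the variable names and tz = "UTC" are assumptions made for this illustration):

```r
# One row of the sample data: A is eleven days earlier than B
a <- as.POSIXct("2020-04-13 19:10:00", tz = "UTC")
b <- as.POSIXct("2020-04-24 11:00:00", tz = "UTC")

# Comparing the complete date-times
full_later <- a > b

# Comparing only the "HH:MM:SS" strings, as a time-of-day filter does
tod_later <- format(a, "%H:%M:%S") > format(b, "%H:%M:%S")

print(c(full = full_later, time_of_day = tod_later))
```

Here the full comparison is FALSE while the time-of-day comparison is TRUE, so results from the two kinds of filter can differ whenever A and B fall on different days.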