r - Remove rows based on a time constraint

I have the following dataset:

> dput(df)
structure(list(Author = c("hitham", "Ow", "WPJ4", "Seb", "Karen", "Ow", "Ow", "hitham", "Sarah",
"Rene"), diff = structure(c(28, 2, 8, 3, 7, 8, 11, 1, 4, 8), class = "difftime", units = "secs")), 
row.names = 1:10, class = "data.frame")

As we can see, the author Ow appears three times, while the author hitham appears twice:

Author    diff
1  hitham 28 secs
2      Ow  2 secs
3    WPJ4  8 secs
4     Seb  3 secs
5   Karen  7 secs
6      Ow  8 secs
7      Ow 11 secs
8  hitham  1 secs
9   Sarah  4 secs
10   Rene  8 secs

These rows represent activities performed by the authors. For example, hitham performs one activity after 1 second and a second one after 28 seconds.

I want to make sure that at least 10 seconds pass between one activity and the next by the same author.

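I would like to delete the activities (rows) that do not meet this requirement. For example, Ow performs one activity after 2 seconds and another after 8 seconds: the latter should be removed. The desired result is: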

Author    diff
1  hitham 28 secs
2      Ow  2 secs
3    WPJ4  8 secs
4     Seb  3 secs
5   Karen  7 secs
6      Ow 11 secs
7  hitham  1 secs
8   Sarah  4 secs
9    Rene  8 secs

Edit. I hope this addition makes things clearer. Let's consider hitham. If we take hitham's rows (sorted by the diff field):

hitham  1 secs
hitham 28 secs

We have (28-1)+1 > 10, so there is no need to remove either of them.

Now let's consider Ow.

Ow  2 secs
Ow  8 secs
Ow 11 secs

The differences in seconds between consecutive rows are (see the last column):

Ow  2 secs  -
Ow  8 secs  7
Ow 11 secs  4

Removing the first row that shows a number less than 10 in the last column gives the desired result. Indeed:

Ow  2 secs  -
Ow 11 secs  10

We do not have to remove the last row, because the difference here is exactly 10.
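
To make the rule concrete, here is a minimal base-R sketch of the procedure described above: compute the last column, drop the first row whose value is below 10, and repeat until no such row is left. The names gap_column, drop_close and min_gap are made up for illustration, and the sketch assumes the values are plain numbers of seconds for a single author.

```r
# Hypothetical helper: the "last column" above, i.e. the inclusive gap
# between consecutive sorted values (NA for the first row).
gap_column <- function(x) c(NA, diff(x) + 1)

# Hypothetical helper: repeatedly drop the first row whose gap is below
# min_gap, recomputing the gaps after every removal.
drop_close <- function(x, min_gap = 10) {
  x <- sort(x)
  repeat {
    bad <- which(gap_column(x) < min_gap)
    if (length(bad) == 0) break
    x <- x[-bad[1]]
  }
  x
}

ow <- c(2, 8, 11)      # Ow's diff values in seconds
gap_column(ow)         # NA 7 4
drop_close(ow)         # 2 11
drop_close(c(1, 28))   # hitham: nothing is removed
```

Recomputing the gaps after each removal matters: once the 8-second row is gone, the 11-second row is compared with the 2-second row and passes with a gap of 10.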

Based on this answer, you can try a recursive approach.

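library(dplyr)

# Return the positions to keep in d, a vector of diffs sorted in ascending order.
my_fun <- function(d, ind = 1) {
  # First position whose diff is at least 9 secs larger than the one at ind
  ind.next <- first(which(d - d[ind] >= 9))
  # first() can return NA for an empty input, so check for NA as well as length 0
  if (length(ind.next) == 0 || is.na(ind.next))
    return(ind)
  else
    return(c(ind, my_fun(d, ind.next)))
}

df %>%
  group_by(Author) %>%
  arrange(diff) %>%
  slice(my_fun(diff))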

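On each pass, the function identifies the next index ind.next: the first index whose diff is at least 9 seconds larger than the diff at index ind. A raw difference of 9 corresponds to the value 10 in the question's last column, which adds 1 to each difference. If no such ind.next exists, the function returns ind. Otherwise it calls itself recursively and concatenates the result with ind.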
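
If the recursion is hard to follow, the same selection can be sketched as a loop in base R. This is only an illustration, not the answer's code: keep_index and min_step are hypothetical names, and it assumes the diffs are sorted in ascending order within each author and share one unit (seconds here).

```r
# Hypothetical loop-based variant: returns the positions to keep in d,
# where d is sorted ascending (plain numbers or difftime in one unit).
keep_index <- function(d, min_step = 9) {
  d <- as.numeric(d)
  keep <- integer(0)
  ind <- if (length(d) > 0) 1L else NA_integer_
  while (!is.na(ind)) {
    keep <- c(keep, ind)
    # First later position far enough from the kept one; NA if there is none
    ind <- which(d - d[ind] >= min_step)[1]
  }
  keep
}

keep_index(c(2, 8, 11))   # 1 3
keep_index(c(1, 28))      # 1 2

# Applying it per author to the data from the question
df <- data.frame(
  Author = c("hitham", "Ow", "WPJ4", "Seb", "Karen", "Ow", "Ow",
             "hitham", "Sarah", "Rene"),
  diff = as.difftime(c(28, 2, 8, 3, 7, 8, 11, 1, 4, 8), units = "secs")
)
df <- df[order(df$Author, df$diff), ]
rows_by_author <- split(seq_len(nrow(df)), df$Author)
keep_rows <- unlist(
  lapply(rows_by_author, function(rows) rows[keep_index(df$diff[rows])]),
  use.names = FALSE
)
result <- df[keep_rows, ]
```

The loop avoids deep call stacks for authors with many rows. It keeps the same greedy idea: each kept row becomes the reference point for the next comparison.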

Output

Author diff   
<chr>  <drtn> 
1 hitham  1 secs
2 hitham 28 secs
3 Karen   7 secs
4 Ow      2 secs
5 Ow     11 secs
6 Rene    8 secs
7 Sarah   4 secs
8 Seb     3 secs
9 WPJ4    8 secs
