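I want to subset my data.table as follows: within each group defined by id and group, keep the rows from the first row up to the first row that meets a condition. For example, if row 3 of a group meets the condition, I want to keep rows 1, 2, and 3 of that group.
Example data: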
    id time group
 1:  1    0     1
 2:  1   20     1
 3:  1    0     2
 4:  1   40     2
 5:  2    0     1
 6:  2   35     1
 7:  2   50     1
 8:  3    0     1
 9:  3   10     1
10:  3   20     1
11:  3    0     2
12:  3   25     2
13:  3   45     2
The condition is time > 30, so the expected result is:
   id time group
1:  1    0     2
2:  1   40     2
3:  2    0     1
4:  2   35     1
5:  3    0     2
6:  3   25     2
7:  3   45     2
I tried:
df[1:which(time > 30)[1], .SD, by = .(id, group)]
but it returns:
   id group time
1:  1     1    0
2:  1     1   20
3:  1     2    0
4:  1     2   40
Data:
structure(list(id = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3),
time = c(0, 20, 0, 40, 0, 35, 50, 0, 10, 20, 0, 25, 45),
group = c(1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2)), .Names = c("id",
"time", "group"), row.names = c(NA, -13L), class = c("data.table",
"data.frame"))
Update: to show the behavior I expect, here is akrun's answer applied to another dataset.
Data:
> dftest
     patientid groupe arret dateConsult lag_dateConsult temps abst temps_cum
1: 0303H233457      2     1  2011-10-05            <NA>     0    1         0
2: 0303H233457      2     1  2011-11-09      2011-10-05    35    1        35
3: 0303H233457      2     1  2011-12-21      2011-11-09    42    1        77
4: 0303H233457      2     1  2012-01-30      2011-12-21    40    1       117
5: 0303H233457      2     1  2012-04-18      2012-01-30    79    1       196
6: 0303H233457      2     1  2012-08-27      2012-04-18   131    1       327
7: 0303H233457      4     1  2012-11-19            <NA>     0    1         0
8: 0303H233457      4     1  2013-01-07      2012-11-19    49    1        49
What I get:
> dftest[dftest[, .I[seq(which(temps_cum > 30))], .(patientid, groupe)]$V1]
     patientid groupe arret dateConsult lag_dateConsult temps abst temps_cum
1: 0303H233457      2     1  2011-10-05            <NA>     0    1         0
2: 0303H233457      2     1  2011-11-09      2011-10-05    35    1        35
3: 0303H233457      2     1  2011-12-21      2011-11-09    42    1        77
4: 0303H233457      2     1  2012-01-30      2011-12-21    40    1       117
5: 0303H233457      2     1  2012-04-18      2012-01-30    79    1       196
6: 0303H233457      4     1  2012-11-19            <NA>     0    1         0
7: 0303H233457      4     1  2013-01-07      2012-11-19    49    1        49
Expected result:
     patientid groupe arret dateConsult lag_dateConsult temps abst temps_cum
1: 0303H233457      2     1  2011-10-05            <NA>     0    1         0
2: 0303H233457      2     1  2011-11-09      2011-10-05    35    1        35
3: 0303H233457      4     1  2012-11-19            <NA>     0    1         0
4: 0303H233457      4     1  2013-01-07      2012-11-19    49    1        49
Data:
structure(list(patientid = c("0303H233457", "0303H233457", "0303H233457",
"0303H233457", "0303H233457", "0303H233457", "0303H233457", "0303H233457"
), groupe = c(2, 2, 2, 2, 2, 2, 4, 4), arret = c(1, 1, 1, 1,
1, 1, 1, 1), dateConsult = structure(c(15252, 15287, 15329, 15369,
15448, 15579, 15663, 15712), class = "Date"), lag_dateConsult = structure(c(NA,
15252, 15287, 15329, 15369, 15448, NA, 15663), class = "Date"),
temps = c(0, 35, 42, 40, 79, 131, 0, 49), abst = c(1, 1,
1, 1, 1, 1, 1, 1), temps_cum = c(0, 35, 77, 117, 196, 327,
0, 49)), .Names = c("patientid", "groupe", "arret", "dateConsult",
"lag_dateConsult", "temps", "abst", "temps_cum"), class = c("data.table",
"data.frame"), row.names = c(NA, -8L))
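After grouping by 'id' and 'group', find the position of the first row where 'time' is greater than 30 and keep the rows up to and including it. match() with nomatch = 0L returns 0 for groups that have no such row, so those groups are dropped. Calling seq() directly on which(time > 30) would behave like seq_along() whenever several rows in a group match, and would then keep one row per match instead of stopping at the first one.
df1[df1[, .I[seq_len(match(TRUE, time > 30, nomatch = 0L))], .(id, group)]$V1]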
If we instead need every row up to the last row where 'time' is greater than 30:
df1[df1[, .I[seq(tail(which(time > 30), 1))], .(id, group)]$V1]
#   id time group
#1:  1    0     2
#2:  1   40     2
#3:  2    0     1
#4:  2   35     1
#5:  2   50     1
#6:  3    0     2
#7:  3   25     2
#8:  3   45     2
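For the second dataset in the question, where several rows per group exceed the threshold, the following is a minimal sketch of a "stop at the first match" variant. It assumes a reduced copy of dftest with only the columns needed here, and up_to_first() is a hypothetical helper introduced for illustration, not part of data.table. It relies on seq() acting like seq_along() when given a vector longer than one element, which is why counting matches and indexing up to the first match can give different results.

```r
library(data.table)

# Reduced copy of the second dataset from the question
# (assumption: only the grouping columns and temps_cum are needed here).
dftest <- data.table(
  patientid = "0303H233457",
  groupe    = c(2, 2, 2, 2, 2, 2, 4, 4),
  temps_cum = c(0, 35, 77, 117, 196, 327, 0, 49)
)

# Hypothetical helper (not part of data.table): positions 1..k, where k is
# the position of the first TRUE in `cond`; integer(0) if nothing is TRUE.
up_to_first <- function(cond) seq_len(match(TRUE, cond, nomatch = 0L))

# With three matches, seq() returns one index per match (like seq_along()),
# whereas up_to_first() stops at the first matching position.
seq(which(c(0, 35, 77, 117) > 30))   # 1 2 3
up_to_first(c(0, 35, 77, 117) > 30)  # 1 2

# Per group, collect the global row numbers (.I) up to the first match,
# then use them to subset the original table.
idx <- dftest[, .I[up_to_first(temps_cum > 30)], by = .(patientid, groupe)]$V1
res <- dftest[idx]
print(res)
```

On this reduced data, idx selects rows 1, 2, 7, and 8, which corresponds to the expected result shown in the question. Groups with no row above the threshold contribute no indices and therefore disappear from the result.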