r - Subset until the condition is met (including it) by group data.table

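I would like to subset my data.table as follows: grouping by id and group, keep the rows from the first row of each group up to and including the row where a condition is met. That is, if row 3 meets the condition, I want to keep rows 1, 2 and 3.

Data example: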

    id time group
 1:  1    0     1
 2:  1   20     1
 3:  1    0     2
 4:  1   40     2
 5:  2    0     1
 6:  2   35     1
 7:  2   50     1
 8:  3    0     1
 9:  3   10     1
10:  3   20     1
11:  3    0     2
12:  3   25     2
13:  3   45     2

The condition is time > 30, so the expected result would be:

   id time group
1:  1    0     2
2:  1   40     2
3:  2    0     1
4:  2   35     1
5:  3    0     2
6:  3   25     2
7:  3   45     2

I tried: df[1:which(time >30)[1], .SD, by = .(id, group)]

But it returns:

   id group time
1:  1     1    0
2:  1     1   20
3:  1     2    0
4:  1     2   40
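
A likely explanation for this output: in data.table, the i argument is evaluated once on the whole table, before by is applied, so 1:which(time > 30)[1] is a single global row range rather than a per-group one. The sketch below illustrates this, assuming the table is stored as df; it rebuilds the data with data.table() instead of the structure() call, and the names first_hit and rows_kept are only illustrative.

```r
library(data.table)

# Rebuild the example data (same values as in the question)
df <- data.table(
  id    = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  time  = c(0, 20, 0, 40, 0, 35, 50, 0, 10, 20, 0, 25, 45),
  group = c(1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2)
)

# Position of the first time > 30 in the whole table (no grouping involved)
first_hit <- df[, which(time > 30)[1]]   # 4

# i = 1:first_hit keeps rows 1..4 regardless of id and group;
# adding by = .(id, group) afterwards only regroups these four rows
rows_kept <- df[1:first_hit]
```

To get a per-group cutoff, the row selection has to be computed in j (for example via .I or .SD), where by is in effect.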

Data:

structure(list(id = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3), 
time = c(0, 20, 0, 40, 0, 35, 50, 0, 10, 20, 0, 25, 45), 
group = c(1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2)), .Names = c("id", 
       "time", "group"), row.names = c(NA, -13L), class = c("data.table", 
                                                            "data.frame"))

Update showing how akrun's answer behaves on another dataset, compared with what I expect:

Data:

> dftest
     patientid groupe arret dateConsult lag_dateConsult temps abst temps_cum
1: 0303H233457      2     1  2011-10-05            <NA>     0    1         0
2: 0303H233457      2     1  2011-11-09      2011-10-05    35    1        35
3: 0303H233457      2     1  2011-12-21      2011-11-09    42    1        77
4: 0303H233457      2     1  2012-01-30      2011-12-21    40    1       117
5: 0303H233457      2     1  2012-04-18      2012-01-30    79    1       196
6: 0303H233457      2     1  2012-08-27      2012-04-18   131    1       327
7: 0303H233457      4     1  2012-11-19            <NA>     0    1         0
8: 0303H233457      4     1  2013-01-07      2012-11-19    49    1        49

What I get:

> dftest[dftest[, .I[seq(which(temps_cum > 30))], .(patientid, groupe)]$V1]
     patientid groupe arret dateConsult lag_dateConsult temps abst temps_cum
1: 0303H233457      2     1  2011-10-05            <NA>     0    1         0
2: 0303H233457      2     1  2011-11-09      2011-10-05    35    1        35
3: 0303H233457      2     1  2011-12-21      2011-11-09    42    1        77
4: 0303H233457      2     1  2012-01-30      2011-12-21    40    1       117
5: 0303H233457      2     1  2012-04-18      2012-01-30    79    1       196
6: 0303H233457      4     1  2012-11-19            <NA>     0    1         0
7: 0303H233457      4     1  2013-01-07      2012-11-19    49    1        49

Expected result:

     patientid groupe arret dateConsult lag_dateConsult temps abst temps_cum
1: 0303H233457      2     1  2011-10-05            <NA>     0    1         0
2: 0303H233457      2     1  2011-11-09      2011-10-05    35    1        35
3: 0303H233457      4     1  2012-11-19            <NA>     0    1         0
4: 0303H233457      4     1  2013-01-07      2012-11-19    49    1        49
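
The difference between the two outputs seems to come from how seq() treats its argument: given a vector of length greater than one it behaves like seq_along(), while given a single number n it returns 1:n. A minimal base R sketch using the temps_cum values of groupe 2 (the variable names hits, all_hits_seq and first_hit_seq are only illustrative):

```r
# temps_cum for groupe 2 in dftest
temps_cum <- c(0, 35, 77, 117, 196, 327)

hits <- which(temps_cum > 30)    # 2 3 4 5 6: five rows satisfy the condition

all_hits_seq  <- seq(hits)       # 1 2 3 4 5: one index per match, hence five rows kept
first_hit_seq <- seq(hits[1])    # 1 2: rows up to and including the first match
```

For groupe 4 only one row satisfies the condition, so which() returns a single position and seq() happens to give the intended rows.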

Data:

structure(list(patientid = c("0303H233457", "0303H233457", "0303H233457", 
"0303H233457", "0303H233457", "0303H233457", "0303H233457", "0303H233457"
), groupe = c(2, 2, 2, 2, 2, 2, 4, 4), arret = c(1, 1, 1, 1, 
1, 1, 1, 1), dateConsult = structure(c(15252, 15287, 15329, 15369, 
                  15448, 15579, 15663, 15712), class = "Date"), lag_dateConsult = structure(c(NA, 
                                                                                              15252, 15287, 15329, 15369, 15448, NA, 15663), class = "Date"), 
temps = c(0, 35, 42, 40, 79, 131, 0, 49), abst = c(1, 1, 
1, 1, 1, 1, 1, 1), temps_cum = c(0, 35, 77, 117, 196, 327, 
              0, 49)), .Names = c("patientid", "groupe", "arret", "dateConsult", 
                                  "lag_dateConsult", "temps", "abst", "temps_cum"), class = c("data.table", 
                                                                                              "data.frame"), row.names = c(NA, -8L))

After grouping by "id" and "group", get the row indices up to the row where "time" is greater than 30, and use them to subset the rows:

df1[df1[, .I[seq(which(time > 30))], .(id, group)]$V1]
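
Note that seq(which(time > 30)) stops at the first match only when exactly one row in the group satisfies the condition; with several matches it behaves like seq_along(). A minimal sketch of a variant that always stops at the first match, assuming the question's data is stored as df1 (rebuilt here with data.table()) and swapping in match() and seq_len(), which are not part of the original answer:

```r
library(data.table)

# Example data from the question
df1 <- data.table(
  id    = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  time  = c(0, 20, 0, 40, 0, 35, 50, 0, 10, 20, 0, 25, 45),
  group = c(1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2)
)

# match(TRUE, ...) is the position of the first TRUE within the group,
# or 0 when no row qualifies; seq_len(0) is empty, so such groups are dropped
idx <- df1[, .I[seq_len(match(TRUE, time > 30, nomatch = 0L))],
           by = .(id, group)]$V1

res <- df1[idx]   # rows 3, 4, 5, 6, 11, 12, 13 of df1
```

The same pattern should carry over to the second dataset by replacing time with temps_cum and grouping by .(patientid, groupe).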

If we instead need the rows up to the last row where "time" is greater than 30:

df1[df1[, .I[seq(tail(which(time > 30), 1))], .(id, group)]$V1]
#   id time group
#1:  1    0     2
#2:  1   40     2
#3:  2    0     1
#4:  2   35     1
#5:  2   50     1
#6:  3    0     2
#7:  3   25     2
#8:  3   45     2
