R 数据表条件选择



我有一个数据表。

set.seed(1)
DT <- data.table(tag = rep(LETTERS[1:4],each = 2, times = 3),
                 year = rep(1:3, each = 8), month = rep(c(1:2), 12),
                 value = runif(24, 1, 10))
DT
    tag year month    value
 1:   A    1     1 3.389578
 2:   A    1     2 4.349115
 3:   B    1     1 6.155680
 4:   B    1     2 9.173870
 5:   C    1     1 2.815137
 6:   C    1     2 9.085507
 7:   D    1     1 9.502077
 8:   D    1     2 6.947180
 9:   A    2     1 6.662026
10:   A    2     2 1.556076
11:   B    2     1 2.853771
12:   B    2     2 2.589011
13:   C    2     1 7.183206
14:   C    2     2 4.456933
15:   D    2     1 7.928573
16:   D    2     2 5.479293
17:   A    3     1 7.458567
18:   A    3     2 9.927155
19:   B    3     1 4.420317
20:   B    3     2 7.997007
21:   C    3     1 9.412347
22:   C    3     2 2.909283
23:   D    3     1 6.865064
24:   D    3     2 2.129996

yearmonth-valueDT进行排序:

setorder(DT, year, month, -value)

将产生:

    tag year month    value
 1:   D    1     1 9.502077
 2:   B    1     1 6.155680
 3:   A    1     1 3.389578
 4:   C    1     1 2.815137
 5:   B    1     2 9.173870
 6:   C    1     2 9.085507
 7:   D    1     2 6.947180
 8:   A    1     2 4.349115
 9:   D    2     1 7.928573
10:   C    2     1 7.183206
11:   A    2     1 6.662026
12:   B    2     1 2.853771
13:   D    2     2 5.479293
14:   C    2     2 4.456933
15:   B    2     2 2.589011
16:   A    2     2 1.556076
17:   C    3     1 9.412347
18:   A    3     1 7.458567
19:   D    3     1 6.865064
20:   B    3     1 4.420317
21:   A    3     2 9.927155
22:   B    3     2 7.997007
23:   C    3     2 2.909283
24:   D    3     2 2.129996

我希望结果如下:

    tag year month    value
 1:   D    1     1 9.502077
 2:   B    1     1 6.155680
 3:   B    1     2 9.173870
 4:   D    1     2 6.947180
 5:   D    2     1 7.928573
 6:   C    2     1 7.183206
 7:   D    2     2 5.479293
 8:   C    2     2 4.456933
 9:   C    3     1 9.412347
10:   A    3     1 7.458567
11:   A    3     2 9.927155
12:   C    3     2 2.909283

结果DT将具有以下属性:在每年内仅保留在month 1中具有较大价值的两个tags。例如year 1,值较大的两个tagsDB,所以整个year 1保持DByear 2保持DC。在每个month 1,我需要重新选择带有年份的行。

我们按"year"分组,获取"month"1的"tag"中的前两个元素(因为它已经排序(,使用%in%创建一个逻辑索引并子集行。

DT[, .SD[tag %in% head(tag[month ==1],2)], .(year)]
#    year tag month    value
# 1:    1   D     1 9.502077
# 2:    1   B     1 6.155680
# 3:    1   B     2 9.173870
# 4:    1   D     2 6.947180
# 5:    2   D     1 7.928573
# 6:    2   C     1 7.183206
# 7:    2   D     2 5.479293
# 8:    2   C     2 4.456933
# 9:    3   C     1 9.412347
#10:    3   A     1 7.458567
#11:    3   A     2 9.927155
#12:    3   C     2 2.909283

最新更新