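I don't quite understand the syntax for this. I have a data.table that I want to sort first by a grouping column g1 (an ordered factor) and then in descending order by another column n. The only catch is that I want rows labelled "other" in a third column, g2, to appear at the bottom of each group, regardless of their value of n.
Example: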
library(data.table)
dt <- data.table(g1 = factor(rep(c('Australia', 'Mexico', 'Canada'), 3), levels = c('Australia', 'Canada', 'Mexico')),
g2 = rep(c('stuff', 'things', 'other'), each = 3),
n = c(1000, 2000, 3000, 5000, 100, 3500, 10000, 10000, 0))
This is the expected output: within each g1, rows are sorted by n in descending order, except that rows with g2 == 'other' always come last:
g1 g2 n
1: Australia things 5000
2: Australia stuff 1000
3: Australia other 10000
4: Canada things 3500
5: Canada stuff 3000
6: Canada other 0
7: Mexico stuff 2000
8: Mexico things 100
9: Mexico other 10000
Use order() inside data.table's [ (data.table optimises this call internally), with a leading - to reverse the sort on a column:
dt[order(g1, g2 == "other", -n), ]
# g1 g2 n
# <fctr> <char> <num>
# 1: Australia things 5000
# 2: Australia stuff 1000
# 3: Australia other 10000
# 4: Canada things 3500
# 5: Canada stuff 3000
# 6: Canada other 0
# 7: Mexico stuff 2000
# 8: Mexico things 100
# 9: Mexico other 10000
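We add g2 == "other" because you said that "other" should always come last. For example, if "stuff" were "abc" instead, we can see the difference in behaviour with and without that term: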
dt[ g2 == "stuff", g2 := "abc" ]
dt[order(g1, -n), ]
# g1 g2 n
# <fctr> <char> <num>
# 1: Australia other 10000
# 2: Australia things 5000
# 3: Australia abc 1000
# 4: Canada things 3500
# 5: Canada abc 3000
# 6: Canada other 0
# 7: Mexico other 10000
# 8: Mexico abc 2000
# 9: Mexico things 100
dt[order(g1, g2 == "other", -n), ]
# g1 g2 n
# <fctr> <char> <num>
# 1: Australia things 5000
# 2: Australia abc 1000
# 3: Australia other 10000
# 4: Canada things 3500
# 5: Canada abc 3000
# 6: Canada other 0
# 7: Mexico abc 2000
# 8: Mexico things 100
# 9: Mexico other 10000
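One downside of this approach is that setorder does not work with it directly, because setorder expects column names rather than expressions: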
setorder(dt, g1, g2 == "other", -n)
# Error in setorderv(x, cols, order, na.last) :
# some columns are not in the data.table: ==,other
So we need to reorder with order() and assign the result back to dt.
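As a minimal sketch of the two ways around this, the snippet below first reassigns the reordered table, then shows a by-reference alternative that stores the condition in a temporary column so that setorder has a real column to sort on. The names sorted_copy and is_other are hypothetical and chosen only for illustration; the second approach is one possible workaround, not necessarily the recommended one.

```r
library(data.table)

dt <- data.table(
  g1 = factor(rep(c("Australia", "Mexico", "Canada"), 3),
              levels = c("Australia", "Canada", "Mexico")),
  g2 = rep(c("stuff", "things", "other"), each = 3),
  n  = c(1000, 2000, 3000, 5000, 100, 3500, 10000, 10000, 0)
)

# Option 1: reorder with order() and assign the result
# (this creates a reordered copy of the table).
sorted_copy <- dt[order(g1, g2 == "other", -n)]

# Option 2: sort dt by reference.
# is_other is a hypothetical helper column holding the condition.
dt[, is_other := g2 == "other"]
setorder(dt, g1, is_other, -n)   # FALSE sorts before TRUE
dt[, is_other := NULL]           # drop the helper column again
```

The second option avoids copying the table, at the cost of briefly adding and removing a column.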
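BTW: this works because g2 == "other" evaluates to a logical vector, and when sorting, logical values are treated as 0 (false) and 1 (true), so rows where the condition is false come before rows where it is true.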
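A small base R illustration of that point, using a made-up vector named flags (not part of the original data):

```r
# flags is a hypothetical logical vector used only for illustration.
flags <- c(TRUE, FALSE, TRUE, FALSE)

as.integer(flags)  # 1 0 1 0
order(flags)       # 2 4 1 3  (positions of the FALSE values come first)
sort(flags)        # FALSE FALSE TRUE TRUE
```

Because ties keep their original relative order, the FALSE rows and the TRUE rows each stay in sequence until a later sort key, such as -n, breaks the tie.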