R: Assigning row numbers within groups using dplyr and data.table



I need to create a new column that contains the row number within each group.

Some data to work with:

> set.seed(222)
> dt <- diamonds %>% 
select(cut, color, price) %>% 
rename(riding=cut,party=color,votes=price) %>% 
group_by(riding) %>% sample_n(3) %>% 
distinct(riding,party,.keep_all = TRUE) %>%
arrange(riding, desc(votes) ) %>% data.table %T>% print 
riding party votes
<ord> <ord> <int>
1:      Fair     H  3658
2:      Fair     G  2808
3:      Good     E  2542
4:      Good     D   684
5: Very Good     G  7974
6: Very Good     F  1637
7: Very Good     D   447
8:   Premium     H  5458
9:   Premium     F  2469
10:   Premium     D  1892
11:     Ideal     F 10786
12:     Ideal     E  4832
13:     Ideal     G   757

So the desired output would look like this:

riding party votes place
<ord> <ord> <int>  <int>
1:      Fair     H  3658   1
2:      Fair     G  2808   2
3:      Good     E  2542   1
4:      Good     D   684   2
5: Very Good     G  7974   1
6: Very Good     F  1637   2
7: Very Good     D   447   3
8:   Premium     H  5458   1
9:   Premium     F  2469   2
10:   Premium     D  1892   3
11:     Ideal     F 10786   1
12:     Ideal     E  4832   2
13:     Ideal     G   757   3

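Please show me how to do this with dplyr, data.table, or both.

I thought the following would work, but it does not. Does anyone know why? It gives the global row number instead. Can I use .I together with by?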

> dt2[ order(votes), place:=.I, by=riding][]     # does not work
riding party votes place
<ord> <ord> <int> <int>
1:      Fair     H  3658     1
2:      Fair     G  2808     2
3:      Good     E  2542     3
4:      Good     D   684     4
5: Very Good     G  7974     5
6: Very Good     F  1637     6
7: Very Good     D   447     7
8:   Premium     H  5458     8
9:   Premium     F  2469     9
10:   Premium     D  1892    10
11:     Ideal     F 10786    11
12:     Ideal     E  4832    12
13:     Ideal     G   757    13

In dplyr, I would suggest using group_by() on riding and then creating the new variable as a sequence from 1 to n():

library(dplyr)
library(data.table)
library(ggplot2)  # provides the diamonds dataset
# Code
set.seed(222)
dt <- diamonds %>% 
select(cut, color, price) %>% 
rename(riding=cut,party=color,votes=price) %>% 
group_by(riding) %>% sample_n(3) %>% 
distinct(riding,party,.keep_all = TRUE) %>%
arrange(riding, desc(votes) ) %>% data.table %>% print 
# Create the within-group id
dt %>% group_by(riding) %>% mutate(place=1:n())

Output:

# A tibble: 13 x 4
# Groups:   riding [5]
riding    party votes place
<ord>     <ord> <int> <int>
1 Fair      H      3658     1
2 Fair      G      2808     2
3 Good      E      2542     1
4 Good      D       684     2
5 Very Good G      7974     1
6 Very Good F      1637     2
7 Very Good D       447     3
8 Premium   H      5458     1
9 Premium   F      2469     2
10 Premium   D      1892     3
11 Ideal     F     10786     1
12 Ideal     E      4832     2
13 Ideal     G       757     3
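The same grouped numbering can also be written with dplyr's row_number() helper. The sketch below is only an illustration: the data frame df is typed in by hand from a few of the rows printed above (it is not re-sampled from diamonds), and the names df and ranked are made up for this example.

```r
library(dplyr)

# Hand-typed subset of the rows shown above; an assumption for illustration only
df <- data.frame(
  riding = c("Fair", "Fair", "Good", "Good", "Very Good", "Very Good", "Very Good"),
  party  = c("G", "H", "D", "E", "F", "D", "G"),
  votes  = c(2808L, 3658L, 684L, 2542L, 1637L, 447L, 7974L)
)

ranked <- df %>%
  group_by(riding) %>%
  arrange(desc(votes), .by_group = TRUE) %>%  # sort by votes inside each group
  mutate(place = row_number()) %>%            # numbering restarts at 1 per group
  ungroup()

print(ranked)
```

Sorting inside the pipeline means the result does not depend on the input already being ordered by votes.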

The same applies in data.table: .I is not computed separately for each group, so you have to use 1:.N or an equivalent:

dt2[ order(votes), place := 1:.N, by=riding][]
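To make the difference between .I and .N concrete, here is a minimal sketch on a small hand-typed table. The contents of dt2 and the column name global_idx are assumptions for this example. It also uses order(-votes) rather than order(votes), on the assumption that place 1 should go to the highest vote count, as in the desired output.

```r
library(data.table)

# Small hand-typed table, deliberately not sorted by votes
dt2 <- data.table(
  riding = c("Fair", "Fair", "Good", "Good"),
  party  = c("G", "H", "D", "E"),
  votes  = c(2808L, 3658L, 684L, 2542L)
)

# .I holds each row's position in the whole table, so it keeps counting across groups
dt2[, global_idx := .I, by = riding]

# seq_len(.N) restarts at 1 in every group; order(-votes) ranks the highest votes first
dt2[order(-votes), place := seq_len(.N), by = riding]

print(dt2)
```

The rows stay in their original positions; only the new columns are added by reference.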

We can also use setorder:

library(data.table)
setorder(dt, votes)[, place := seq_len(.N), by = riding][]
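Note that setorder() physically reorders the table by reference. A possible variant, sketched below on hand-typed data, sorts by riding and by descending votes so that place 1 is the top vote-getter, and uses data.table's rowid() to number the rows within each riding. The table contents are an assumption for this example.

```r
library(data.table)

# Hand-typed example table; values copied from a few rows shown earlier
dt <- data.table(
  riding = c("Fair", "Fair", "Good", "Good"),
  party  = c("G", "H", "D", "E"),
  votes  = c(2808L, 3658L, 684L, 2542L)
)

# Reorder in place: ridings ascending, votes descending within each riding
setorder(dt, riding, -votes)

# rowid() counts occurrences of each riding value in the current row order
dt[, place := rowid(riding)]

print(dt)
```

If the original row order matters, the order(...) form in i is the safer choice because it leaves the table's row order untouched.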
