r - Assigning row numbers within groups using dplyr and data.table

I need to create a new column that contains the row number within each group.

Some data to work with:

> set.seed(222)
> dt <- diamonds %>% 
    select(cut, color, price) %>% 
    rename(riding=cut,party=color,votes=price) %>% 
    group_by(riding) %>% sample_n(3) %>% 
    distinct(riding,party,.keep_all = TRUE) %>%
    arrange(riding, desc(votes) ) %>% data.table %T>% print 
       riding party votes
        <ord> <ord> <int>
 1:      Fair     H  3658
 2:      Fair     G  2808
 3:      Good     E  2542
 4:      Good     D   684
 5: Very Good     G  7974
 6: Very Good     F  1637
 7: Very Good     D   447
 8:   Premium     H  5458
 9:   Premium     F  2469
10:   Premium     D  1892
11:     Ideal     F 10786
12:     Ideal     E  4832
13:     Ideal     G   757

So the desired output would look like this:

       riding party votes place
        <ord> <ord> <int> <int>
 1:      Fair     H  3658     1
 2:      Fair     G  2808     2
 3:      Good     E  2542     1
 4:      Good     D   684     2
 5: Very Good     G  7974     1
 6: Very Good     F  1637     2
 7: Very Good     D   447     3
 8:   Premium     H  5458     1
 9:   Premium     F  2469     2
10:   Premium     D  1892     3
11:     Ideal     F 10786     1
12:     Ideal     E  4832     2
13:     Ideal     G   757     3

Please tell me how to do this with dplyr or data.table, or with both.

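I thought the following would work, but it doesn't. Does anyone know why? It gives the global row number instead of the row number within each group. Can I use .I together with by?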

> dt2[ order(votes), place:=.I, by=riding][]     # does not work
       riding party votes place
        <ord> <ord> <int> <int>
 1:      Fair     H  3658     1
 2:      Fair     G  2808     2
 3:      Good     E  2542     3
 4:      Good     D   684     4
 5: Very Good     G  7974     5
 6: Very Good     F  1637     6
 7: Very Good     D   447     7
 8:   Premium     H  5458     8
 9:   Premium     F  2469     9
10:   Premium     D  1892    10
11:     Ideal     F 10786    11
12:     Ideal     E  4832    12
13:     Ideal     G   757    13

With dplyr, I would suggest using group_by() on riding and then creating the new variable as a sequence from 1 to n():

library(dplyr)
library(data.table)
library(ggplot2)  # provides the diamonds dataset
# Code
set.seed(222)
dt <- diamonds %>% 
  select(cut, color, price) %>% 
  rename(riding = cut, party = color, votes = price) %>% 
  group_by(riding) %>% sample_n(3) %>% 
  distinct(riding, party, .keep_all = TRUE) %>%
  arrange(riding, desc(votes)) %>% data.table %>% print 
# Create id: number the rows 1..n() within each riding
dt %>% group_by(riding) %>% mutate(place = 1:n())

Output:

# A tibble: 13 x 4
# Groups:   riding [5]
   riding    party votes place
   <ord>     <ord> <int> <int>
 1 Fair      H      3658     1
 2 Fair      G      2808     2
 3 Good      E      2542     1
 4 Good      D       684     2
 5 Very Good G      7974     1
 6 Very Good F      1637     2
 7 Very Good D       447     3
 8 Premium   H      5458     1
 9 Premium   F      2469     2
10 Premium   D      1892     3
11 Ideal     F     10786     1
12 Ideal     E      4832     2
13 Ideal     G       757     3
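As a minimal sketch of the same idea, row_number() can stand in for 1:n(), and arrange() with .by_group = TRUE makes the ranking independent of how the input happens to be sorted. The data frame df and its values are made up for illustration and are not the asker's sample:

```r
library(dplyr)

# Hypothetical toy data standing in for the sampled diamonds rows
df <- data.frame(
  riding = c("Fair", "Fair", "Good", "Good", "Good"),
  party  = c("H", "G", "E", "D", "F"),
  votes  = c(3658, 2808, 2542, 684, 1200)
)

ranked <- df %>%
  group_by(riding) %>%
  arrange(desc(votes), .by_group = TRUE) %>%  # sort by riding, then votes descending
  mutate(place = row_number()) %>%            # 1, 2, ... restarting in each riding
  ungroup()

print(ranked)
```

Calling ungroup() at the end is a design choice: it avoids later operations silently running per group.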

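It is similar in data.table. .I is not computed separately for each group (it holds row positions in the whole table), so you have to use 1:.N or an equivalent. Note that order(votes) sorts in ascending order, so the smallest vote count gets place 1; use order(-votes) to rank from the highest, as in the desired output.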

dt2[ order(votes), place := 1:.N, by=riding][]
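The difference between .I and .N can be checked on a small table. This is only a sketch: toy and the column name global_row are hypothetical and not part of the question's data.

```r
library(data.table)

# Hypothetical toy table (not the asker's dt2)
toy <- data.table(
  riding = c("Fair", "Fair", "Good", "Good", "Good"),
  votes  = c(3658, 2808, 2542, 684, 1200)
)

# .I holds row positions in the whole table, even when by= is used
toy[, global_row := .I, by = riding]

# seq_len(.N) restarts at 1 in every group; order(-votes) in i makes
# the highest vote count in each riding receive place 1
toy[order(-votes), place := seq_len(.N), by = riding]

print(toy)
```

The rows stay in their original order; only the new columns are added by reference.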

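We can also use setorder, which sorts dt by reference (here in ascending order of votes; use -votes for descending):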

library(data.table)
setorder(dt, votes)[, place := seq_len(.N), by = riding][]
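A possible variant, sketched on made-up data, sorts by riding and then by descending votes so the printed table stays grouped, and uses data.table's rowid() in place of seq_len(.N) with by. The table toy2 is hypothetical:

```r
library(data.table)

# Hypothetical toy table with rows in no particular order
toy2 <- data.table(
  riding = c("Good", "Fair", "Good", "Fair", "Good"),
  votes  = c(684, 3658, 2542, 2808, 1200)
)

# Sort by reference: riding ascending, votes descending within riding
setorder(toy2, riding, -votes)

# rowid() numbers the rows of each distinct riding value in current row order
toy2[, place := rowid(riding)]

print(toy2)
```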
