I need to create a new column that holds the row number within each group.
Some data to work with:
> set.seed(222)
> dt <- diamonds %>%
select(cut, color, price) %>%
rename(riding=cut,party=color,votes=price) %>%
group_by(riding) %>% sample_n(3) %>%
distinct(riding,party,.keep_all = TRUE) %>%
arrange(riding, desc(votes) ) %>% data.table %T>% print
riding party votes
<ord> <ord> <int>
1: Fair H 3658
2: Fair G 2808
3: Good E 2542
4: Good D 684
5: Very Good G 7974
6: Very Good F 1637
7: Very Good D 447
8: Premium H 5458
9: Premium F 2469
10: Premium D 1892
11: Ideal F 10786
12: Ideal E 4832
13: Ideal G 757
So the desired output would look like this:
riding party votes place
<ord> <ord> <int> <int>
1: Fair H 3658 1
2: Fair G 2808 2
3: Good E 2542 1
4: Good D 684 2
5: Very Good G 7974 1
6: Very Good F 1637 2
7: Very Good D 447 3
8: Premium H 5458 1
9: Premium F 2469 2
10: Premium D 1892 3
11: Ideal F 10786 1
12: Ideal E 4832 2
13: Ideal G 757 3
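Please tell me how to do this with dplyr or data.table, or both.
I thought the following would work, but it does not. Does anyone know why? It gives the global row number. Can I use .I together with by?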
> dt2[ order(votes), place:=.I, by=riding][] # does not work
riding party votes place
<ord> <ord> <int> <int>
1: Fair H 3658 1
2: Fair G 2808 2
3: Good E 2542 3
4: Good D 684 4
5: Very Good G 7974 5
6: Very Good F 1637 6
7: Very Good D 447 7
8: Premium H 5458 8
9: Premium F 2469 9
10: Premium D 1892 10
11: Ideal F 10786 11
12: Ideal E 4832 12
13: Ideal G 757 13
In dplyr, I would suggest calling group_by() on riding and then creating a new variable that runs from 1 to n() within each group:
library(dplyr)
library(data.table)
library(ggplot2)  # provides the diamonds dataset
#Code
set.seed(222)
dt <- diamonds %>%
select(cut, color, price) %>%
rename(riding=cut,party=color,votes=price) %>%
group_by(riding) %>% sample_n(3) %>%
distinct(riding,party,.keep_all = TRUE) %>%
arrange(riding, desc(votes) ) %>% data.table %>% print
#Create id
dt %>% group_by(riding) %>% mutate(place=1:n())
Output:
# A tibble: 13 x 4
# Groups: riding [5]
riding party votes place
<ord> <ord> <int> <int>
1 Fair H 3658 1
2 Fair G 2808 2
3 Good E 2542 1
4 Good D 684 2
5 Very Good G 7974 1
6 Very Good F 1637 2
7 Very Good D 447 3
8 Premium H 5458 1
9 Premium F 2469 2
10 Premium D 1892 3
11 Ideal F 10786 1
12 Ideal E 4832 2
13 Ideal G 757 3
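As a small variation on the approach above, dplyr's row_number() can stand in for 1:n(). The sketch below uses a hypothetical toy table (the name toy and its values are made up for illustration, not taken from the question) and sorts explicitly first, so the result does not depend on the input already being ordered.

```r
library(dplyr)

# Hypothetical toy data standing in for the riding/votes columns above
toy <- tibble(
  riding = c("A", "A", "B", "B", "B"),
  votes  = c(10L, 30L, 5L, 20L, 15L)
)

ranked <- toy %>%
  arrange(riding, desc(votes)) %>%   # highest vote count first within each riding
  group_by(riding) %>%
  mutate(place = row_number()) %>%   # counter restarts at 1 in every group
  ungroup()                          # drop the grouping so later steps are not grouped

print(ranked)
```

One reason to prefer row_number() (or seq_len(n())) over 1:n() is that 1:n() would yield c(1, 0) for a group with zero rows, although that case cannot arise after group_by() on existing data.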
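It is similar in data.table. .I holds row positions in the whole table, even when used with by, so it does not restart at 1 in each group. You have to use 1:.N or an equivalent such as seq_len(.N):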
dt2[order(-votes), place := 1:.N, by = riding][]  # -votes: place 1 goes to the highest vote count, as in the desired output
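To make the difference between .I and .N concrete, here is a minimal sketch on a hypothetical toy table (the names toy, global, place and place2 are illustrative only). It also shows rowid(), which data.table provides as another way to get a within-group counter without by.

```r
library(data.table)

# Hypothetical toy data, already sorted by riding and descending votes
toy <- data.table(
  riding = c("A", "A", "B", "B", "B"),
  votes  = c(30L, 10L, 20L, 15L, 5L)
)

toy[, global := .I, by = riding]           # .I: positions in the full table, so no restart
toy[, place  := seq_len(.N), by = riding]  # .N: group size, so the sequence restarts per group
toy[, place2 := rowid(riding)]             # same within-group counter, no by needed

print(toy)
```

Here global runs from 1 to 5 across the whole table, while place and place2 restart at 1 for each riding.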
We can also use setorder:
library(data.table)
setorder(dt, riding, -votes)[, place := seq_len(.N), by = riding][]  # sort by riding, then descending votes, so place 1 is the highest count