Enumerate instances in R and add a new column



I have a table:

library(data.table)

dt <- data.table(
  instance = c("A","A","A","B","B","B","C","C","C","C","C","A","A",
               "B","B","B","C","C","C","C","C"),
  date = c("2019-02-25","2019-02-25","2019-02-25","2019-02-25","2019-02-25",
           "2019-02-25","2019-02-25","2019-02-25","2019-02-25","2019-02-25",
           "2019-02-25","2019-03-01","2019-03-01","2019-03-01","2019-03-01",
           "2019-03-01","2019-03-01","2019-03-01","2019-03-01","2019-03-01","2019-03-01"),
  y = c("0,1","0,2","0,2","0,1","0,1","0,15","0,1","0,2","0,3","0,1","0,1",
        "0,1","0,1","0,1","0,25","0,3","0,1","0,1","0,15","0,1","0,2")
)
dt

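I need to add a column "N" that numbers the rows of each instance, per date, from 1 up to the largest number of rows any instance has (here the maximum is 5, the number of rows for instance C). Every instance should run from 1 to that maximum. If an instance has fewer rows, rows should be added for the missing values of N, and y should be NA in those rows.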

So I need code that produces the following table:

| instance | date | y | N |
|-----|-----|-----|-----|
| A | 2019-02-25 | 0,1 | 1 |
| A | 2019-02-25 | 0,2 | 2 |
| A | 2019-02-25 | 0,2 | 3 |
| A | 2019-02-25 | NA | 4 |
| A | 2019-02-25 | NA | 5 |
| B | 2019-02-25 | 0,1 | 1 |
| B | 2019-02-25 | 0,1 | 2 |
| B | 2019-02-25 | 0,15 | 3 |
| B | 2019-02-25 | NA | 4 |
| B | 2019-02-25 | NA | 5 |
| C | 2019-02-25 | 0,1 | 1 |
| C | 2019-02-25 | 0,2 | 2 |
| C | 2019-02-25 | 0,3 | 3 |
| C | 2019-02-25 | 0,1 | 4 |
| C | 2019-02-25 | 0,1 | 5 |
...

This is a great opportunity for tidyr::complete.

library(dplyr)
library(tidyr)
dt |>
  group_by(instance, date) |>
  mutate(N = row_number()) |>   # number the rows within each instance/date
  ungroup() |>
  complete(instance, date, N) |>   # add the missing combinations, y becomes NA
  arrange(date, instance, N)
# # A tibble: 30 x 4
#    instance date           N y    
#    <chr>    <chr>      <int> <chr>
#  1 A        2019-02-25     1 0,1
#  2 A        2019-02-25     2 0,2
#  3 A        2019-02-25     3 0,2
#  4 A        2019-02-25     4 NA
#  5 A        2019-02-25     5 NA
#  6 B        2019-02-25     1 0,1
#  7 B        2019-02-25     2 0,1
#  8 B        2019-02-25     3 0,15
#  9 B        2019-02-25     4 NA
# 10 B        2019-02-25     5 NA
# # ... with 20 more rows
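
Since the question starts from a data.table, the same idea can probably be expressed without leaving data.table. The following is only a sketch under assumptions: the table `toy` and its values are made up for illustration, and it swaps `complete()` for a join against `CJ()` (data.table's cross join), with `rowid()` doing the per-group numbering.

```r
library(data.table)

# Hypothetical toy table in the same shape as the question's data
toy <- data.table(
  instance = c("A", "A", "B", "B", "B", "A", "B"),
  date     = c(rep("2019-02-25", 5), rep("2019-03-01", 2)),
  y        = c("0,1", "0,2", "0,1", "0,1", "0,15", "0,1", "0,3")
)

# Number the rows within each instance/date group
toy[, N := rowid(instance, date)]

# Join against every instance/date/N combination;
# combinations that do not exist in toy get y = NA
full <- toy[CJ(instance, date, N, unique = TRUE), on = .(instance, date, N)]

# Same ordering as the arrange() call in the tidyr version
setorder(full, date, instance, N)
print(full)
```

Here the largest group has 3 rows, so every instance/date pair is padded to 3 rows. Whether this reads better than the tidyr pipeline is a matter of taste; it mainly avoids converting the data.table to a tibble.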

You can use the rle function provided in base R like this:

instances <- rle(dt$instance)
# lapply + seq_len always returns a list, so unlist() gives a plain vector
dt$N <- unlist(lapply(instances$lengths, seq_len))

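RLE stands for run-length encoding. The function returns the data values together with a count for each "run", that is, each stretch of consecutive identical values in the vector. Once we have that, we access the counts through the lengths element of instances (instances$lengths).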
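
To make the run-length idea concrete, here is a minimal sketch on a made-up vector `x` (not the question's data). It shows the two components `rle()` returns and uses base R's `sequence()` as a compact way to count from 1 within each run.

```r
# Hypothetical vector with runs of repeated labels
x <- c("A", "A", "A", "B", "B", "C", "A", "A")

runs <- rle(x)
print(runs$values)   # label of each run:  "A" "B" "C" "A"
print(runs$lengths)  # length of each run: 3 2 1 2

# Count 1..length within each run
n <- sequence(runs$lengths)
print(n)             # 1 2 3 1 2 1 1 2
```

Note that this counts within consecutive runs only. It matches a per-group row number when the rows of each group are contiguous and two neighbouring groups never share the same label, which holds for the table in the question but is an assumption in general.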
