R - Generalizing a function to a vectorized form for data.table



I have the following data structure, in which I want to fill in rows, one input row at a time, up to a given year:

require('data.table')
test_dt <- data.table(iso1 = c('BTN', 'IND', 'BGD'),
                      iso2 = c('AFG', 'AFG', 'AFG'),
                      year = c(2006, 2003, 2006))

I came up with the function below. It works well for a single row, but not for the general case:

interpolate_rows <- function(dt, stop_year = 2008) {

  year <- as.integer(dt[, .SD, .SDcols = 'year'])

  # If year is less than stop year, fill in observations:
  if (year < stop_year) {
    time_delta <- seq(year, stop_year)

    # Explode bilateral country observation:
    dt <- dt[rep(dt[, .I], length(time_delta))]

    # Replace year column w/ time_delta sequence:
    dt <- dt[, year := time_delta]
  }

  return(dt)
}
## Output
bar <- interpolate_rows(test_dt[1])
bar
   iso1 iso2 year
1:  BTN  AFG 2006
2:  BTN  AFG 2007
3:  BTN  AFG 2008

What I would like to get is:

bar <- interpolate_rows(test_dt)
bar
    iso1 iso2 year
 1:  BTN  AFG 2006
 2:  BTN  AFG 2007
 3:  BTN  AFG 2008
 4:  IND  AFG 2003
 5:  IND  AFG 2004
 6:  IND  AFG 2005
 7:  IND  AFG 2006
 8:  IND  AFG 2007
 9:  IND  AFG 2008
10:  BGD  AFG 2006
11:  BGD  AFG 2007
12:  BGD  AFG 2008

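I know the culprit is most likely the line year <- as.integer(dt[, .SD, .SDcols = 'year']), but I cannot figure out how to replace it with a working vectorized solution. I tried nesting lapply() inside interpolate_rows() to extract the year for each unique group, and I also experimented with Map(), but neither produced a working solution.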

Any help pointing me toward a workable vectorized solution would be greatly appreciated.

Just use by directly:

stop_year <- 2008  # same cutoff as the default in the question's function
test_dt[, .(year = min(year):stop_year), by = .(iso1, iso2)]
#     iso1 iso2 year
#  1:  BTN  AFG 2006
#  2:  BTN  AFG 2007
#  3:  BTN  AFG 2008
#  4:  IND  AFG 2003
#  5:  IND  AFG 2004
#  6:  IND  AFG 2005
#  7:  IND  AFG 2006
#  8:  IND  AFG 2007
#  9:  IND  AFG 2008
# 10:  BGD  AFG 2006
# 11:  BGD  AFG 2007
# 12:  BGD  AFG 2008
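
One caveat with the by call above: min(year):stop_year counts downward when a group's first year is already past stop_year, whereas the function in the question leaves such rows untouched. A minimal sketch of a wrapper that keeps that behaviour follows. The name interpolate_rows_by is hypothetical, and the grouping columns iso1 and iso2 are assumed to identify each pair, as in the sample data.

```r
library(data.table)

# Hypothetical wrapper name; not part of the original answer.
interpolate_rows_by <- function(dt, stop_year = 2008) {
  dt[, {
    first_year <- min(year)
    # Guard: if first_year is already >= stop_year, keep only first_year
    # instead of counting downward.
    list(year = seq(first_year, max(first_year, stop_year)))
  }, by = .(iso1, iso2)]
}

test_dt <- data.table(iso1 = c('BTN', 'IND', 'BGD'),
                      iso2 = c('AFG', 'AFG', 'AFG'),
                      year = c(2006, 2003, 2006))

res <- interpolate_rows_by(test_dt)
res_early <- interpolate_rows_by(test_dt, stop_year = 2004)
```

With the sample data, res should have the same twelve rows as the output above, while res_early keeps BTN and BGD at their single 2006 row and expands only IND.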

One way, using the dplyr and tidyr libraries:

library(dplyr)
library(tidyr)
interpolate_rows <- function(dt, stop_year = 2008) {
  dt %>%
    group_by(iso1, iso2) %>%
    complete(year = year : stop_year) %>%
    ungroup()
}
interpolate_rows(test_dt)
#  iso1  iso2   year
#   <chr> <chr> <dbl>
# 1 BGD   AFG    2006
# 2 BGD   AFG    2007
# 3 BGD   AFG    2008
# 4 BTN   AFG    2006
# 5 BTN   AFG    2007
# 6 BTN   AFG    2008
# 7 IND   AFG    2003
# 8 IND   AFG    2004
# 9 IND   AFG    2005
#10 IND   AFG    2006
#11 IND   AFG    2007
#12 IND   AFG    2008

Another way, applying a single-row function to each row and binding the results:

library(data.table)
interpolate_rows <- function(dt, stop_year = 2008) {
  vals <- seq(dt$year, stop_year)
  dt[rep(1, length(vals))][, year := vals]
}
rbindlist(by(test_dt, seq(nrow(test_dt)), interpolate_rows))
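
The row-by-row idea can also be expressed without splitting the table at all, which is closer to the vectorized form the question asks for. The following is only a sketch: the function name interpolate_rows_vec is hypothetical, and it assumes year holds whole numbers. It computes how many copies each row needs, repeats the row indices that many times, and then offsets year with base R's sequence().

```r
library(data.table)

# Hypothetical name; a vectorized variant of the question's function.
interpolate_rows_vec <- function(dt, stop_year = 2008) {
  # Copies needed per row: 1 if year >= stop_year, otherwise
  # one per year from `year` to `stop_year` inclusive.
  n <- pmax(stop_year - dt$year, 0) + 1
  idx <- rep(seq_len(nrow(dt)), times = n)
  out <- dt[idx]  # subsetting returns a new table; dt is not modified
  # sequence(n) restarts at 1 for every original row, giving offsets 0, 1, 2, ...
  out[, year := year + sequence(n) - 1]
  out[]
}

test_dt <- data.table(iso1 = c('BTN', 'IND', 'BGD'),
                      iso2 = c('AFG', 'AFG', 'AFG'),
                      year = c(2006, 2003, 2006))

res_vec <- interpolate_rows_vec(test_dt)
```

Unlike the by-group approach, this keeps the original row order and works per input row, so duplicate iso1/iso2 pairs are each expanded separately.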
