Suppose we have a list lis:

chicago = data.frame('city' = rep('chicago'), 'year' = c(2018,2019,2020), 'population' = c(100, 105, 110))
paris = data.frame('city' = rep('paris'), 'year' = c(2018,2019,2020), 'population' = c(200, 205, 210))
berlin = data.frame('city' = rep('berlin'), 'year' = c(2018,2019,2020), 'population' = c(300, 305, 310))
bangalore = data.frame('city' = rep('bangalore'), 'year' = c(2018,2019,2020), 'population' = c(400, 405, 410))
lis = list(chicago = chicago, paris = paris, berlin = berlin, bangalore = bangalore)

Now I have a new data frame df containing the latest data for each city:

df = data.frame('city' = c('chicago', 'paris', 'berlin', 'bangalore'), 'year' = rep(2021), 'population' = c(115, 215, 315, 415))

I want to append each row of df to the matching element of lis, based on city.

This is how I am doing it:

# convert the list to a single data frame
lis = dplyr::bind_rows(lis)
# append the new rows
lis = rbind(lis, df)
# convert back to a list, one element per city
lis = split(lis, lis$city)

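This is inefficient for large datasets. Is there a more efficient alternative for large datasets?

Thanks.

Edit

My original list contains 2239 data frames, and each data frame has dimensions 310x15.

Estimated execution times:

Best performance: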

library(data.table)
rbindlist(c(lis, list(df)))[, .(split(.SD, city))]$V1
Unit: milliseconds
expr      min     lq    mean   median       uq      max neval
av() 823.2123 850.56 933.109 865.7741 921.9321 1268.007   100

Next:

# convert the list to a single data frame
lis = dplyr::bind_rows(lis)
# append the new rows
lis = rbind(lis, df)
# convert back to a list, one element per city
lis = split(lis, lis$city)
Unit: seconds
expr      min       lq     mean   median       uq      max neval
ac() 1.893728 2.032478 2.323619 2.285914 2.325451 4.304177   100

Map(rbind, lis, split(df, df$city)[names(lis)])
Unit: seconds
expr     min       lq     mean   median       uq      max neval
az() 2.29919 2.444761 2.749236 2.688349 2.887123 4.205997   100

imap(lis, ~ .x %>%
    bind_rows(df %>%
        filter(city == .y)))
Unit: seconds
expr    min       lq     mean   median       uq      max neval
ax() 4.9921 5.072752 5.178707 5.121748 5.183845 6.069612   100
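
The wrappers av(), ac(), az() and ax() are not shown in the post; presumably each one wraps one of the approaches. A rough way to reproduce such a comparison using only base R is sketched below. The wrapper bodies, the small stand-in data and the repetition count are all assumptions, and system.time stands in for whatever benchmarking tool produced the output above.

```r
# Small stand-in data with the same shape as the example in the question.
cities <- c("chicago", "paris", "berlin", "bangalore")
lis <- lapply(setNames(cities, cities), function(ct) {
  data.frame(city = ct, year = c(2018, 2019, 2020),
             population = c(100, 105, 110), stringsAsFactors = FALSE)
})
df <- data.frame(city = cities, year = 2021,
                 population = c(115, 215, 315, 415), stringsAsFactors = FALSE)

# Hypothetical wrappers (names borrowed from the timing output above).
ac <- function() {
  all_rows <- rbind(do.call(rbind, lis), df)  # base R stand-in for bind_rows
  split(all_rows, all_rows$city)
}
az <- function() Map(rbind, lis, split(df, df$city)[names(lis)])

# Run each wrapper repeatedly and record the elapsed wall-clock time.
n_rep <- 20
t_ac <- system.time(for (i in seq_len(n_rep)) ac())
t_az <- system.time(for (i in seq_len(n_rep)) az())
print(c(ac = t_ac[["elapsed"]], az = t_az[["elapsed"]]))
```

On data this small the timings are mostly noise; the relative ranking is only meaningful on a list closer to the real size (2239 data frames of 310x15).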

We can use imap to loop over the list and, for each element, filter df by the element's name and append the matching rows

library(dplyr)
library(purrr)
lis2 <- imap(lis, ~ .x %>%
    bind_rows(df %>%
        filter(city == .y)))

which gives the output

> lis2
$chicago
     city year population
1 chicago 2018        100
2 chicago 2019        105
3 chicago 2020        110
4 chicago 2021        115

$paris
   city year population
1 paris 2018        200
2 paris 2019        205
3 paris 2020        210
4 paris 2021        215

$berlin
    city year population
1 berlin 2018        300
2 berlin 2019        305
3 berlin 2020        310
4 berlin 2021        315

$bangalore
       city year population
1 bangalore 2018        400
2 bangalore 2019        405
3 bangalore 2020        410
4 bangalore 2021        415

Or use base R with Map and rbind

Map(function(x, nm) rbind(x, df[df$city == nm,]), lis, names(lis))

Or use rbindlist from data.table

library(data.table)
rbindlist(c(lis, list(df)))[, .(split(.SD, city))]$V1
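
As a possible variation on the same idea, data.table also provides a split method with a by argument, which avoids the .SD construct. This is only a sketch on a two-city version of the toy data and was not among the approaches timed in the post. The elements come back as data.table objects rather than plain data frames.

```r
library(data.table)

# Two-city version of the example data, to keep the sketch short.
chicago <- data.frame(city = "chicago", year = c(2018, 2019, 2020),
                      population = c(100, 105, 110))
paris <- data.frame(city = "paris", year = c(2018, 2019, 2020),
                    population = c(200, 205, 210))
lis <- list(chicago = chicago, paris = paris)
df <- data.frame(city = c("chicago", "paris"), year = 2021,
                 population = c(115, 215))

# Stack everything once, then split back into one table per city.
combined <- rbindlist(c(lis, list(df)))
lis2 <- split(combined, by = "city")

print(lis2$paris)
```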

Or, somewhat more efficiently, split df by city once and rbind element-wise

Map(rbind, lis, split(df, df$city)[names(lis)])
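
One detail worth checking before swapping approaches: the bind-then-split route returns the elements in the sorted order of the city values, whereas the Map version keeps the original order of lis. The base R sketch below illustrates this on the toy data. It uses do.call(rbind, ...) as a stand-in for dplyr::bind_rows and sets stringsAsFactors = FALSE explicitly, which is an assumption made so that city is a character column on any R version.

```r
cities <- c("chicago", "paris", "berlin", "bangalore")
base_pop <- c(chicago = 100, paris = 200, berlin = 300, bangalore = 400)

# Rebuild the example list: three yearly rows per city.
lis <- lapply(setNames(cities, cities), function(ct) {
  data.frame(city = ct, year = c(2018, 2019, 2020),
             population = base_pop[[ct]] + c(0, 5, 10),
             stringsAsFactors = FALSE)
})
df <- data.frame(city = cities, year = 2021,
                 population = c(115, 215, 315, 415),
                 stringsAsFactors = FALSE)

# Approach A: element-wise rbind, keeps the order of lis.
res_map <- Map(rbind, lis, split(df, df$city)[names(lis)])

# Approach B: stack everything, then split by city (sorted order).
all_rows <- rbind(do.call(rbind, lis), df)
res_split <- split(all_rows, all_rows$city)

print(names(res_map))
print(names(res_split))

# Same rows per city once the elements are matched by name.
same_rows <- vapply(cities, function(ct) {
  identical(res_map[[ct]]$population, res_split[[ct]]$population)
}, logical(1))
print(all(same_rows))
```

If downstream code indexes the list by position rather than by name, the results can be aligned again with res_split[names(lis)].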

R - Add rows to a list of data frames from another data frame
