对R中的数据帧列表执行迭代的最有效方法



我修改了虹膜数据,重新创建了一个简单版本的问题:

library(purrr)
library(dplyr)
#Modify the dataset
data(iris)
iris$Sepal.Length.2019<-iris$Sepal.Length
iris$category<-ifelse(iris$Petal.Width<1.3,"small","big")
#Split the dataframe into multiple  dataframes based on Species
list<-split(iris,iris$Species)
start_year<-c(2020)
End_year<-c(2022)
#Function (Failed)
manipulate<-map(x){

for(i in start_year:End_year){
#Step 1:Create a new column with year suffix
x[,paste("Sepal.Length",i,sep=".")]<-x[,paste("Sepal.Length",i-1,sep="."]*2

#step 2 (The problem step):sort each dataframe based on value for a given year and category variable to create a cumsum of the sorted value
x<- x %>% group_by(category) %>% arrange(x[,paste("Sepal.Length",i,sep=".")])
%>% mutate(x[,paste("Sum.Sepal.Length",i,sep=".")]=cumsum( x[,paste("Sepal.Length",i,sep=".")]))

#Step 3: perform more analysis with Sum.Sepal.Length
x[,paste("Sum.Length.Compare",i,sep=".")]<-ifelse(x[,paste("Sum.Sepal.Length",i,sep=".")]>2,"Good","Bad")

return(x)
}
}
#Map this over list
new_list<-map(list,manipulate)

由于步骤2,我出现了一个错误,可能是因为我混合了很多不同的元素。这里还有其他应该使用的包装或配方吗?循环的目标是以迭代的方式在现有列的基础上创建新列。

我对使用purrr和应用家庭真的很陌生。任何帮助都将不胜感激!非常感谢。

这个问题已经三个星期没有人回答了。

我已经尝试使用dplyr来修复manipulate()函数的问题。不幸的是,我对dplyrrlang编程的了解似乎太有限了。

我必须承认,我对data.table语法更流利。因此,我试图使用data.table:找到一个等效的工作解决方案

manipulateDT <- function(x, beg_yr, end_yr) {
setDT(x)
for (i in seq(beg_yr, end_yr, 1)) {
# define shortcuts
pre_yr <- paste0("Sepal.Length.", i - 1L)
cur_yr <- paste0("Sepal.Length.", i)
#Step 1: Create a new column with year suffix
x[, (cur_yr) := get(pre_yr) * 2]
#step 2 (The problem step): sort each dataframe based on value for a given year and category variable to create a cumsum of the sorted value
setorderv(x, cur_yr)
x[, paste0("Sum.Sepal.Length.", i) := cumsum(get(cur_yr)), by = category]
#Step 3: perform more analysis with Sum.Sepal.Length
x[, paste0("Sum.Length.Compare.", i) := fifelse(get(cur_yr) > 2, "Good", "Bad")]
}
return(x)
}

该功能称为

library(data.table)
new_listDT <- lapply(list, manipulateDT, start_year, End_year)

结果太大,无法完整显示。因此,给人一种印象:

str(new_listDT)
List of 3
$ setosa    :Classes ‘data.table’ and 'data.frame':  50 obs. of  16 variables:
..$ Sepal.Length           : num [1:50] 4.3 4.4 4.4 4.4 4.5 4.6 4.6 4.6 4.6 4.7 ...
..$ Sepal.Width            : num [1:50] 3 2.9 3 3.2 2.3 3.1 3.4 3.6 3.2 3.2 ...
..$ Petal.Length           : num [1:50] 1.1 1.4 1.3 1.3 1.3 1.5 1.4 1 1.4 1.3 ...
..$ Petal.Width            : num [1:50] 0.1 0.2 0.2 0.2 0.3 0.2 0.3 0.2 0.2 0.2 ...
..$ Species                : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
..$ Sepal.Length.2019      : num [1:50] 4.3 4.4 4.4 4.4 4.5 4.6 4.6 4.6 4.6 4.7 ...
..$ category               : chr [1:50] "small" "small" "small" "small" ...
..$ Sepal.Length.2020      : num [1:50] 8.6 8.8 8.8 8.8 9 9.2 9.2 9.2 9.2 9.4 ...
..$ Sum.Sepal.Length.2020  : num [1:50] 8.6 17.4 26.2 35 44 53.2 62.4 71.6 80.8 90.2 ...
..$ Sum.Length.Compare.2020: chr [1:50] "Good" "Good" "Good" "Good" ...
..$ Sepal.Length.2021      : num [1:50] 17.2 17.6 17.6 17.6 18 18.4 18.4 18.4 18.4 18.8 ...
..$ Sum.Sepal.Length.2021  : num [1:50] 17.2 34.8 52.4 70 88 ...
..$ Sum.Length.Compare.2021: chr [1:50] "Good" "Good" "Good" "Good" ...
..$ Sepal.Length.2022      : num [1:50] 34.4 35.2 35.2 35.2 36 36.8 36.8 36.8 36.8 37.6 ...
..$ Sum.Sepal.Length.2022  : num [1:50] 34.4 69.6 104.8 140 176 ...
..$ Sum.Length.Compare.2022: chr [1:50] "Good" "Good" "Good" "Good" ...
..- attr(*, ".internal.selfref")=<externalptr> 
$ versicolor:Classes ‘data.table’ and 'data.frame':  50 obs. of  16 variables:
..$ Sepal.Length           : num [1:50] 4.9 5 5 5.1 5.2 5.4 5.5 5.5 5.5 5.5 ...
..$ Sepal.Width            : num [1:50] 2.4 2 2.3 2.5 2.7 3 2.3 2.4 2.4 2.5 ...
..$ Petal.Length           : num [1:50] 3.3 3.5 3.3 3 3.9 4.5 4 3.8 3.7 4 ...
..$ Petal.Width            : num [1:50] 1 1 1 1.1 1.4 1.5 1.3 1.1 1 1.3 ...
..$ Species                : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2 ...
..$ Sepal.Length.2019      : num [1:50] 4.9 5 5 5.1 5.2 5.4 5.5 5.5 5.5 5.5 ...
..$ category               : chr [1:50] "small" "small" "small" "small" ...
..$ Sepal.Length.2020      : num [1:50] 9.8 10 10 10.2 10.4 10.8 11 11 11 11 ...
..$ Sum.Sepal.Length.2020  : num [1:50] 9.8 19.8 29.8 40 10.4 21.2 32.2 51 62 43.2 ...
..$ Sum.Length.Compare.2020: chr [1:50] "Good" "Good" "Good" "Good" ...
..$ Sepal.Length.2021      : num [1:50] 19.6 20 20 20.4 20.8 21.6 22 22 22 22 ...
..$ Sum.Sepal.Length.2021  : num [1:50] 19.6 39.6 59.6 80 20.8 42.4 64.4 102 124 86.4 ...
..$ Sum.Length.Compare.2021: chr [1:50] "Good" "Good" "Good" "Good" ...
..$ Sepal.Length.2022      : num [1:50] 39.2 40 40 40.8 41.6 43.2 44 44 44 44 ...
..$ Sum.Sepal.Length.2022  : num [1:50] 39.2 79.2 119.2 160 41.6 ...
..$ Sum.Length.Compare.2022: chr [1:50] "Good" "Good" "Good" "Good" ...
..- attr(*, ".internal.selfref")=<externalptr> 
$ virginica :Classes ‘data.table’ and 'data.frame':  50 obs. of  16 variables:
..$ Sepal.Length           : num [1:50] 4.9 5.6 5.7 5.8 5.8 5.8 5.9 6 6 6.1 ...
..$ Sepal.Width            : num [1:50] 2.5 2.8 2.5 2.7 2.8 2.7 3 2.2 3 3 ...
..$ Petal.Length           : num [1:50] 4.5 4.9 5 5.1 5.1 5.1 5.1 5 4.8 4.9 ...
..$ Petal.Width            : num [1:50] 1.7 2 2 1.9 2.4 1.9 1.8 1.5 1.8 1.8 ...
..$ Species                : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3 ...
..$ Sepal.Length.2019      : num [1:50] 4.9 5.6 5.7 5.8 5.8 5.8 5.9 6 6 6.1 ...
..$ category               : chr [1:50] "big" "big" "big" "big" ...
..$ Sepal.Length.2020      : num [1:50] 9.8 11.2 11.4 11.6 11.6 11.6 11.8 12 12 12.2 ...
..$ Sum.Sepal.Length.2020  : num [1:50] 9.8 21 32.4 44 55.6 ...
..$ Sum.Length.Compare.2020: chr [1:50] "Good" "Good" "Good" "Good" ...
..$ Sepal.Length.2021      : num [1:50] 19.6 22.4 22.8 23.2 23.2 23.2 23.6 24 24 24.4 ...
..$ Sum.Sepal.Length.2021  : num [1:50] 19.6 42 64.8 88 111.2 ...
..$ Sum.Length.Compare.2021: chr [1:50] "Good" "Good" "Good" "Good" ...
..$ Sepal.Length.2022      : num [1:50] 39.2 44.8 45.6 46.4 46.4 46.4 47.2 48 48 48.8 ...
..$ Sum.Sepal.Length.2022  : num [1:50] 39.2 84 129.6 176 222.4 ...
..$ Sum.Length.Compare.2022: chr [1:50] "Good" "Good" "Good" "Good" ...
..- attr(*, ".internal.selfref")=<externalptr>

其他说明

函数manipulateDT()有两个额外的参数beg_yrend_yr,因为将所有相关参数传递给函数而不依赖于调用环境中的变量是一种良好的编程实践
  • data.table语法中,行
    x[, (cur_yr) := get(pre_yr) * 2]
    
    或者可以写成
    set(x, , cur_yr, x[, ..pre_yr] * 2)
    
    x[, (cur_yr) := lapply(.SD, `*`, 2), .SDcols = pre_yr]
    
  • 替代方法

    稍后完成

    最新更新