R - Assigning datasets to a list of data frames



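On each pass of a loop over 4 years (2011, 2013, 2015, 2017) I generate a set of 6 datasets, so I should end up with 24 datasets in total. I am trying to use assign with paste to join each dataset's name to the corresponding year. However, at the end of the loop I only have 6 datasets instead of 6*4 = 24.

Do I need the special [[]] syntax to create a list of data frames? Why can't I assign the datasets to variables in the loop structure below?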

library(educationdata)
library(glue)
## Initialize lists
dates <- list("2011", "2013", "2015", "2017")
frames <- list("df_ccdirectory", "df_ccdenrollment", "df_crdcteacher",
               "df_crdcmathscience", "df_crdcsat", "df_crdcfinance")
dflist <- list()

for (j in dates) {

  df_ccdirectory <- get_education_data(level = "schools",
                                       source = "ccd",
                                       topic = "directory",
                                       filters = list(year = j, fips = 10),
                                       add_labels = TRUE)
  dflist[[1]] <- df_ccdirectory
  df_ccdenrollment <- get_education_data(level = "schools",
                                         source = "ccd",
                                         topic = "enrollment",
                                         filters = list(year = j, fips = 10),
                                         add_labels = TRUE)
  dflist[[2]] <- df_ccdenrollment
  df_crdcteacher <- get_education_data(level = "schools",
                                       source = "crdc",
                                       topic = "teachers-staff",
                                       filters = list(year = j, fips = 10),
                                       add_labels = TRUE)
  dflist[[3]] <- df_crdcteacher
  df_crdcmathscience <- get_education_data(level = "schools",
                                           source = "crdc",
                                           topic = "math-and-science",
                                           subtopic = c('race', 'sex'),
                                           filters = list(year = j, fips = 10),
                                           add_labels = TRUE)
  dflist[[4]] <- df_crdcmathscience
  df_crdcsat <- get_education_data(level = "schools",
                                   source = "crdc",
                                   topic = "sat-act-participation",
                                   subtopic = c('race', 'sex'),
                                   filters = list(year = j, fips = 10),
                                   add_labels = TRUE)
  dflist[[5]] <- df_crdcsat
  df_crdcfinance <- get_education_data(level = "schools",
                                       source = "crdc",
                                       topic = "school-finance",
                                       filters = list(year = j, fips = 10),
                                       add_labels = TRUE)
  dflist[[6]] <- df_crdcfinance

  ## Error catching...
  # print(dates[[j]], "\n")
  print(paste0("dataset 1"))
  cat("\n")
  head(dflist[[1]])
  cat("\n")
  print(paste0("dataset 6"))
  cat("\n")
  head(dflist[[6]])
  cat("\n")
  for (k in 1:6) {
    assign(paste(frames[k], dates[j], sep = ""), dflist[[k]])
  }
}
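
A likely reason only six objects survive: in `for (j in dates)`, `j` holds the year string itself (for example "2011"), not a position. `dates[j]` therefore looks up an element named "2011" in a list that has no names, and the lookup misses, so every year produces the same six variable names and each pass overwrites the previous one. A minimal sketch of that behavior, using shortened placeholder lists (`df_a`, `df_b`) rather than the real data:

```r
# Placeholder inputs mirroring the structure of the question's lists
dates  <- list("2011", "2013")
frames <- list("df_a", "df_b")

bad  <- character(0)  # names built the way the question's loop builds them
good <- character(0)  # names built from the loop variable directly

for (j in dates) {
  for (k in seq_along(frames)) {
    # dates[j] indexes by NAME; the list is unnamed, so nothing matches
    bad <- c(bad, paste(frames[k], dates[j], sep = ""))
    # j already is the year string, so it can be pasted as-is
    good <- c(good, paste0(frames[[k]], j))
  }
}

length(unique(bad))   # one name per frame, reused for every year
length(unique(good))  # one name per frame/year combination
```

Under that reading, `assign(paste0(frames[[k]], j), dflist[[k]])` would create all 24 objects, although keeping the data in a named list is usually easier to work with.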

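Consider a few adjustments:

  1. Keep using a single list of data frames and avoid flooding the global environment with many separate, similarly structured objects. Because it works through side effects and makes debugging harder, assign should rarely be used in R. Instead, give the elements of the list names.
  2. Use the more streamlined apply family to avoid the bookkeeping of for loops. These functions hide the loop and return a collection; in some cases, as shown below, sapply also names the collection. Map (a wrapper around mapply) is another member of the family and iterates elementwise over several vectors.
  3. Keep the code DRY (Don't Repeat Yourself) by parameterizing the get_education_data call, which differs across the six calls only in its source and topic arguments (plus subtopic for two of them).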
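
The naming behavior in points 1 and 2 can be tried without any network access. The sketch below swaps in a hypothetical stand-in function, `make_df`, for the real data call; the vector contents are placeholders as well.

```r
# Hypothetical stand-in for a data-fetching function (not part of any package)
make_df <- function(year, topic) {
  data.frame(year = year, topic = topic, stringsAsFactors = FALSE)
}

years  <- c("2011", "2013")
topics <- c("directory", "enrollment")

# sapply() over a character vector names the result by that vector;
# simplify = FALSE keeps the result a list instead of collapsing it
by_year <- sapply(years, function(y) {
  # Map() walks topics element by element; the single year y is recycled
  setNames(Map(make_df, y, topics), paste0("df_", topics))
}, simplify = FALSE)

names(by_year)            # outer level is named by year
names(by_year[["2011"]])  # inner level is named by frame
```

Each data frame is then reachable by name, for example `by_year[["2013"]][["df_enrollment"]]`, with nothing written to the global environment.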

Adjusted code

library(educationdata)

# USER-DEFINED PARAMETERIZED METHOD
# subtopic_param defaults to NULL for endpoints that take no subtopic
build_df <- function(year_param, source_param, topic_param, subtopic_param = NULL) {
  get_education_data(
    level = "schools",
    source = source_param,
    topic = topic_param,
    subtopic = subtopic_param,
    filters = list(year = year_param, fips = 10),
    add_labels = TRUE
  )
}

# INITIALIZE VECTORS (sources, topics, subtopics, frames ARE POSITION-ALIGNED)
dates <- c("2011", "2013", "2015", "2017")
sources <- c("ccd", "ccd", "crdc", "crdc", "crdc", "crdc")
topics <- c(
  "directory", "enrollment", "teachers-staff",
  "math-and-science", "sat-act-participation",
  "school-finance"
)
# ONLY THE TWO CRDC TOPICS FROM THE ORIGINAL LOOP USE A SUBTOPIC
subtopics <- list(NULL, NULL, NULL, c("race", "sex"), c("race", "sex"), NULL)
frames <- c(
  "df_ccdirectory", "df_ccdenrollment", "df_crdcteacher",
  "df_crdcmathscience", "df_crdcsat", "df_crdcfinance"
)

# RETURN NESTED YEARLY LIST OF DATA FRAMES
df_list <- sapply(
  dates,
  function(dt) setNames(Map(build_df, dt, sources, topics, subtopics), frames),
  simplify = FALSE
)

Output

# ALL DATA FRAMES (N=24)
df_list$`2011`$df_ccdirectory
df_list$`2011`$df_ccdenrollment
...
df_list$`2017`$df_crdcsat
df_list$`2017`$df_crdcfinance

# ALL 2011 DATA FRAMES (N=6)
df_list$`2011`

# ALL ccdirectory DATA FRAMES (N=4)
lapply(df_list, `[[`, "df_ccdirectory")
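
If a single flat collection of all 24 data frames is more convenient than the nested one, one level of nesting can be removed with `unlist(..., recursive = FALSE)`. The sketch below uses a toy list, `nested`, with the same shape as `df_list`; its frame names and contents are placeholders.

```r
# Toy nested list shaped like df_list: year -> named data frames
nested <- list(
  `2011` = list(df_a = data.frame(x = 1), df_b = data.frame(x = 2)),
  `2013` = list(df_a = data.frame(x = 3), df_b = data.frame(x = 4))
)

# Remove one level of nesting; names become "<year>.<frame>"
flat <- unlist(nested, recursive = FALSE)

# Optionally reorder each name to "<frame><year>", e.g. "df_a2011"
names(flat) <- sub("^(\\d{4})\\.(.*)$", "\\2\\1", names(flat))

names(flat)
```

The reordered names match the pattern the original loop was trying to build with assign, while the data frames still live in one list.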
