I have the following table:
df <- tribble(~name, ~label, ~year, ~id,
"base1.dta", "Generated biographical information", 1990, "gbi",
"base2.dta", "Generated biographical information", 1991, "gbi",
"base3.dta", "Generated biographical information", 1992, "gbi",
"base4.dta", "Generated biographical information", 1993, "gbi",
"base5.dta", "Data on children from household questionnaire", 1990, "dchq",
"base6.dta", "Data on children from household questionnaire", 1991, "dchq",
"base7.dta", "Data on children from household questionnaire", 1992, "dchq",
"base8.dta", "Data on children from household questionnaire", 1993, "dchq",
"base9.dta", "Data from individual questionnaires", 1990, "diq",
"base10.dta", "Data from individual questionnaires", 1991, "diq",
"base11.dta", "Data from individual questionnaires", 1992, "diq",
"base12.dta", "Data from individual questionnaires", 1993, "diq")
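The data frames listed in the name column are all in the same path of my project, with the same names as in df. I would like to loop over this table, or use purrr on it (the real table is of course longer), as follows: for rows that share the same value in the label column, look up all the data frames named in the name column, bind_rows them, and assign the result to a data frame named after id. Then I want to save those id-named objects as .rds files in a different path.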
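Given that your label and id columns repeat in the same pattern, and that you want the output labelled by id, you can ignore label.
You don't need purrr either: just group by id and name, read in the data, and then bind the rows with summarise.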
Using @Serkan's data_test with an added id column:
library(tidyverse)
data_test %>%
  group_by(id, name) %>%
  summarise(df = list(read.csv(name))) %>%
  summarise(joined = list(bind_rows(df)))
id joined
<chr> <list>
1 iri <df [300 × 5]>
2 mtc <df [64 × 11]>
To write the .rds files, group by id and then call write_rds on each group with group_walk.
... %>%
  group_by(id) %>%
  # joined is a list column, so [[1]] extracts the data frame itself
  group_walk(~write_rds(.x$joined[[1]], paste0(.y$id, ".rds")))
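Since the question asks for the files to go to a different path, here is a minimal sketch of one way to do that with walk2 and file.path. The out_dir location and the joined_tbl stand-in (built from the built-in iris and mtcars datasets rather than from files) are assumptions for illustration only; in practice joined_tbl would be the result of the summarise() pipeline above.

```r
library(tidyverse)

# Hypothetical output directory; the question only says "another path".
out_dir <- file.path(tempdir(), "rds_out")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Stand-in for the result of the summarise() pipeline:
# one row per id, with the combined data frame stored in a list column.
joined_tbl <- tibble(
  id = c("iri", "mtc"),
  joined = list(bind_rows(iris, iris), bind_rows(mtcars, mtcars))
)

# walk2() iterates over the data frames and the ids in parallel,
# calling write_rds() purely for its side effect of writing a file.
walk2(
  joined_tbl$joined, joined_tbl$id,
  ~ write_rds(.x, file.path(out_dir, paste0(.y, ".rds")))
)
```

Each file can then be read back with read_rds(file.path(out_dir, "mtc.rds")), which returns a plain data frame rather than a list.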
Data:
data_test <- tribble(
~name, ~label, ~id,
"mtcars_1.csv", "mtcars", "mtc",
"mtcars_2.csv", "mtcars", "mtc",
"iris_1.csv", "iris", "iri",
"iris_2.csv", "iris", "iri"
)
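To reproduce this, the CSV files named in data_test need to exist in the working directory. The following is one possible way to create them, not necessarily how they were originally saved. Using write.csv with row.names = FALSE is an assumption, chosen because it is consistent with the 300 × 5 and 64 × 11 dimensions shown in the output above.

```r
# Write each built-in dataset twice, under the file names used in data_test.
# row.names = FALSE keeps mtcars at 11 columns (no extra row-name column).
write.csv(mtcars, "mtcars_1.csv", row.names = FALSE)
write.csv(mtcars, "mtcars_2.csv", row.names = FALSE)
write.csv(iris, "iris_1.csv", row.names = FALSE)
write.csv(iris, "iris_2.csv", row.names = FALSE)
```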
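I reproduced your data.frame and saved mtcars and iris twice each. To automate the process, you can start by splitting your data.frame by label, which I assume is the variable you want to bind_rows by.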
Then I use a nested map to read the paths given in the data.frame called df (data_test in my example) with read.table.
Obviously, you can use any kind of data-loading function.
data_test <- tribble(
~name, ~label,
"mtcars_1.csv", "mtcars",
"mtcars_2.csv", "mtcars",
"iris_1.csv", "iris",
"iris_2.csv", "iris"
)
data_test %>%
  split(f = .$label) %>%
  map(.f = function(x) {
    x$name %>%
      map(.f = function(x) {
        read.table(x)
      }) %>%
      reduce(bind_rows)
  })
This loads every data.frame listed under the name variable, grouped by label and combined with bind_rows.
Edit: as @Anoushiravan pointed out, you can use haven::read_dta(x) instead of read.table to load Stata data!
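Putting the pieces together for the original question, a rough sketch might look like the following. The helper name bind_and_save and its in_dir and out_dir arguments are hypothetical, and the sketch splits by id rather than label on the assumption that the two columns always vary together, as they do in the question's table.

```r
library(tidyverse)

# Hypothetical helper: read every .dta file listed in `index`, stack the
# files that share an `id`, and save one .rds file per id under `out_dir`.
bind_and_save <- function(index, in_dir = ".", out_dir = "rds") {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  index %>%
    split(.$id) %>%                       # one sub-table per id
    imap(function(rows, id) {
      combined <- rows$name %>%
        map(~ haven::read_dta(file.path(in_dir, .x))) %>%
        bind_rows()                       # stack all files for this id
      write_rds(combined, file.path(out_dir, paste0(id, ".rds")))
      combined                            # also return the data frame
    })
}
```

Calling bind_and_save(df, in_dir = ".", out_dir = "some/other/path") would then return a named list with one combined data frame per id (gbi, dchq, diq) and write gbi.rds, dchq.rds, and diq.rds to the output directory. Note that the year column of df is not carried into the combined data, so it would need to be added separately if the files do not already record it.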