R - Loop or iterate over a table to automate a bind_rows process



I have the following table:

df <- tribble(~name,                                               ~label,     ~year,    ~id,
"base1.dta", "Generated biographical information",                1990,  "gbi",
"base2.dta", "Generated biographical information",                1991,  "gbi",
"base3.dta", "Generated biographical information",                1992,  "gbi",
"base4.dta", "Generated biographical information",                1993,  "gbi",
"base5.dta", "Data on children from household questionnaire",     1990, "dchq",
"base6.dta", "Data on children from household questionnaire",     1991, "dchq",
"base7.dta", "Data on children from household questionnaire",     1992, "dchq",
"base8.dta", "Data on children from household questionnaire",     1993, "dchq",
"base9.dta", "Data from individual questionnaires",                1990,  "diq",
"base10.dta", "Data from individual questionnaires",               1991,  "diq",
"base11.dta", "Data from individual questionnaires",               1992,  "diq",
"base12.dta", "Data from individual questionnaires",               1993,  "diq")

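The data frames listed in the name column all sit in the same path in my project, under the same names as in df. I want to loop over, or use purrr on, this table (the real one is of course longer) as follows: for rows that share the same value in the label column, find all the data frames named in the name column, bind_rows them, and assign the result to a data frame named after id. Then I want to save those id-named objects as .rds files in another path.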

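Given that your label and id columns repeat in the same pattern, and that you want the output labelled by id, you can ignore label.

You don't need purrr either: just group by id and name, read in the data, then bind the rows with summarise.

Using @Serkan's data_test with an added id column.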

library(tidyverse)
data_test %>% 
group_by(id, name) %>% 
summarise(df = list(read.csv(name))) %>% 
summarise(joined = list(bind_rows(df)))
#>   id    joined        
#>   <chr> <list>        
#> 1 iri   <df [300 × 5]>
#> 2 mtc   <df [64 × 11]>

To write the Rds files, you can group by id and then use group_walk with write_rds.

... %>% 
group_by(id) %>% 
# joined is a list-column, so [[1]] extracts the data frame itself
group_walk(~write_rds(.x$joined[[1]], paste0(.y$id, ".rds")))
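For reference, the whole group, read, bind, and save sequence can be sketched end to end. This is only an illustration under stated assumptions: the folders in_dir and out_dir and the table name index are hypothetical, and the four CSV files are generated in a temporary directory so the example runs on its own. It groups by id only and reads the files of each group with map, which is a small variation on the two-step summarise above.

```r
library(dplyr)
library(purrr)
library(readr)
library(tibble)

# Hypothetical input and output folders, created inside tempdir() for the demo.
in_dir  <- file.path(tempdir(), "input")
out_dir <- file.path(tempdir(), "output")
dir.create(in_dir,  showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)

# Write four small CSV files so the example is self-contained.
write.csv(mtcars[1:16, ],  file.path(in_dir, "mtcars_1.csv"), row.names = FALSE)
write.csv(mtcars[17:32, ], file.path(in_dir, "mtcars_2.csv"), row.names = FALSE)
write.csv(iris[1:75, ],    file.path(in_dir, "iris_1.csv"),   row.names = FALSE)
write.csv(iris[76:150, ],  file.path(in_dir, "iris_2.csv"),   row.names = FALSE)

index <- tribble(
  ~name,          ~id,
  "mtcars_1.csv", "mtc",
  "mtcars_2.csv", "mtc",
  "iris_1.csv",   "iri",
  "iris_2.csv",   "iri"
)

# One row per id; the bound data frame is stored in a list-column.
joined <- index %>%
  group_by(id) %>%
  summarise(joined = list(bind_rows(map(file.path(in_dir, name), read.csv))))

# Save each bound data frame as <id>.rds in the output folder.
walk2(joined$joined, joined$id,
      ~ write_rds(.x, file.path(out_dir, paste0(.y, ".rds"))))
```

Reading one file back, for example readRDS(file.path(out_dir, "mtc.rds")), is a quick way to check that a plain data frame was saved rather than a one-element list.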

Data
data_test <- tribble(
~name, ~label, ~id,
"mtcars_1.csv", "mtcars", "mtc",
"mtcars_2.csv", "mtcars", "mtc",
"iris_1.csv", "iris", "iri",
"iris_2.csv", "iris", "iri"
)

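I replicated your data.frame and saved mtcars and iris twice each. To automate the process, you can start by splitting your data.frame by label, which I assume is the column you want to bind_rows by.

Then I use a nested map to read the paths given in the data.frame called df (data_test in my example) with read.table.

Obviously you can use any kind of data-loading function.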

data_test <- tribble(
~name, ~label,
"mtcars_1.csv", "mtcars",
"mtcars_2.csv", "mtcars",
"iris_1.csv", "iris",
"iris_2.csv", "iris"
)

data_test %>%
  split(f = .$label) %>%
  map(.f = function(x) {
    # read every file listed for this label, then stack the results
    x$name %>%
      map(.f = function(x) {
        read.table(x)
      }) %>%
      reduce(bind_rows)
  })

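This loads all the data.frames listed in the name variable, grouped by label and combined with bind_rows. The result is a list with one bound data.frame per label.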
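Since any loading function can be plugged in, the same split-and-map pattern can be wrapped so that the reader is an argument. The following is a sketch rather than part of the answer: the helper read_and_bind(), its parameters, and the folder demo_dir are all hypothetical names, and the demo files are written with write.table so that read.table can read them back with its defaults.

```r
library(purrr)
library(dplyr)

# Hypothetical helper: split the file names by the column `by`, read each
# file with `reader`, and bind the rows within each group.
read_and_bind <- function(index, by, reader, dir = ".") {
  split(index$name, index[[by]]) %>%
    map(function(files) bind_rows(map(file.path(dir, files), reader)))
}

# Demo files in a temporary folder (write.table output round-trips through
# read.table, which detects the header from the shorter first line).
demo_dir <- file.path(tempdir(), "split_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(mtcars[1:16, ],  file.path(demo_dir, "mtcars_1.csv"))
write.table(mtcars[17:32, ], file.path(demo_dir, "mtcars_2.csv"))
write.table(iris[1:75, ],    file.path(demo_dir, "iris_1.csv"))
write.table(iris[76:150, ],  file.path(demo_dir, "iris_2.csv"))

index <- data.frame(
  name  = c("mtcars_1.csv", "mtcars_2.csv", "iris_1.csv", "iris_2.csv"),
  label = c("mtcars", "mtcars", "iris", "iris")
)

# A named list: one bound data.frame per label.
bound <- read_and_bind(index, by = "label", reader = read.table, dir = demo_dir)
names(bound)
```

Passing by = "id" instead of by = "label" would name the list elements after id, which is what the question asks for when saving.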

Edit: as @Anoushiravan pointed out, you can use haven::read_dta(x) instead of read.table to load Stata data!
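Applied to the table in the question, that could look roughly like the sketch below. It assumes the haven package is installed. The helper bind_and_save() and the folders in_dir and out_dir are hypothetical, and the tiny .dta files are placeholders with made-up contents, written only so the example can run; the real files would already exist in the project.

```r
library(haven)
library(purrr)
library(dplyr)
library(tibble)

# Hypothetical helper: for each id, read every .dta file listed under name,
# bind the rows, and save the result as <id>.rds in out_dir.
bind_and_save <- function(index, in_dir, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  bound <- map(split(index$name, index$id), function(files) {
    bind_rows(map(file.path(in_dir, files), read_dta))
  })
  iwalk(bound, function(data, id) {
    saveRDS(data, file.path(out_dir, paste0(id, ".rds")))
  })
  invisible(bound)
}

# A few rows of the question's table, enough for a demo.
index <- tribble(
  ~name,       ~year, ~id,
  "base1.dta", 1990,  "gbi",
  "base2.dta", 1991,  "gbi",
  "base5.dta", 1990,  "dchq",
  "base6.dta", 1991,  "dchq"
)

# Placeholder .dta files with invented contents (three rows each).
in_dir  <- file.path(tempdir(), "dta_in")
out_dir <- file.path(tempdir(), "rds_out")
dir.create(in_dir, showWarnings = FALSE)
walk2(index$name, index$year, function(file, year) {
  write_dta(tibble(person = 1:3, year = year), file.path(in_dir, file))
})

result <- bind_and_save(index, in_dir, out_dir)
list.files(out_dir)
```

Splitting on id rather than label gives the list elements, and therefore the .rds files, the short names the question asks for.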
