R: extracting vectors from multiple data frames with a for loop



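I have 50 data frames with the same structure (six variables each, no more than 300 rows). From each one I need to extract and transform two vectors that correspond to some, but not all, of the rows.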

So from each dataset I extract two columns, and then I extract a few rows from those columns.

All 50 of these vectors are bound into a matrix, which is then used for network analysis (a citation matrix, for what it's worth, so a directed graph).
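To make the target shape concrete, here is a minimal base-R sketch with made-up journal names and counts (hypothetical data, not from the question). Each row is one citing journal's vector of citation counts, with the columns in the same alphabetical order, so rbind() yields a square matrix where entry [i, j] is the number of citations from journal i to journal j.

```r
# Hypothetical journal names, sorted alphabetically (as the question does with sort())
journals <- sort(c("J_B", "J_A", "J_C"))

# One named vector of citation counts per citing journal (made-up numbers);
# every vector has an entry for every journal, in the same order
cites_from_a <- c(J_A = 5L, J_B = 2L, J_C = 0L)
cites_from_b <- c(J_A = 1L, J_B = 7L, J_C = 3L)
cites_from_c <- c(J_A = 0L, J_B = 4L, J_C = 9L)

# Bind the vectors row-wise; the argument names become the row names (citing journal)
citation_matrix <- rbind(J_A = cites_from_a,
                         J_B = cites_from_b,
                         J_C = cites_from_c)

# Rows and columns list the same journals, so the matrix is square and can be read
# as a weighted adjacency matrix of a directed graph (row cites column)
stopifnot(identical(rownames(citation_matrix), journals),
          identical(colnames(citation_matrix), journals))
print(citation_matrix)
```

If the igraph package is used for the network analysis, its graph_from_adjacency_matrix() function with mode = "directed" and weighted = TRUE is one way to turn a matrix of this shape into a weighted directed graph.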

The code below performs this extraction and transformation.

library(tidyverse)
# read the original .csv file and extract the relevant vectors
SOME_JOURNAL <- read_csv("SOME_JOURNAL.csv") %>%
  select(X3, X4) %>%
  rename("journal" = X3,
         "citations" = X4) %>%
  mutate(citations = as.integer(citations)) %>%
  na.omit() %>%
  tail(-3)
# identify the specific rows I want to extract
extract_list <- sort(c("SOME_JOURNAL",
                       "ANOTHER_JOURNAL",
                       "YET_ANOTHER_JOURNAL",
                       "ONE_MORE_JOURNAL"))
# extract the rows
SOME_JOURNAL <- SOME_JOURNAL %>%
  # keep only the journals I want
  filter(!!sym(names(.)[1]) %in% extract_list) %>%
  # add rows for journals with no data and give them zero citations
  add_row(journal = setdiff(extract_list, SOME_JOURNAL$journal), citations = 0) %>%
  # alphabetical order keeps the columns consistent later on
  arrange(journal) %>%
  # reshape the long vector into a single wide row, so it can be bound with the
  # other vectors into a matrix for a directed graph
  pivot_wider(names_from = journal, values_from = citations)
# use the journal name as the row name, which helps when turning the matrix into a graph
SOME_JOURNAL <- data.frame(SOME_JOURNAL, row.names = "SOME_JOURNAL")
# create the matrix by binding the extracted vectors
matrix <- as.matrix(rbind(SOME_JOURNAL,
                          ANOTHER_JOURNAL,
                          YET_ANOTHER_JOURNAL,
                          ONE_MORE_JOURNAL))

Created on 2021-02-26 by the reprex package (v0.3.0)

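Since I have 50 of these data frames, I want to automate this, but I've hit a roadblock (mostly because I'm a novice). The code below produces a "$ operator is invalid for atomic vectors" error. I've tried using [ and [[ instead, but I don't know whether either is a useful solution.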

Any help is greatly appreciated.

library(tidyverse)
# get a list of all the filenames
filenames <- list.files(path = "data/",
                        pattern = ".*csv")
# for loop to read files and extract vectors
for (i in filenames) {
  filepath <- file.path("data/", paste(i))
  # the data frames have very long names; this just shortens them
  short_name <- str_replace_all(str_remove_all(i, "#.*"), "-", "_")
  assign(short_name, read_csv(filepath) %>%
           select(X3, X4) %>%
           rename("journal" = X3,
                  "citations" = X4) %>%
           mutate(citations = as.integer(citations)) %>%
           na.omit() %>%
           tail(-2) %>%
           filter(!!sym(names(.)[1]) %in% extract_list) %>%
           # everything works fine up to this point; the step below produces
           # the "$ operator is invalid for atomic vectors" error
           add_row(journal = setdiff(extract_list, SOME_JOURNAL$journal),
                   citations = 0)
  )
}
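The error message says that $ was applied to an atomic vector (for example a character or numeric vector) rather than to a list or data frame. A minimal base-R reproduction, using a made-up vector x and data frame df (hypothetical names), shows the difference between $ and [[ / [:

```r
# A named character vector standing in for an object that is not a data frame
x <- c(journal = "J_A", citations = "5")

# `$` is only defined for recursive objects such as lists and data frames,
# so using it on an atomic vector raises the error from the question
msg <- tryCatch(x$journal, error = function(e) conditionMessage(e))
print(msg)  # "$ operator is invalid for atomic vectors"

# `[[` and `[` do work on named atomic vectors
print(x[["journal"]])
print(x["journal"])

# On a data frame, `$` returns the column as expected
df <- data.frame(journal = c("J_A", "J_B"), citations = c(5L, 2L),
                 stringsAsFactors = FALSE)
print(df$journal)
```

In the loop, SOME_JOURNAL$journal is evaluated inside add_row(), but SOME_JOURNAL is not the data frame flowing through the pipe; it is whatever object that name happens to be bound to at that moment, and the error suggests it is an atomic vector there. Inside a magrittr pipe, . refers to the piped data (the question already relies on this in names(.)), so setdiff(extract_list, .$journal) should be another way to reach the current journal column; the fix below instead splits the pipe.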


I can't take credit for this solution, but the person who solved the problem was happy for me to post it for posterity.

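To quote the solver: "The general idea is to split the long pipe into several steps, so that the setdiff() step can access the journal vector in the data frame."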

There may be other approaches, but this works for my purposes.

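library(tidyverse)
# extract_list is defined as in the first example
filenames <- list.files(path = "data/",
                        pattern = ".*csv")
for (i in filenames) {
  filepath <- file.path("data/", paste(i))
  # the data frames have very long names; this just shortens them
  short_name <- str_replace_all(str_remove_all(i, "#.*"), "-", "_")
  # step 1: read, clean and filter the file, and store the result as df
  df <- read_csv(filepath) %>%
    select(X3, X4) %>%
    rename("journal" = X3,
           "citations" = X4) %>%
    mutate(citations = as.integer(citations)) %>%
    na.omit() %>%
    tail(-2) %>%
    filter(!!sym(names(.)[1]) %in% extract_list)
  # step 2: setdiff() can now refer to df$journal; add zero rows for the
  # missing journals, sort alphabetically and reshape into one wide row
  df_all_rows <- df %>%
    add_row(journal = setdiff(extract_list, df$journal),
            citations = 0) %>%
    arrange(journal) %>%
    pivot_wider(names_from = journal, values_from = citations)
  # label the row with the short name and store it under that name
  df_all_rows <- data.frame(df_all_rows, row.names = short_name)
  assign(short_name, df_all_rows)
}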
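The loop above still creates one object per file with assign(), so the final rbind() has to name all 50 objects by hand. A minimal base-R sketch of an alternative (a swapped-in technique, not the solver's code) collects each file's result in a named list and binds the list into the matrix in one call. The hypothetical helper complete_counts() follows the same split-the-pipe idea: it filters first, then computes the missing journals from the filtered data frame in a separate step, gives them zero citations, and sorts alphabetically. To stay self-contained, the example writes two made-up CSV files (hypothetical names J_A.csv and J_B.csv) to a temporary directory and assumes they already have journal and citations columns, so it skips the X3/X4 renaming, na.omit() and tail() steps.

```r
# Hypothetical helper: keep the journals in extract_list, add zero counts for
# journals with no row, and return a named integer vector in alphabetical order
complete_counts <- function(df, extract_list) {
  # step 1: filter to the journals of interest
  kept <- df[df$journal %in% extract_list, c("journal", "citations")]
  # step 2: setdiff() sees the filtered data frame because it is a separate step
  missing <- setdiff(extract_list, kept$journal)
  if (length(missing) > 0) {
    kept <- rbind(kept, data.frame(journal = missing,
                                   citations = rep(0L, length(missing)),
                                   stringsAsFactors = FALSE))
  }
  # step 3: alphabetical order, so every file yields the same column order
  kept <- kept[order(kept$journal), ]
  setNames(as.integer(kept$citations), kept$journal)
}

extract_list <- sort(c("J_A", "J_B"))

# Write two made-up CSV files so the example runs on its own
data_dir <- file.path(tempdir(), "journals")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(journal = c("J_A", "J_B"), citations = c(3, 1)),
          file.path(data_dir, "J_A.csv"), row.names = FALSE)
write.csv(data.frame(journal = c("J_B", "J_Z"), citations = c(6, 2)),
          file.path(data_dir, "J_B.csv"), row.names = FALSE)

filenames <- list.files(data_dir, pattern = "\\.csv$")

# Read and transform every file, keeping the results in a named list
results <- list()
for (f in filenames) {
  short_name <- sub("\\.csv$", "", f)  # row name for this citing journal
  df <- read.csv(file.path(data_dir, f), stringsAsFactors = FALSE)
  results[[short_name]] <- complete_counts(df, extract_list)
}

# Every element has the same names in the same order, so rbind() gives a matrix
citation_matrix <- do.call(rbind, results)
print(citation_matrix)
```

With the real files, the read step would keep the original select()/rename()/na.omit()/tail() logic and data_dir would point at "data/"; the list-plus-do.call() pattern then replaces both assign() and the hand-written rbind().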
