Goal
导入并绑定具有不可预测的不相关第一列的 XLSX 文件列表。 (必须消除它,但您不知道哪个文件包含不相关的第一列。
# sample : remind that the xlsx file originally don't have any column names
DT1 = data.table(a = c("TYPE","X","Y","Z"), b = c("MONTH","Jan","Feb","Mar"))
DT2 = data.table(a = c(NA,NA,NA,"random_irrelevant_vale") ,b = c("TYPE","X","Y","W"), c = c("MONTH","Apr","Feb","May"))
# give "idcol" & col_names = FALSE is a must
PATH_LIST<- list.files(path = "PATH",
pattern = "*.xlsx",full.names = TRUE)
names(PATH_LIST) <- basename(PATH_LIST)
rbindlist(lapply(PATH_LIST,import,col_names = FALSE,
col_types = "text"),
idcol = "source",fill = TRUE)
# EXPECTED OUTPUT
TYPE MONTH
X Jan
Y Feb
Z Mar
X Apr
Y Feb
W May
- 不能简单地使用 !grepl("random_irrevelevant_value",x( 因为 "random_irrevelevant_value" 是一个随机值。
如何在导入步骤和重新绑定它们之间实现这一点?
假设相关列位于后面,并且有一些文件没有不相关的列,您可以尝试如下操作:
DTls <- lapply(list.files(pattern="DT(.*).csv"), fread, header=FALSE)
m <- min(lengths(DTls))
rbindlist(
lapply(DTls, function(DT) {
cols <- head(names(DT), length(DT)-m)
if (length(cols) > 0)
DT[, (cols) := NULL]
DT
})
)
输出:
V1 V2
1: TYPE MONTH
2: X Jan
3: Y Feb
4: Z Mar
5: TYPE MONTH
6: X Apr
7: Y Feb
8: W May
数据文件:
library(data.table)
DT1 = data.table(a = c("TYPE","X","Y","Z"), b = c("MONTH","Jan","Feb","Mar"))
DT2 = data.table(a = c(NA,NA,NA,"random_irrelevant_vale") ,b = c("TYPE","X","Y","W"), c = c("MONTH","Apr","Feb","May"))
fwrite(DT1, "DT1.csv", col.names=FALSE)
fwrite(DT2, "DT2.csv", col.names=FALSE)