R - Apply a function to all data frames in the environment



I want to apply the cleanfunction below to all data frames in my environment:

cleanfunction <- function(dataframe) {
  dataframe <- as.data.frame(dataframe)
  ## get mode of all vars
  var_mode <- sapply(dataframe, mode)
  ## produce error if complex or raw is found
  if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
  ## get class of all vars
  var_class <- sapply(dataframe, class)
  ## produce error if an "AsIs" object has "logical" or "character" mode
  if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
    stop("matrix variables with 'AsIs' class must be 'numeric'")
  }
  ## identify columns that need to be coerced to factors
  ind1 <- which(var_mode %in% c("logical", "character"))
  ## coerce logical / character to factor with `as.factor`
  dataframe[ind1] <- lapply(dataframe[ind1], as.factor)
  return(dataframe)
}
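library(data.table)

set.seed(10238)
DT = data.table(
  A = rep(1:3, each = 5L),
  B = rep(1:5, 3L),
  C = sample(15L),
  D = sample(15L)
)
DT_II <- copy(DT)
## ls() alone would also return "cleanfunction", so keep only the data frames
dfs <- Filter(function(x) is.data.frame(get(x)), ls())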

Now I want to apply this function to every data frame in the environment. I have tried about ten approaches, but I can't get the syntax right.

for (i in seq_along(dfs)) {
get(dfs[i])[ , lapply(.SD, cleanfunction)]
}

Edit: I found this solution, but it does not store the results:

eapply(globalenv(), function(x) if (is.data.frame(x)) cleanfunction(x))

How can I store the result back into each object?

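Your get(dfs[i]) returns a reference to the data.table. You then lapply over each column of that frame. From the function's argument name, dataframe, I infer you expect to pass a whole frame. You could start with this: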

for (i in seq_along(dfs)) {
get(dfs[i])[ , cleanfunction(.SD)]
}
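Here is a small sketch of the difference between the two calls. The table and its columns are illustrative. lapply(.SD, f) calls f once per column, and each column is a plain vector. f(.SD) calls f once with the whole table.

```r
library(data.table)

DT <- data.table(A = 1:3, B = c("x", "y", "z"))

# lapply(.SD, f): f receives each column separately (a plain vector)
per_column <- DT[, lapply(.SD, is.data.frame)]
# f(.SD): f receives the whole table at once
whole_frame <- DT[, is.data.frame(.SD)]

print(per_column)   # A and B are both FALSE
print(whole_frame)  # TRUE
```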

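Be aware, though, that get(dfs[i])[ , cleanfunction(.SD)] returns a new frame. It does not use the canonical data.table mechanism for updating data in place. I suggest you change your function so that it always coerces its input to a data.table and works on it by reference: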

cleanfunction <- function(dataframe) {
  setDT(dataframe)
  ## get mode of all vars
  var_mode <- sapply(dataframe, mode)
  ## produce error if complex or raw is found
  if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
  ## get class of all vars
  var_class <- sapply(dataframe, class)
  ## produce error if an "AsIs" object has "logical" or "character" mode
  if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
    stop("matrix variables with 'AsIs' class must be 'numeric'")
  }
  ## identify columns that need to be coerced to factors
  ind1 <- which(var_mode %in% c("logical", "character"))
  ## coerce logical / character to factor by reference with `:=`
  if (length(ind1)) dataframe[, c(ind1) := lapply(.SD, as.factor), .SDcols = ind1]
  return(dataframe)
}
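The by-reference mechanism that the revised function relies on can be shown in a few lines. This sketch assumes a table built with data.table() and uses a hypothetical helper, add_flag(). A column added with := inside the function shows up in the caller's object, even though the return value is discarded.

```r
library(data.table)

# Hypothetical helper: modifies its argument by reference
add_flag <- function(dt) {
  setDT(dt)            # no-op when dt is already a data.table
  dt[, flag := TRUE]   # := adds the column in place, without copying dt
  invisible(dt)
}

DT <- data.table(A = 1:3)
add_flag(DT)           # return value deliberately ignored
print(names(DT))       # "A" "flag": the caller's DT was modified
```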

Since your current data would not trigger any changes, I'll add a character column to one of the tables:

DT[,quux:="A"]
head(DT)
#        A     B     C     D   quux
#    <int> <int> <int> <int> <char>
# 1:     1     1    12    15      A
# 2:     1     2     4     6      A
# 3:     1     3     5     7      A
# 4:     1     4     9     1      A
# 5:     1     5     6    14      A
# 6:     2     1    15    13      A
for (i in seq_along(dfs)) cleanfunction(get(dfs[i]))
head(DT)
#        A     B     C     D   quux
#    <int> <int> <int> <int> <fctr>
# 1:     1     1    12    15      A
# 2:     1     2     4     6      A
# 3:     1     3     5     7      A
# 4:     1     4     9     1      A
# 5:     1     5     6    14      A
# 6:     2     1    15    13      A

Note that the for loop relies entirely on updating by reference; the return value of cleanfunction is ignored here.

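This works entirely because of data.table's reference semantics. If you are using a data.frame or tbl_df instead, you will likely need to wrap the call to cleanfunction(.) in assign(dfs[i], cleanfunction(..)).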
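The following is a minimal sketch of the assign() variant for plain data frames, using base R only. The trimmed cleanfunction is a hypothetical stand-in that keeps only the factor coercion from the question's version. The names df1, df2 and env are illustrative.

```r
# Hypothetical, trimmed stand-in for the question's cleanfunction:
# coerce logical/character columns to factors and return a data.frame
cleanfunction <- function(dataframe) {
  dataframe <- as.data.frame(dataframe)
  ind1 <- which(vapply(dataframe, mode, character(1)) %in% c("logical", "character"))
  dataframe[ind1] <- lapply(dataframe[ind1], as.factor)
  dataframe
}

# Illustrative plain data.frames (no reference semantics)
df1 <- data.frame(a = 1:3, b = c("x", "y", "z"))
df2 <- data.frame(flag = c(TRUE, FALSE, TRUE), n = c(1.5, 2, 3))

env <- environment()  # environment holding the objects (global env at top level)
# Keep only names that refer to data frames (skips cleanfunction, env, ...)
dfs <- Filter(function(nm) is.data.frame(get(nm, envir = env)), ls(env))

# A plain data.frame is not modified in place, so write each result back by name
for (nm in dfs) assign(nm, cleanfunction(get(nm, envir = env)), envir = env)
```

Without the assign() call, the loop would compute each cleaned copy and then throw it away.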

Does this work for you?

# store all data frames from the environment in a list
dfs <- Filter(function(x) is(x, "data.frame"), mget(ls()))
# then apply your function
lapply(dfs, cleanfunction)
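The lapply() call above returns the cleaned frames but does not keep them. One way to store them is to assign the list back and then write its elements into the environment with base R's list2env(), which overwrites objects of the same name. The same list2env() step would also work on the eapply() result from the question once its NULL entries are dropped. This is a minimal sketch; it again uses a hypothetical trimmed stand-in for cleanfunction and illustrative object names.

```r
# Hypothetical, trimmed stand-in for the question's cleanfunction
cleanfunction <- function(dataframe) {
  dataframe <- as.data.frame(dataframe)
  ind1 <- which(vapply(dataframe, mode, character(1)) %in% c("logical", "character"))
  dataframe[ind1] <- lapply(dataframe[ind1], as.factor)
  dataframe
}

df1 <- data.frame(a = 1:3, b = c("x", "y", "z"))
df2 <- data.frame(flag = c(TRUE, FALSE, TRUE))

env <- environment()
# Collect the data frames into a named list (names are the object names)
dfs <- Filter(is.data.frame, mget(ls(env), envir = env))
# Clean every element and keep the result
dfs <- lapply(dfs, cleanfunction)
# Overwrite the original objects with their cleaned versions
invisible(list2env(dfs, envir = env))
```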
