Apply the same factor levels to multiple variables with different levels in R



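I have a data.table with 168 variables and 8,278 observations. Variables 69:135 were originally stored as strings. They should become region dummies, and I want to end up with level 2 (= yes, the company operates here) and level 1 (= no, the company does not operate here). The problem is that the original variables contain three different input combinations: 1) "TRUE", "1", "0" and "FALSE"; 2) "TRUE" and "FALSE"; and 3) "1" and "0". In addition, about 5 variables have only a single value, either "0" or "1". Here is an example: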

# generating reproducible data
library(data.table)
dt <- data.table(
  region1 = c("TRUE", "FALSE", "0", "1", NA),
  region2 = c("1", "1", "0", NA, NA),
  region3 = c(NA, "FALSE", "TRUE", NA, "FALSE"),
  region4 = c(NA, "0", "0", NA, "0")
)
#this gives:
#   region1 region2 region3 region4
#1    TRUE       1    <NA>    <NA>
#2   FALSE       1   FALSE       0
#3       0       0    TRUE       0
#4       1    <NA>    <NA>    <NA>
#5    <NA>    <NA>   FALSE       0                                                                                      

I am looking for a way to replace "TRUE" and "1" with 2, and "FALSE" and "0" with 1, for all variables at once. The desired result is therefore:

#   region1 region2 region3 region4
#1:       2       2      NA      NA
#2:       1       2       1       1
#3:       1       1       2       1
#4:       2      NA      NA      NA
#5:      NA      NA       1       1

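I have already looked at

"Apply factor levels to multiple columns with missing factor levels" and "Change the levels of multiple factor variables".

However, these did not help me.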

I tried the following, using nested ifelse() calls:

library(data.table)
library(forcats)
check <- cbind(dt[1:68], as.data.table(apply(dt[69:135], 2, function(x) {
  ifelse("1" %in% x & "TRUE" %in% x,
    fct_collapse(x,
      "2" = c("TRUE", "1"),
      "1" = c("FALSE", "0")
    ),
    ifelse("1" %in% x & !("TRUE" %in% x),
      fct_collapse(x,
        "2" = "1",
        "1" = "0"),
      fct_collapse(x,
        "2" = "TRUE",
        "1" = "FALSE"
      )))
}
)), dt[136:168])

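However, the code above does not give me the desired result. It runs to completion, but I get warning messages, and when I inspect the individual variables they are still stored as strings with their original input values.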

# examples of warnings
1: Unknown levels in `f`: TRUE, FALSE
2: Unknown levels in `f`: TRUE, FALSE
3: Unknown levels in `f`: TRUE, FALSE
4: Unknown levels in `f`: 0
5: Unknown levels in `f`: TRUE, FALSE
6: Unknown levels in `f`: TRUE, FALSE
7: Unknown levels in `f`: 0

The nested ifelse() call works on its own, when it is not combined with fct_collapse():

#the ifelse statement works
ifelse("TRUE" %in% dt$region1, 2, "FALSE")
ifelse(5 %in% dt$region1, 2, "FALSE")
#also the nested ifelse statement works
ifelse("1" %in% dt$region1 & "TRUE" %in% dt$region1,
0,
ifelse("1" %in% dt$region1 & !("TRUE" %in% dt$region1),
1,
2
))

ifelse("1" %in% dt$region2 & "TRUE" %in% dt$region2,
0,
ifelse("1" %in% dt$region2 & !("TRUE" %in% dt$region2),
1,
2
))

Does anyone know how to solve this?

Many thanks in advance for any advice!

Here is a way to do it by calling set() in a for loop.

library(data.table)
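# Recode "TRUE"/"1" to 2 and "FALSE"/"0" to 1; NA stays NA
f <- function(x){
  x <- as.character(x)
  i1 <- x %in% c("TRUE", "1")
  i0 <- x %in% c("FALSE", "0")
  x[which(i1)] <- "2"
  x[which(i0)] <- "1"
  as.integer(x)
}
# Update every column by reference, one column at a time
for (j in seq_along(dt)) set(dt, j = j, value = f(dt[[j]]))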
dt
#   region1 region2 region3 region4
#1:       2       2      NA      NA
#2:       1       2       1       1
#3:       1       1       2       1
#4:       2      NA      NA      NA
#5:      NA      NA       1       1
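
One likely reason the attempt in the question leaves the columns unchanged is that `ifelse()` returns a result with the same length as its test. A test such as `"1" %in% x & "TRUE" %in% x` is a single TRUE or FALSE, so only one element comes back. In a data.table, `dt[69:135]` also selects rows rather than columns. The sketch below uses a made-up vector `x` to contrast the scalar test with the element-wise `%in%` test that `f()` relies on; the names `res_scalar`, `res_vec`, `is_yes` and `is_no` are illustrative only.

```r
x <- c("TRUE", "FALSE", "0", "1", NA)

# A length-one test makes ifelse() return a length-one result
res_scalar <- ifelse("1" %in% x, x, "none")
length(res_scalar)

# Element-wise membership tests keep one result per element of x
is_yes <- x %in% c("TRUE", "1")
is_no  <- x %in% c("FALSE", "0")
res_vec <- ifelse(is_yes, 2L, ifelse(is_no, 1L, NA_integer_))
res_vec
```

Because `%in%` never returns NA, missing inputs fall through both tests and stay NA in the result.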

Thanks to jangorecki's comment, here is a simpler way:

dt[, names(dt) := lapply(dt, f)]
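
The question only needs columns 69:135 recoded, so a variant restricted to a subset of columns may be useful. The following is a minimal sketch under assumptions: `dt2`, `id`, `cols` and `recode_yes_no()` are hypothetical names invented for this toy example, and the helper is meant to behave like `f()` for the four input strings discussed here. With the real data, `cols` would presumably be `names(dt)[69:135]`.

```r
library(data.table)

# Toy data: `id` stands in for the columns that must stay untouched
dt2 <- data.table(
  id      = c("a", "b", "c"),
  region1 = c("TRUE", "0", NA),
  region2 = c("1", "FALSE", "0")
)

# Hypothetical helper: maps "TRUE"/"1" to 2 and "FALSE"/"0" to 1,
# everything else (including NA) to NA
recode_yes_no <- function(x) {
  x <- as.character(x)
  out <- rep(NA_integer_, length(x))
  out[x %in% c("TRUE", "1")] <- 2L
  out[x %in% c("FALSE", "0")] <- 1L
  out
}

# Assumed column selection for this toy table
cols <- c("region1", "region2")
dt2[, (cols) := lapply(.SD, recode_yes_no), .SDcols = cols]

# Optional: store the result as factors with the same two levels everywhere
dt2[, (cols) := lapply(.SD, factor, levels = 1:2), .SDcols = cols]
```

Passing `levels = 1:2` explicitly gives every column both levels, including columns that contain only one of the two values.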
