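I'm trying to convert all of my categorical variables to numeric variables. The logical ones (yes or no) seem easy, but categorical variables with more than two options are harder for me. How can I change the BusinessTravel variable, which has three options ("Non-Travel", "Travel Rarely" and "Travel Frequently"), into a numeric variable (0, 1 and 2)?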
```r
#---- load the dataset ----#
library(readr)
library(dplyr)   # provides select(), mutate() and %>%
Dataset <- read_csv("Dataset.csv")
#---- select the important variables ----#
new_df <- select(Dataset, DistanceFromHome, MonthlyIncome, YearsAtCompany,
                 Attrition, BusinessTravel, OverTime, JobInvolvement,
                 StockOptionLevel, EnvironmentSatisfaction, JobLevel,
                 Department)
#---- Transform the Variables ----#
new_df <- new_df %>%
  mutate(Attrition = ifelse(Attrition == "No", 0, 1),
         OverTime  = ifelse(OverTime == "No", 0, 1))
# BusinessTravel ???
```
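One base-R way to fill in the `BusinessTravel ???` step is `match()`, which returns the position of each value in a lookup vector; subtracting 1 turns positions 1, 2, 3 into 0, 1, 2. This is only a sketch: the level spellings below are assumed from the question's wording and must match the strings actually stored in `Dataset.csv`, and `travel_df` is hypothetical sample data.

```r
# Hypothetical sample data; the level spellings are assumed
travel_df <- data.frame(
  BusinessTravel = c("Travel Rarely", "Non-Travel", "Travel Frequently", "Travel Rarely")
)

# Desired order: first element -> 0, second -> 1, third -> 2
travel_levels <- c("Non-Travel", "Travel Rarely", "Travel Frequently")

# match() gives each value's index in travel_levels (NA if the value is not found)
travel_df$BusinessTravel <- match(travel_df$BusinessTravel, travel_levels) - 1

print(travel_df$BusinessTravel)
```

Inside the existing pipeline the same idea would read `mutate(BusinessTravel = match(BusinessTravel, travel_levels) - 1)`. Any value not listed in `travel_levels` becomes `NA`, which makes spelling mismatches easy to spot.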
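Try this approach, which converts every variable to a factor (behind the scenes, R stores each level as an integer code) and then extracts those underlying numeric values: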
Data:

```r
df <- data.frame(DistanceFromHome = c("Close", "Far", "Medium", "Far", "Close"),
                 MonthlyIncome = c("<1000", "1000-5000", "<1000", ">5000", "1000-5000"))
```

Code:

```r
# Columns to convert (here: all of them)
convert_cols <- 1:ncol(df)
# Turn each column into a factor, then take its integer codes
df[convert_cols] <- lapply(df[convert_cols], function(x) as.numeric(as.factor(x)))
```

Output:

```
#   DistanceFromHome MonthlyIncome
# 1                1             1
# 2                2             3
# 3                3             1
# 4                2             2
# 5                1             3
```
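To see which code each value received, it helps to inspect the factor's levels: `as.factor()` uses the sorted unique values as levels, and `as.numeric()` returns each value's position in that sorted list, starting at 1. A minimal sketch using the `DistanceFromHome` values from the example above:

```r
x <- c("Close", "Far", "Medium", "Far", "Close")
f <- as.factor(x)

# Levels are the sorted unique values; their order determines the codes
print(levels(f))      # "Close" "Far" "Medium"
print(as.numeric(f))  # 1 2 3 2 1
```

Because the order comes from sorting strings, it depends on the collation of the current locale for values containing punctuation such as `"<1000"` and `">5000"`, so the `MonthlyIncome` codes above may come out differently on another machine. The codes also start at 1, not 0.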
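Since you didn't provide sample data, I can't be sure this exact code works on yours, but you can change `convert_cols` to whichever columns you want to convert to numbers (here I converted all of them).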
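When the codes must follow a specific order, such as 0, 1 and 2 for `BusinessTravel`, one option is to pass the levels explicitly to `factor()`, subtract 1, and convert only the columns you name. A sketch under assumptions: `hr_df` and its values are hypothetical, and the level spellings in `level_map` must be adjusted to match your data.

```r
# Hypothetical sample data
hr_df <- data.frame(
  BusinessTravel = c("Non-Travel", "Travel Frequently", "Travel Rarely"),
  Department     = c("Sales", "Research", "Sales"),
  MonthlyIncome  = c(2500, 4100, 3300)
)

# Explicit level order for each column to convert (assumed spellings)
level_map <- list(
  BusinessTravel = c("Non-Travel", "Travel Rarely", "Travel Frequently"),
  Department     = c("Research", "Sales")
)

# Convert only the named columns; codes start at 0 in the given order
for (col in names(level_map)) {
  hr_df[[col]] <- as.integer(factor(hr_df[[col]], levels = level_map[[col]])) - 1L
}

print(hr_df)
```

Columns not listed in `level_map` (here `MonthlyIncome`) are left untouched, and any value missing from the supplied levels becomes `NA`.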
`data.matrix()` might do the job. Borrowing the example from jpsmith's solution:
```r
df <- data.frame(DistanceFromHome = c("Close", "Far", "Medium", "Far", "Close"),
                 MonthlyIncome = c("<1000", "1000-5000", "<1000", ">5000", "1000-5000"))
data.matrix(df)
```
Gives:
```
> data.matrix(df)
     DistanceFromHome MonthlyIncome
[1,]                1             1
[2,]                2             3
[3,]                3             1
[4,]                2             2
[5,]                1             3
```
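`data.matrix()` turns factor columns into their integer codes (character columns are converted to factors first) and keeps numeric columns as they are, so combining it with a factor whose levels are ordered explicitly gives control over the codes. A minimal sketch; `mixed_df` and its values are hypothetical:

```r
# Hypothetical data: one numeric column and one factor with a chosen level order
mixed_df <- data.frame(
  MonthlyIncome  = c(2500, 4100, 3300),
  BusinessTravel = factor(c("Travel Rarely", "Non-Travel", "Travel Frequently"),
                          levels = c("Non-Travel", "Travel Rarely", "Travel Frequently"))
)

m <- data.matrix(mixed_df)

# Factor codes follow the supplied level order (1-based); subtract 1 to get 0, 1, 2
m[, "BusinessTravel"] <- m[, "BusinessTravel"] - 1
print(m)
```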