How to convert categorical variables to numeric variables



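I am trying to convert all of my categorical variables to numeric variables. Logical variables (yes or no) seem easy, but categorical variables with more than two options are harder for me. How can I change the BusinessTravel variable, which has 3 options ("Non-Travel", "Travel Rarely" and "Travel Frequently"), into a numeric variable (0, 1 and 2)?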

#---- load the dataset ----#
library(readr)
library(dplyr)  # provides select(), mutate() and %>%
Dataset <- read_csv("Dataset.csv")
#---- select the important variables ----#
new_df <- select(Dataset, DistanceFromHome, MonthlyIncome, YearsAtCompany,
                 Attrition, BusinessTravel, OverTime, JobInvolvement,
                 StockOptionLevel, EnvironmentSatisfaction, JobLevel,
                 Department)
#---- Transform the Variables ----#
new_df <- new_df %>%
  mutate(Attrition = ifelse(Attrition == "No", 0, 1),
         OverTime = ifelse(OverTime == "No", 0, 1))
# BusinessTravel: ???

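Try this approach: it converts all variables to factors (R assigns an integer code to each level behind the scenes) and then extracts the underlying numeric values: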

Data

df <- data.frame(DistanceFromHome = c("Close", "Far", "Medium", "Far", "Close"),
MonthlyIncome = c("<1000", "1000-5000", "<1000", ">5000", "1000-5000"))

Code

convert_cols <- 1:ncol(df)
df[convert_cols] <- lapply(df[convert_cols], function(x) as.numeric(as.factor(x)))

Output:

#   DistanceFromHome MonthlyIncome
# 1                1             1
# 2                2             3
# 3                3             1
# 4                2             2
# 5                1             3

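Since you did not provide sample data, I cannot tell whether this exact code will work for you, but you can change convert_cols to whichever columns you want to convert to numbers (here I simply used all of them).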
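One caveat with as.numeric(as.factor(x)): the codes follow the sorted order of the labels (which can depend on the locale) and start at 1, so they will not necessarily match a meaningful ranking or the 0, 1, 2 coding asked for in the question. A minimal sketch of one way around this, assuming the three BusinessTravel labels are spelled exactly as below (the real spellings in Dataset.csv are not shown in the question, so check them with unique() first):

```r
# Hypothetical sample data; the label spellings are assumptions,
# not values taken from Dataset.csv
new_df <- data.frame(
  BusinessTravel = c("Travel Rarely", "Non-Travel", "Travel Frequently", "Non-Travel"),
  stringsAsFactors = FALSE
)

# State the level order explicitly so the codes carry the intended ranking
travel_levels <- c("Non-Travel", "Travel Rarely", "Travel Frequently")

# Factor codes start at 1, so subtract 1 to get 0, 1, 2
new_df$BusinessTravel <- as.integer(factor(new_df$BusinessTravel, levels = travel_levels)) - 1L

new_df$BusinessTravel
# [1] 1 0 2 0
```

Any label that is not listed in travel_levels becomes NA, which makes spelling mismatches easy to spot.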

data.matrix() may also do the job. Borrowing the example from jpsmith's solution:

df <- data.frame(DistanceFromHome = c("Close", "Far", "Medium", "Far", "Close"),
MonthlyIncome = c("<1000", "1000-5000", "<1000", ">5000", "1000-5000"))
data.matrix(df)

This gives:

> data.matrix(df)
     DistanceFromHome MonthlyIncome
[1,]                1             1
[2,]                2             3
[3,]                3             1
[4,]                2             2
[5,]                1             3
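Two things are worth keeping in mind here: data.matrix() returns a matrix rather than a data frame, and columns that are already factors are replaced by their internal codes. The sketch below relies on that second point to control the numbering. The level orders are illustrative assumptions about what a sensible ranking might be, not something given in the question:

```r
# Same sample data as above, but stored as factors with an explicit level order
df <- data.frame(
  DistanceFromHome = factor(c("Close", "Far", "Medium", "Far", "Close"),
                            levels = c("Close", "Medium", "Far")),
  MonthlyIncome = factor(c("<1000", "1000-5000", "<1000", ">5000", "1000-5000"),
                         levels = c("<1000", "1000-5000", ">5000"))
)

# data.matrix() keeps the factor codes, so the explicit order is respected
m <- data.matrix(df)

# Convert back if a data frame is needed for later steps
num_df <- as.data.frame(m)
num_df
#   DistanceFromHome MonthlyIncome
# 1                1             1
# 2                3             2
# 3                2             1
# 4                3             3
# 5                1             2
```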
