Encoding multiple categorical columns as numeric columns in R



I have a data frame like this:

Order status      Shipping mode
Complete          Standard Class
Pending           First Class
Pending Payment   Same day
Closed            Second class

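You can convert each variable to a factor, supplying levels= in the order you want, and coerce the result with as.numeric in the same step. The numeric code of each value is then its position in levels: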

dat <- transform(dat,
  order_stat_num = as.numeric(
    factor(order_stat,
           levels = c("Complete", "Pending", "Closed", "Pending Payment"))),
  shipping_mode_num = as.numeric(
    factor(shipping_mode,
           levels = c("Standard Class", "First Class", "Same day", "Second class")))
)
dat
#         order_stat  shipping_mode order_stat_num shipping_mode_num
# 1         Complete Standard Class              1                 1
# 2           Closed    First Class              3                 2
# 3         Complete   Second class              1                 4
# 4          Pending    First Class              2                 2
# 5         Complete    First Class              1                 2
# 6           Closed Standard Class              3                 1
# 7           Closed   Second class              3                 4
# 8  Pending Payment Standard Class              4                 1
# 9           Closed   Second class              3                 4
# 10          Closed   Second class              3                 4
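
When more than two columns need an explicit level order, the same factor(levels=) idea can be wrapped in a loop. The following is a minimal sketch and not part of the original answer: encode_columns(), level_map and sample_dat are hypothetical names introduced here for illustration. It also shows that a value missing from levels is encoded as NA.

```r
# Hypothetical helper (not from the original answer): adds a "<col>_num"
# column for every column named in level_map.
encode_columns <- function(data, level_map) {
  for (col in names(level_map)) {
    # The code is the position of the value in level_map[[col]];
    # values that are not listed there become NA.
    data[[paste0(col, "_num")]] <-
      as.integer(factor(data[[col]], levels = level_map[[col]]))
  }
  data
}

# Small illustrative sample; "Cancelled" is deliberately not in the levels.
sample_dat <- data.frame(
  order_stat    = c("Complete", "Closed", "Pending", "Cancelled"),
  shipping_mode = c("Standard Class", "First Class", "Second class", "Same day"),
  stringsAsFactors = FALSE
)

level_map <- list(
  order_stat    = c("Complete", "Pending", "Closed", "Pending Payment"),
  shipping_mode = c("Standard Class", "First Class", "Same day", "Second class")
)

encoded <- encode_columns(sample_dat, level_map)
print(encoded)
```

Keeping the level orders in one named list makes the mapping easy to review, and the NA for an unlisted value is a useful signal that the list is incomplete.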


Data:

dat <- structure(list(order_stat = c("Complete", "Closed", "Complete", 
"Pending", "Complete", "Closed", "Closed", "Pending Payment", 
"Closed", "Closed"), shipping_mode = c("Standard Class", "First Class", 
"Second class", "First Class", "First Class", "Standard Class", 
"Second class", "Standard Class", "Second class", "Second class"
)), row.names = c(NA, -10L), class = "data.frame")
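
Another option is to convert the listed columns to factors with lapply. This needs only base R, and the columns become factors (not plain numbers) whose levels are sorted alphabetically by default:

categorical_col <- c("Order status", "Shipping mode")
data[categorical_col] <- lapply(data[categorical_col], factor)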
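
Converting to factor alone leaves factor columns, so a further as.integer step is needed to get numeric codes. A minimal sketch, assuming a data frame named orders (a hypothetical name) whose column names contain spaces, which requires check.names = FALSE:

```r
# Hypothetical sample data; check.names = FALSE keeps the spaces in the names.
orders <- data.frame(
  `Order status`  = c("Complete", "Pending", "Pending Payment", "Closed"),
  `Shipping mode` = c("Standard Class", "First Class", "Same day", "Second class"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

categorical_col <- c("Order status", "Shipping mode")

# Step 1: character -> factor (levels sorted alphabetically by default).
orders[categorical_col] <- lapply(orders[categorical_col], factor)

# Step 2: factor -> integer codes (position of each value among the levels).
orders[categorical_col] <- lapply(orders[categorical_col], as.integer)

print(orders)
```

Because the levels are sorted alphabetically here, "Closed" receives code 1 even though it appears last in the data.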

If the values do not need a particular level order, you can also use dplyr::dense_rank:

library(dplyr)
df %>% mutate(across(everything(), ~dense_rank(.)))
#   Orderstatus Shippingmode
# 1           2            4
# 2           3            1
# 3           4            2
# 4           1            3

Similarly:

df %>% mutate(across(everything(), ~as.numeric(as.factor(.))))
#   Orderstatus Shippingmode
# 1           2            4
# 2           3            1
# 3           4            2
# 4           1            3
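
To see which label was given which code by as.factor, it can help to print the mapping. This is a small base R sketch under the assumption that the default alphabetical level order is used; code_table() and sample_df are hypothetical names that do not appear in the original answers.

```r
# Hypothetical sample matching the four rows used in this answer.
sample_df <- data.frame(
  Orderstatus  = c("Complete", "Pending", "Pending Payment", "Closed"),
  Shippingmode = c("Standard Class", "First Class", "Same day", "Second class"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lists each distinct label next to its integer code.
code_table <- function(x) {
  f <- factor(x)  # default levels: sorted unique values
  data.frame(label = levels(f),
             code  = seq_along(levels(f)),
             stringsAsFactors = FALSE)
}

status_codes   <- code_table(sample_df$Orderstatus)
shipping_codes <- code_table(sample_df$Shippingmode)

print(status_codes)
print(shipping_codes)
```

Storing such a table alongside the encoded data makes it possible to translate the numbers back to labels later.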

Data used:

df <- read.table(text = "Orderstatus    Shippingmode
1 Complete  'Standard Class'
2 Pending   'First Class'
3 'Pending Payment' 'Same day'
4 Closed    'Second class'", header = TRUE)
