I have a data frame like this:
Order status | Shipping mode
---|---
Complete | Standard Class
Pending | First Class
Pending Payment | Same day
Closed | Second class
You can convert each variable to a factor, supplying levels= in the desired order, and coerce the result with as.numeric in the same step:
dat <- transform(dat,
order_stat_num=as.numeric(
factor(order_stat,
levels=c("Complete", "Pending", "Closed", "Pending Payment"))),
shipping_mode_num=as.numeric(
factor(shipping_mode,
levels=c("Standard Class", "First Class", "Same day", "Second class")))
)
dat
# order_stat shipping_mode order_stat_num shipping_mode_num
# 1 Complete Standard Class 1 1
# 2 Closed First Class 3 2
# 3 Complete Second class 1 4
# 4 Pending First Class 2 2
# 5 Complete First Class 1 2
# 6 Closed Standard Class 3 1
# 7 Closed Second class 3 4
# 8 Pending Payment Standard Class 4 1
# 9 Closed Second class 3 4
# 10 Closed Second class 3 4
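One thing to watch with this approach: any value that does not appear in levels= is silently turned into NA. The sketch below illustrates that with a made-up vector (the "Cancelled" status is an assumption for illustration, it is not in the original data) and shows one way to list the unmatched values.

```r
# Hypothetical input; "Cancelled" is deliberately missing from the levels
status <- c("Complete", "Pending", "Cancelled")
lv <- c("Complete", "Pending", "Closed", "Pending Payment")

# Each value is replaced by its position in lv; unmatched values become NA
codes <- as.numeric(factor(status, levels = lv))
codes
# [1]  1  2 NA

# List the values that no level matched
unmatched <- setdiff(unique(status), lv)
unmatched
# [1] "Cancelled"
```

Checking for unmatched values before converting is a cheap way to catch typos or inconsistent capitalization such as "Second class" versus "Second Class".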
Data:
dat <- structure(list(order_stat = c("Complete", "Closed", "Complete",
"Pending", "Complete", "Closed", "Closed", "Pending Payment",
"Closed", "Closed"), shipping_mode = c("Standard Class", "First Class",
"Second class", "First Class", "First Class", "Standard Class",
"Second class", "Standard Class", "Second class", "Second class"
)), row.names = c(NA, -10L), class = "data.frame")
library(dplyr)
# Convert the categorical columns to factors (this alone does not make them numeric)
categorical_col = c("Order status","Shipping mode")
data[,categorical_col] = lapply(data[categorical_col], factor)
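The snippet above stops at factors. If numeric codes are the goal, one possible follow-up is to take the integer codes of each factor, as in this minimal base R sketch. The data frame here is a stand-in built from the sample values in the question, and the codes follow the default sorted order of the levels rather than any custom order.

```r
# Stand-in data frame with the same column names as above (assumed for illustration)
data <- data.frame(
  "Order status"  = c("Complete", "Pending", "Pending Payment", "Closed"),
  "Shipping mode" = c("Standard Class", "First Class", "Same day", "Second class"),
  check.names = FALSE
)
categorical_col <- c("Order status", "Shipping mode")

# factor() sorts the levels by default; as.integer() returns each value's level index
data[categorical_col] <- lapply(data[categorical_col],
                                function(x) as.integer(factor(x)))

data[["Order status"]]
# [1] 2 3 4 1
data[["Shipping mode"]]
# [1] 4 1 2 3
```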
If the values have no predefined level order, you can also use dplyr::dense_rank:
library(dplyr)
df %>% mutate(across(everything(), ~dense_rank(.)))
Orderstatus Shippingmode
1 2 4
2 3 1
3 4 2
4 1 3
Similarly:
df %>% mutate(across(everything(), ~as.numeric(as.factor(.))))
Orderstatus Shippingmode
1 2 4
2 3 1
3 4 2
4 1 3
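For anyone who prefers not to load dplyr, the same ranking can probably be reproduced in base R by matching each value against the sorted unique values of its column. This is only a sketch of an equivalent, not part of the answer above; the data frame repeats the four sample rows used in this thread.

```r
# Same four rows as the sample data in this thread
df <- data.frame(
  Orderstatus  = c("Complete", "Pending", "Pending Payment", "Closed"),
  Shippingmode = c("Standard Class", "First Class", "Same day", "Second class")
)

# Position of each value among the sorted unique values of its column
ranked <- as.data.frame(lapply(df, function(x) match(x, sort(unique(x)))))

# The factor-based conversion, for comparison
via_factor <- as.data.frame(lapply(df, function(x) as.integer(as.factor(x))))

identical(ranked, via_factor)
# [1] TRUE
```

Both versions number the values in sorted order, so they are only appropriate when the order of the categories carries no meaning of its own.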
Data used:
df <- read.table(text = "Orderstatus Shippingmode
1 Complete 'Standard Class'
2 Pending 'First Class'
3 'Pending Payment' 'Same day'
4 Closed 'Second class'", header = TRUE)