I have a data frame with a column ("name") that contains the names of fruits:
name
Apple
Apple
Mango
Banana
Banana
Orange
Mango
Orange
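and so on. There are 9 fruits in my data.
I want to create new variables following the naming convention "name_" followed by the value. So I want to add 9 more variables, for example: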
name name_Apple name_Mango name_Banana name_Orange
Apple 1 0 0 0
Apple 1 0 0 0
Mango 0 1 0 0
Banana 0 0 1 0
Banana 0 0 1 0
Orange 0 0 0 1
Mango 0 1 0 0
Orange 0 0 0 1
I want to do this with a for loop, because the data will be added to an existing frame. I tried this:
name_list <- c("Apple", "Mango", "Banana", "Orange")
for (i in name_list) {
df_main$name_[[i]] <- ifelse(df_main$name == [[i]], 1, 0)
}
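I get the error "Error: unexpected '[['". I think I am referencing the character data incorrectly inside the loop, but I don't know how to do it correctly. Would mutate() work better here?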
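For reference, here is a minimal sketch of how the loop itself could be repaired. The error comes from `df_main$name_[[i]]` and `== [[i]]`: `[[` has to be applied to an object, and a column name cannot be assembled after `$`. One option is to build the column name with `paste0()` and assign through `df_main[[...]]`. The `df_main` below is a small stand-in built from the sample rows above, not the real data.

```r
# Stand-in for the real data frame (assumption: only the sample rows shown above)
df_main <- data.frame(
  name = c("Apple", "Apple", "Mango", "Banana",
           "Banana", "Orange", "Mango", "Orange")
)

name_list <- c("Apple", "Mango", "Banana", "Orange")

for (i in name_list) {
  # Build the new column name as a string, e.g. "name_Apple",
  # and compare the name column against the current fruit directly
  df_main[[paste0("name_", i)]] <- ifelse(df_main$name == i, 1, 0)
}

df_main
```

Using `unique(df_main$name)` instead of a hand-typed `name_list` would pick up all 9 fruits without listing them.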
We can use dummy_cols from fastDummies:
library(fastDummies)
library(magrittr)  # provides the %>% pipe
df1 %>%
  dummy_cols('name')
Output:
name name_Apple name_Banana name_Mango name_Orange
1 Apple 1 0 0 0
2 Apple 1 0 0 0
3 Mango 0 0 1 0
4 Banana 0 1 0 0
5 Banana 0 1 0 0
6 Orange 0 0 0 1
7 Mango 0 0 1 0
8 Orange 0 0 0 1
Data:
df1 <- structure(list(name = c("Apple", "Apple", "Mango", "Banana",
"Banana", "Orange", "Mango", "Orange")), class = "data.frame", row.names = c(NA,
-8L))
In base R, you can do:
mat <- outer(df$name, unique(df$name), function(a, b) as.numeric(a == b))
cbind(df, setNames(as.data.frame(mat), paste0('name_', unique(df$name))))
#> name name_Apple name_Mango name_Banana name_Orange
#> 1 Apple 1 0 0 0
#> 2 Apple 1 0 0 0
#> 3 Mango 0 1 0 0
#> 4 Banana 0 0 1 0
#> 5 Banana 0 0 1 0
#> 6 Orange 0 0 0 1
#> 7 Mango 0 1 0 0
#> 8 Orange 0 0 0 1
Another way:
model.matrix(~ name - 1, data = df)
# nameApple nameBanana nameMango nameOrange
# 1 1 0 0 0
# 2 1 0 0 0
# 3 0 0 1 0
# 4 0 1 0 0
# 5 0 1 0 0
# 6 0 0 0 1
# 7 0 0 1 0
# 8 0 0 0 1
Data:
structure(list(name = c("Apple", "Apple", "Mango", "Banana",
"Banana", "Orange", "Mango", "Orange")), class = "data.frame", row.names = c(NA,
-8L)) -> df
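Note that model.matrix() returns only the indicator matrix, with column names such as nameApple (no underscore). If the indicators need to sit next to the original column under the name_ convention from the question, something along these lines might work. This is a sketch; the `sub()` renaming step and the `df_out` name are illustrative choices, not part of the answer above.

```r
# Same sample data as above
df <- data.frame(
  name = c("Apple", "Apple", "Mango", "Banana",
           "Banana", "Orange", "Mango", "Orange")
)

# One indicator column per level; "- 1" drops the intercept
mm <- model.matrix(~ name - 1, data = df)

# Turn "nameApple" into "name_Apple", and so on
colnames(mm) <- sub("^name", "name_", colnames(mm))

# Attach the indicators to the original data frame
df_out <- cbind(df, as.data.frame(mm))
df_out
```

The columns come out in alphabetical order of the fruit names, because model.matrix() converts the character column to a factor first.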