我有以下数据框架:
CustomerID Department Price SportswearDemand HomeDemand KidswearDemand WomenswearDemand
-------------------------------------------------------------------------------------------
1050091 Sportswear 497.6 0 0 0 0
1555018 Womenswear 336.0 0 0 0 0
210239 Womenswear 698.0 0 0 0 0
507556 Sportswear 209.0 0 0 0 0
1708193 Sportswear 209.0 0 0 0 0
1295733 Menswear 209.0 0 0 0 0
1213373 Sportswear 298.0 0 0 0 0
753471 Sportswear 209.0 0 0 0 0
82739 Menswear 349.0 0 0 0 0
1660995 Kidswear 424.6 0 0 0 0
.
.
.
所有的列,包括SportswearDemand
的右侧,我将从现在开始称为"需求列"。我想根据Department
和Price
的信息填充这些,如下所示:
如果,对于某个customerID
, Department包含条目Sportswear
,那么我希望将该行的价格输入SportswearDemand
,而不是当前的零。其他需求列也是一样。最终的结果应该是这样的:
CustomerID Department Price SportswearDemand HomeDemand KidswearDemand WomenswearDemand
-------------------------------------------------------------------------------------------
1050091 Sportswear 497.6 497.6 0 0 0
1555018 Womenswear 336.0 0 0 0 336.0
210239 Womenswear 698.0 0 0 0 698.0
507556 Sportswear 209.0 209.0 0 0 0
1708193 Sportswear 209.0 209.0 0 0 0
1295733 Menswear 209.0 0 0 0 0
1213373 Sportswear 298.0 298.0 0 0 0
753471 Sportswear 209.0 209.0 0 0 0
82739 Menswear 349.0 0 0 0 0
1660995 Kidswear 424.6 0 0 424.6 0
.
.
.
我已经设法这样解决了:
df$SportswearDemand <- with(df, ifelse(df$Department == "Sportswear", df$Price, 0))
df$HomeDemand <- with(df, ifelse(df$Department == "Home", df$Price, 0))
df$KidswearDemand <- with(df, ifelse(df$Department == "Kidswear", df$Price, 0))
df$WomenswearDemand <- with(df, ifelse(df$Department == "Womenswear", df$Price, 0))
然而,我有30多个这样的需求列,我想知道是否有比硬编码这样的30行更好的方法来做到这一点?
我的第一个想法是像这样将一行封装在for循环中:
DemandColumns # array of all the 30 different demand columns stored as strings
for (i in DemandColumns){
df$i <- with(df, ifelse(df$Department == substr(i,1,nchar(i)-6), df$Price, 0))
}
但它只是添加了一个充满零的列"i"
。substr
用于获取字符串"Demand"
以外的所有字符。如有任何帮助,不胜感激。
无需初始化"需求列",请先删除。
df[grep('Demand', names(df))] <- NULL
创建Price
和Department
列的副本,并以宽格式获取数据。
library(dplyr)
library(tidyr)
df %>%
mutate(value = Price,
name = Department) %>%
pivot_wider(names_from = name, values_from = value,
names_glue = '{name}_Demand', values_fill = 0)
# CustomerID Department Price Sportswear_Demand Womenswear_Demand Menswear_Demand Kidswear_Demand
# <int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1050091 Sportswear 498. 498. 0 0 0
# 2 1555018 Womenswear 336 0 336 0 0
# 3 210239 Womenswear 698 0 698 0 0
# 4 507556 Sportswear 209 209 0 0 0
# 5 1708193 Sportswear 209 209 0 0 0
# 6 1295733 Menswear 209 0 0 209 0
# 7 1213373 Sportswear 298 298 0 0 0
# 8 753471 Sportswear 209 209 0 0 0
# 9 82739 Menswear 349 0 0 349 0
#10 1660995 Kidswear 425. 0 0 0 425.