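I'm having a hard time tidying this data. The pastries.Baked column is a factor, and I would like to be able to search by pastry. For every entry matching \S+[X]\S+ where the value is greater than 1, I want to change the value to 1 and duplicate the row that many times.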
df <- data.frame(bakery.ID = 1:4, pastries.Baked = c('Large Cake x 3', 'Large Cake x 1', 'Large Cake x 2', 'Medium Cake x 1'))
Desired output

bakery.ID | pastries.Baked
---|---
1 | Large Cake x 1
1 | Large Cake x 1
1 | Large Cake x 1
2 | Large Cake x 1
3 | Large Cake x 1
3 | Large Cake x 1
4 | Medium Cake x 1
Using `separate`, you can split the text and the number into two different columns, then use `uncount` to repeat each row according to that number.
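library(dplyr)
library(tidyr)
df %>%
  # split "Large Cake x 3" into the name and an integer count
  separate(pastries.Baked, c('pastries.Baked', 'count'), sep = '\\s*x\\s*', convert = TRUE) %>%
  # repeat each row `count` times (the count column is dropped)
  uncount(count) %>%
  # restore the "x 1" suffix on every row
  mutate(pastries.Baked = paste(pastries.Baked, 'x 1'))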
#  bakery.ID  pastries.Baked
#1         1  Large Cake x 1
#2         1  Large Cake x 1
#3         1  Large Cake x 1
#4         2  Large Cake x 1
#5         3  Large Cake x 1
#6         3  Large Cake x 1
#7         4 Medium Cake x 1
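If you would rather not depend on tidyr, the same idea (pull out the count, then repeat each row that many times) can be sketched in base R. This is only an illustration under the assumption that every entry ends in `x <number>`; the helper names `n`, `name`, and `out` are made up for this example and are not part of the original question or answer.

```r
df <- data.frame(
  bakery.ID = 1:4,
  pastries.Baked = c('Large Cake x 3', 'Large Cake x 1',
                     'Large Cake x 2', 'Medium Cake x 1')
)

# Assumption: each value ends in "x <digits>"; extract that trailing count
n <- as.integer(sub('.*x\\s*(\\d+)\\s*$', '\\1', df$pastries.Baked))

# Drop the " x <digits>" suffix to keep only the pastry name
name <- sub('\\s*x\\s*\\d+\\s*$', '', df$pastries.Baked)

# Repeat each row index n times, then rewrite the label with a count of 1
out <- df[rep(seq_len(nrow(df)), times = n), , drop = FALSE]
out$pastries.Baked <- paste(rep(name, times = n), 'x 1')
rownames(out) <- NULL
```

Because `sub()` coerces its input to character, this should also work when pastries.Baked is a factor. Searching by pastry is then a plain subset, for example `out[out$pastries.Baked == 'Medium Cake x 1', ]`.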