I have this dataset, and I need to split one column into two variables, flavour and type:
| column_1 | count |
| ------------------ | ----- |
| total mango juice | 01 |
| orange juice | 02 |
| strawberry jam | 09 |
| total strawberry | 06 |
| strawberry jelly | 05 |
| total jelly | 04 |
I would like the table to look like this:
| flavour | type | count |
| -----------| ------ | ----- |
| mango | juice | 01 |
| orange | juice | 02 |
| strawberry | jam | 09 |
| strawberry | N/A | 06 |
| strawberry | jelly | 05 |
| N/A | jelly | 04 |
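I have looked around for a solution and tried several things in RStudio, but to no avail. It seems I need to use a regex to split the column.
Does this also mean I have to define new variables?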
We can use `separate` after inserting a delimiter before the type and removing 'total':
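    library(dplyr)
    library(tidyr)
    library(stringr)
    df1 %>%
      # put a comma between flavour and type, then drop the leading 'total'
      mutate(column_1 = str_remove(str_replace(column_1,
          "(.*)\\s+(juice|jelly|jam)$", "\\1,\\2"), '^total\\s*')) %>%
      # the default separator splits on the inserted comma
      separate(column_1, into = c('flavour', 'type'))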
Output:

         flavour  type count
    1      mango juice     1
    2     orange juice     2
    3 strawberry   jam     9
    4 strawberry  <NA>     6
    5 strawberry jelly     5
    6            jelly     4
Data:

    df1 <- structure(list(column_1 = c("total mango juice", "orange juice",
    "strawberry jam", "total strawberry", "strawberry jelly", "total jelly"
    ), count = c(1L, 2L, 9L, 6L, 5L, 4L)), class = "data.frame", row.names = c(NA,
    -6L))
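
The desired table shows N/A for the missing flavour in the last row, while the approach above leaves it as an empty string. A minimal sketch of one way to close that gap, assuming an empty flavour should become `NA`: it repeats the same regex step, splits on the comma explicitly, and then applies `dplyr::na_if`. The name `result`, the explicit `sep = ","`, and `fill = "right"` are choices made here for illustration, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Same data as above, rebuilt with data.frame() for readability
df1 <- data.frame(
  column_1 = c("total mango juice", "orange juice", "strawberry jam",
               "total strawberry", "strawberry jelly", "total jelly"),
  count = c(1L, 2L, 9L, 6L, 5L, 4L)
)

result <- df1 %>%
  # insert a comma before the type, then strip the leading 'total'
  mutate(column_1 = str_remove(
    str_replace(column_1, "(.*)\\s+(juice|jelly|jam)$", "\\1,\\2"),
    "^total\\s*"
  )) %>%
  # split on the comma only; fill = "right" pads a missing type with NA
  separate(column_1, into = c("flavour", "type"), sep = ",", fill = "right") %>%
  # assumption: an empty flavour means "not available"
  mutate(flavour = na_if(flavour, ""))

print(result)
```

Splitting on the comma alone, rather than relying on the default separator, should also keep any multi-word flavour in one piece, and `fill = "right"` avoids the warning `separate` would otherwise raise for rows that have no type.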