I want to split a column in R that contains the Size, Color and customer (Sold By) description of apparel products.
The column looks like this:
Size: AS - Adult Small, Color: Black, Sold By: Elmwood
Size: YS - Youth Small, Color: Black, Sold By: Rosemary Brown PS
Color - Black, Sold By: Rosemary Brown PS
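So some rows have a size, and some start with the color.
I need to make three columns: 1. Size 2. Color 3. Sold By. For Size, I only need the two-letter size code. For Color, I need the color description. For Sold By, I need to parse the value until it reaches a comma.
Thanks for your help.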
For the specific example you gave, this should do the trick:
library(tidyverse)

# Example data: one string per row, with fields separated by commas
df <- tribble(
  ~col1,
  'Size: AS - Adult Small, Colours: Black , Sold By: Elmwood , Some Other Column: Other Data ',
  'Size:YS - Youth Small, Colours: Black, Sold By: Rosemary Brown, Some Other Column: Other Data ',
  'Colours: Black, Sold By: Rosemary Brown, Some Other Column: Other Data '
) %>%
  mutate(
    # Text after "Size:" up to (but not including) the " -" before the description
    size = str_trim(str_extract(col1, "(?<=Size:)[^,]*(?= -)")),
    # Text after each label up to the next comma; rows without the label give NA
    colours = str_trim(str_extract(col1, "(?<=Colours:)[^,]*(?=,)")),
    sold_by = str_trim(str_extract(col1, "(?<=Sold By:)[^,]*(?=,)"))
  ) %>%
  select(-col1)
Output:
# A tibble: 3 × 3
  size  colours sold_by
  <chr> <chr>   <chr>
1 AS    Black   Elmwood
2 YS    Black   Rosemary Brown
3 NA    Black   Rosemary Brown
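
Note that the patterns above rely on a comma following each value. If your real rows end right after Sold By, or use "Color - Black" rather than "Color: Black" as in the question, a looser variant may be needed. Here is a minimal sketch under those assumptions. The sample rows are my guess at the shape of your data, and extract_field() is a helper name made up for this example, not a stringr function:

```r
library(stringr)

# Hypothetical rows shaped like the question's data: no trailing comma after
# the last field, and one row that uses "Color - " instead of "Color: ".
x <- c(
  "Size: AS - Adult Small, Color: Black, Sold By: Elmwood",
  "Size: YS - Youth Small, Color: Black, Sold By: Rosemary Brown PS",
  "Color - Black, Sold By: Rosemary Brown PS"
)

# extract_field() is a made-up helper for this sketch.
# It captures everything after "<label>" plus a ":" or "-" separator,
# up to the next comma or the end of the string, and trims whitespace.
# Rows where the label is missing come back as NA.
extract_field <- function(text, label) {
  pattern <- paste0(label, "\\s*[-:]\\s*([^,]*)")
  str_trim(str_match(text, pattern)[, 2])
}

result <- data.frame(
  # Keep only the leading two-letter code of the size field
  size    = str_extract(extract_field(x, "Size"), "^[A-Za-z]{2}\\b"),
  # "Colou?rs?" accepts Color, Colour, Colors and Colours
  color   = extract_field(x, "Colou?rs?"),
  sold_by = extract_field(x, "Sold By")
)

print(result)
```

Because the capture group is [^,]*, it stops at the first comma when there is one and at the end of the string otherwise, so no lookahead is needed.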