R - Split a column based on text patterns



I want to split a column in R that contains the Size, Color, and customer description of an apparel product.

The column looks like this:

Size: AS - Adult Small, Color: Black, Sold By: Elmwood
Size: YS - Youth Small, Color: Black, Sold By: Rosemary Brown PS
Color - Black, Sold By: Rosemary Brown PS

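So some rows have a Size, and some start with Color.

I need to make three columns: 1. Size, 2. Color, 3. Sold By. For Size, I only need the two-letter size code. For Color, I need the color description. For Sold By, I need to parse the text until it finds a comma.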

Thanks for your help.

For the specific example you gave, this should do the trick:

library(tidyverse)

# Example data: every field, including Sold By, is followed by a comma
df <- tribble(
  ~col1,
  'Size: AS - Adult Small, Colours: Black , Sold By: Elmwood , Some Other Column: Other Data ',
  'Size:YS - Youth Small, Colours: Black, Sold By: Rosemary Brown, Some Other Column: Other Data ',
  'Colours: Black, Sold By: Rosemary Brown, Some Other Column: Other Data '
) %>%
  mutate(
    # Text after "Size:" up to the " -" that precedes the long size name
    size = str_trim(str_extract(col1, "(?<=Size:)[^,]*(?= -)")),
    # Text after each label up to the next comma
    colours = str_trim(str_extract(col1, "(?<=Colours:)[^,]*(?=,)")),
    sold_by = str_trim(str_extract(col1, "(?<=Sold By:)[^,]*(?=,)"))
  ) %>%
  select(-col1)

Output:

# A tibble: 3 × 3
size  colours sold_by       
<chr> <chr>   <chr>         
1 AS    Black   Elmwood       
2 YS    Black   Rosemary Brown
3 NA    Black   Rosemary Brown
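
Note that the patterns above rely on a comma following each field. In the strings as posted in the question, Sold By is the last field with no trailing comma, and one row writes "Color - Black" with a hyphen instead of a colon. As a minimal sketch for that case (not part of the answer above), the following base R variant accepts either separator and stops at a comma or the end of the string. The helper name `extract_first` and the sample vector `x` are hypothetical, introduced only for illustration.

```r
# Sample strings as written in the question (no trailing comma after Sold By)
x <- c(
  "Size: AS - Adult Small, Color: Black, Sold By: Elmwood",
  "Size: YS - Youth Small, Color: Black, Sold By: Rosemary Brown PS",
  "Color - Black, Sold By: Rosemary Brown PS"
)

# Hypothetical helper: return the first capture group of `pattern`
# for each element of `text`, or NA when the pattern does not match.
extract_first <- function(text, pattern) {
  m <- regmatches(text, regexec(pattern, text, perl = TRUE))
  vapply(
    m,
    function(g) if (length(g) >= 2) trimws(g[2]) else NA_character_,
    character(1)
  )
}

result <- data.frame(
  # Two-letter code right after "Size", separated by ":" or "-"
  size = extract_first(x, "Size\\s*[-:]\\s*([A-Za-z]{2})\\b"),
  # Everything after "Color"/"Colour(s)" up to the next comma or end of string
  color = extract_first(x, "Colou?rs?\\s*[-:]\\s*([^,]*)"),
  # Everything after "Sold By" up to the next comma or end of string
  sold_by = extract_first(x, "Sold By\\s*[-:]\\s*([^,]*)"),
  stringsAsFactors = FALSE
)

print(result)
```

Because `[^,]*` matches up to a comma or the end of the string, no lookahead for a trailing comma is needed. The same patterns should also work with `stringr::str_match()` if you prefer to stay in the tidyverse pipeline.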
