Split str into multiple rows and copy string from another column based on delimiter in R



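I'd also accept a pandas solution; my company doesn't like using R.

I've been handed a nightmare of a dataset and need some help transforming it in R with tidyr.

Example df record: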

id  date              people                                     things
12  12/12/12    last, first [id124] last, first middle [id1782] thing 1\nthing 2\nthing 3\n    thing 4\nthing 5

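I need to split the people by their IDs, then split the things and match them to the people. The things are listed in order: within one person's group they are separated by "\n", and the groups belonging to different people are separated by "\n" followed by several spaces.

Desired final result: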

id  date          people                    things
12  12/12/12    last, first [id124]         thing 1
12  12/12/12    last, first [id124]         thing 2
12  12/12/12    last, first [id124]         thing 3
12  12/12/12    last, first middle [id1782] thing 4
12  12/12/12    last, first middle [id1782] thing 5

I haven't been able to come up with an attempt good enough to even share here.

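We can use a double cSplit: first split at "]" followed by a space, or (|) at a newline (\n) followed by two or more whitespace characters (\s{2,}). Then, on the returned "long" format, split the "things" column a second time at the newline and, if needed, restore in "people" the "]" that the first split removed (regex lookarounds do not seem to work with cSplit).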

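library(splitstackshape)
library(dplyr)
library(stringr)
# First split: "] " separates people; newline plus 2+ whitespace separates each person's group of things
cSplit(df1, c("people", "things"), sep = '\\] |\n\\s{2,}', 'long',
       fixed = FALSE) %>%
  # Second split: one row per thing within each group
  cSplit("things", sep = "\n", "long") %>%
  # Restore the "]" that the first split removed from people
  mutate(people = str_replace(people, "(\\d+)$", "\\1]"))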

Output

#    id     date                      people  things
#1: 12 12/12/12         last, first [id124] thing 1
#2: 12 12/12/12         last, first [id124] thing 2
#3: 12 12/12/12         last, first [id124] thing 3
#4: 12 12/12/12 last, first middle [id1782] thing 4
#5: 12 12/12/12 last, first middle [id1782] thing 5

Data

df1 <- structure(list(id = 12L, date = "12/12/12",
    people = "last, first [id124] last, first middle [id1782]",
    things = "thing 1\nthing 2\nthing 3\n    thing 4\nthing 5"),
    row.names = c(NA, -1L), class = "data.frame")
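
Since the question also accepts pandas, the same two-stage split can be sketched in Python. This is a minimal sketch and not part of the original answer. It assumes every person ends with a bracketed ID, that each person's group of things is separated from the next by a newline followed by two or more whitespace characters, and that the number of people matches the number of groups in every row. The helper names split_people and split_groups are hypothetical, and exploding two columns at once needs pandas 1.3 or later.

```python
import re

import pandas as pd

# Same single-row example as df1 above.
df = pd.DataFrame({
    "id": [12],
    "date": ["12/12/12"],
    "people": ["last, first [id124] last, first middle [id1782]"],
    "things": ["thing 1\nthing 2\nthing 3\n    thing 4\nthing 5"],
})


def split_people(s):
    """Hypothetical helper: one entry per person, each ending in its [id...]."""
    return [p.strip() for p in re.findall(r"[^\]]+\]", s)]


def split_groups(s):
    """Hypothetical helper: one list of things per person.

    Groups are separated by a newline plus 2+ whitespace characters;
    things inside a group are separated by a plain newline.
    """
    return [
        [t.strip() for t in group.split("\n")]
        for group in re.split(r"\n\s{2,}", s)
    ]


out = (
    df.assign(
        people=df["people"].map(split_people),
        things=df["things"].map(split_groups),
    )
    # Pair each person with their group of things (lengths must match per row).
    .explode(["people", "things"])
    # Then give every thing in a group its own row.
    .explode("things")
    .reset_index(drop=True)
)

print(out)
```

Keeping the "]" inside the people pattern avoids the restore step the cSplit version needs. If a row has a different number of people and groups, the first explode raises a ValueError, which is a useful signal that the row needs manual inspection.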
