How to split sentences contained in a cell into different rows in R



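I have tried several times and nothing works. How can I split the sentences contained in a cell into separate rows while keeping the remaining values?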

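Example: the data frame df has 20 columns. The cell in row j, column i contains some comments separated by "|". I want a new data frame df2 whose number of rows grows with the number of sentences. That is, if cell (j, i) contains "Sentence A | Sentence B", then in df2:

Row j, column i holds Sentence A; row j+1, column i holds Sentence B; and columns 1 to i-1 and i+1 to 20 have the same values in rows j and j+1.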

I don't know whether there is a simple solution for this.

Thanks a lot.

We can use cSplit from the splitstackshape package:

library(splitstackshape)
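# "\\|" is the regex for a literal bar (a single backslash is an invalid escape in an R string)
cSplit(df, 'col3', sep = "\\|", direction = "long", fixed = FALSE)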
#   col1 col2                col3
#1:    a    1                fitz
#2:    a    1                buzz
#3:    b    2                 foo
#4:    b    2                 bar
#5:    c    3         hello world
#6:    c    3   today is Thursday
#7:    c    3           its 2:00
#8:    d    4                fitz

Data

df <- structure(list(col1 = c("a", "b", "c", "d"), col2 = c(1, 2, 3, 
4), col3 = c("fitz|buzz", "foo|bar", "hello world|today is Thursday | its 2:00", 
"fitz")), class = "data.frame", row.names = c(NA, -4L))
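To see what the reshaping does without an extra package, here is a minimal base R sketch of the same idea. It is not part of the answer above; the names `parts` and `df_long` are made up for illustration, and trimming the whitespace around each piece is an assumption about the desired output.

```r
# Example data, same values as the data frame above
df <- data.frame(
  col1 = c("a", "b", "c", "d"),
  col2 = c(1, 2, 3, 4),
  col3 = c("fitz|buzz", "foo|bar",
           "hello world|today is Thursday | its 2:00", "fitz"),
  stringsAsFactors = FALSE
)

# Split every cell on a literal "|" (fixed = TRUE disables regex handling)
parts <- strsplit(df$col3, "|", fixed = TRUE)

# Repeat each original row once per piece, then overwrite col3 with the pieces
df_long <- df[rep(seq_len(nrow(df)), lengths(parts)), ]
df_long$col3 <- trimws(unlist(parts))  # assumption: surrounding spaces are unwanted
rownames(df_long) <- NULL
```

All other columns are copied along by the row repetition, so the same code should work unchanged for a data frame with 20 columns. Note that a cell holding an empty string splits into zero pieces, so that row would be dropped.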

Here is a solution using three tidyverse packages that works when the maximum number of comments per cell is not known in advance:

library(dplyr)
library(tidyr)
library(stringr)
# Create function to calculate the max number comments per observation within 
# df$col3 and create a string of unique "names"
cols <- function(x) {
  cmts <- str_count(x, "([|])")
  max_cmts <- max(cmts, na.rm = TRUE) + 1
  # Return the names explicitly: "V01", "V02", ...
  sprintf("V%02d", seq_len(max_cmts))
}
# Create the data
df1 <- data.frame(col1 = c("a", "b", "c", "d"),
                  col2 = c(1, 2, 3, 4),
                  col3 = c("fitz|buzz", NA,
                           "hello world|today is Thursday | its 2:00|another comment|and yet another comment", "fitz"),
                  stringsAsFactors = FALSE)
# Generate the desired output
df2 <- separate(df1, col3, into = cols(x = df1$col3),
                sep = "([|])", extra = "merge", fill = "right") %>%
  pivot_longer(cols = cols(x = df1$col3), values_to = "comments",
               values_drop_na = TRUE) %>%
  select(-name)

which results in

df2
# A tibble: 8 x 3
  col1   col2 comments                 
  <chr> <dbl> <chr>                    
1 a         1 "fitz"                   
2 a         1 "buzz"                   
3 c         3 "hello world"            
4 c         3 "today is Thursday "     
5 c         3 " its 2:00"              
6 c         3 "another comment"        
7 c         3 "and yet another comment"
8 d         4 "fitz" 
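If the intermediate wide columns are not needed, tidyr's `separate_rows()` may get to a similar result more directly. The following is only a sketch, not the answer's method: the separator regex `\\s*\\|\\s*` (which also trims spaces around the bar) and the name `df2_alt` are my own assumptions.

```r
library(dplyr)
library(tidyr)

# Same example data as above
df1 <- data.frame(
  col1 = c("a", "b", "c", "d"),
  col2 = c(1, 2, 3, 4),
  col3 = c("fitz|buzz", NA,
           "hello world|today is Thursday | its 2:00|another comment|and yet another comment",
           "fitz"),
  stringsAsFactors = FALSE
)

df2_alt <- df1 %>%
  # One row per comment; sep is a regex, here a bar with optional spaces around it
  separate_rows(col3, sep = "\\s*\\|\\s*") %>%
  # Rows whose col3 was NA stay NA after splitting, so drop them explicitly
  filter(!is.na(col3)) %>%
  rename(comments = col3)
```

Unlike the output above, the comments here come back without leading or trailing spaces, because the whitespace is consumed by the separator.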
