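I have tried several times and nothing works. How can I split the sentences contained in a single cell into separate rows while keeping the values of the other columns?
Example: data frame df has 20 columns. The cell in row j, column i contains several comments separated by "|". I want a new data frame df2 whose number of rows grows with the number of sentences. That is, if cell (j, i) of df contains "sentence A | sentence B", then in df2:
row j, column i holds sentence A; row j+1, column i holds sentence B; and columns 1 to i-1 and i+1 to 20 have the same values in rows j and j+1.
I don't know whether there is a simple solution for this.
Thanks a lot.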
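We can use cSplit from splitstackshape:
library(splitstackshape)
# sep is treated as a regular expression because fixed = FALSE, so the pipe is escaped as "\\|"
cSplit(df, "col3", sep = "\\|", direction = "long", fixed = FALSE)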
#   col1 col2              col3
#1:    a    1              fitz
#2:    a    1              buzz
#3:    b    2               foo
#4:    b    2               bar
#5:    c    3       hello world
#6:    c    3 today is Thursday
#7:    c    3          its 2:00
#8:    d    4              fitz
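To make the "long" reshaping above less of a black box, here is a minimal base R sketch of the same idea: split each cell, repeat the row once per piece, then write the pieces back. This is not how cSplit is implemented internally; the names `parts`, `idx` and `df_long` are made up for illustration, and the sample data is the same as in this answer.

```r
# Same sample data as in this answer
df <- data.frame(
  col1 = c("a", "b", "c", "d"),
  col2 = c(1, 2, 3, 4),
  col3 = c("fitz|buzz", "foo|bar",
           "hello world|today is Thursday | its 2:00", "fitz"),
  stringsAsFactors = FALSE
)

# Split every cell on a literal "|": a list with one character vector per row
parts <- strsplit(df$col3, "|", fixed = TRUE)

# Repeat each row index once per piece, so the other columns are duplicated
idx <- rep(seq_len(nrow(df)), lengths(parts))
df_long <- df[idx, ]

# Put the individual pieces back into col3, trimming surrounding whitespace
df_long$col3 <- trimws(unlist(parts))
rownames(df_long) <- NULL

df_long
```

One caveat of this sketch: a cell holding an empty string splits into zero pieces, so that row would disappear from `df_long`.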
Data
df <- structure(list(col1 = c("a", "b", "c", "d"), col2 = c(1, 2, 3,
4), col3 = c("fitz|buzz", "foo|bar", "hello world|today is Thursday | its 2:00",
"fitz")), class = "data.frame", row.names = c(NA, -4L))
Here is a solution using three tidyverse packages that works when the maximum number of comments per cell is not known in advance:
library(dplyr)
library(tidyr)
library(stringr)
# Helper: find the maximum number of comments per observation in a character
# vector and return that many unique column names ("V01", "V02", ...)
cols <- function(x) {
  cmts <- str_count(x, "([|])")              # separators per element
  max_cmts <- max(cmts, na.rm = TRUE) + 1    # n separators give n + 1 comments
  sprintf("V%02d", seq_len(max_cmts))        # returned explicitly as the last value
}
# Create the data
df1 <- data.frame(col1 = c("a", "b", "c", "d"),
                  col2 = c(1, 2, 3, 4),
                  col3 = c("fitz|buzz", NA,
                           "hello world|today is Thursday | its 2:00|another comment|and yet another comment", "fitz"),
                  stringsAsFactors = FALSE)
# Generate the desired output: split col3 into one column per comment,
# then stack those columns into rows and drop the padding NAs
df2 <- separate(df1, col3, into = cols(x = df1$col3),
                sep = "([|])", extra = "merge", fill = "right") %>%
  pivot_longer(cols = all_of(cols(x = df1$col3)), values_to = "comments",
               values_drop_na = TRUE) %>%
  select(-name)
which results in:
df2
# A tibble: 8 x 3
  col1   col2 comments
  <chr> <dbl> <chr>
1 a         1 "fitz"
2 a         1 "buzz"
3 c         3 "hello world"
4 c         3 "today is Thursday "
5 c         3 " its 2:00"
6 c         3 "another comment"
7 c         3 "and yet another comment"
8 d         4 "fitz"
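Note that the result above keeps the whitespace around the separator ("today is Thursday " and " its 2:00"). As a possible alternative, not the method used in this answer, tidyr's separate_rows() can split straight into rows, and a separator regex that also swallows the surrounding whitespace leaves the comments trimmed. The sketch below assumes dplyr and tidyr are installed; `df3` is a made-up name, and the NA row is filtered out first to mimic `values_drop_na = TRUE`.

```r
library(dplyr)
library(tidyr)

# Same sample data as above
df1 <- data.frame(
  col1 = c("a", "b", "c", "d"),
  col2 = c(1, 2, 3, 4),
  col3 = c("fitz|buzz", NA,
           "hello world|today is Thursday | its 2:00|another comment|and yet another comment",
           "fitz"),
  stringsAsFactors = FALSE
)

df3 <- df1 %>%
  filter(!is.na(col3)) %>%                        # drop rows without comments
  separate_rows(col3, sep = "\\s*\\|\\s*") %>%    # one row per comment, whitespace around "|" removed
  rename(comments = col3)

df3
```

This avoids computing the maximum number of comments up front, at the cost of handling the NA rows explicitly.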