r - How to break down one observation into several sub-observations?

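My data frame contains several collected articles: df$title holds the title and df$text holds the content of each article. I need to split each article into paragraphs. Here is what I did for a single article: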

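library(stringr)

# Delimiter: "Mr. ", "Mrs. " or "Ms. " at a word boundary
pattern <- "\\bM(?:rs?|s)\\.\\s"
aa <- str_replace_all(text1, pattern, "XXXX")      # mark each delimiter
bb <- unlist(strsplit(aa, "XXXX"))                 # split at the marks
cc <- bb[-1]                                       # drop the text before the first delimiter
dd <- gsub("[\\\\]", " ", cc)                      # presumably: replace literal backslashes with spaces
paragraph_vector <- gsub("[^[:alnum:]]", " ", dd)  # replace remaining non-alphanumerics with spaces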

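How can I label each paragraph with its article title and apply this splitting to the whole column (df$text)? I want each paragraph to be an observation, rather than each article.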

Here is a simple example in which paragraphs are separated by two newline characters (that is, one blank line):

library(tidyverse)
data <- tibble(
  title = c("The Book of words", "A poem"),
  text = c("It was a dark and stormy night. \n\n And this is another paragraph.", "This\n\nis\n\nthe\n\nEnd")
)
cat(data$text[[1]])
#> It was a dark and stormy night. 
#> 
#>  And this is another paragraph.
cat(data$text[[2]])
#> This
#> 
#> is
#> 
#> the
#> 
#> End
# Split each text on the paragraph delimiter, trim whitespace,
# then unnest so that every paragraph gets its own row next to its title.
data %>%
  transmute(
    title,
    paragraph = text %>% map(~ {
      .x %>%
        str_split("\n\n") %>%
        simplify() %>%
        map_chr(str_trim)
    })
  ) %>%
  unnest(paragraph)
#> # A tibble: 6 × 2
#>   title             paragraph                      
#>   <chr>             <chr>                          
#> 1 The Book of words It was a dark and stormy night.
#> 2 The Book of words And this is another paragraph. 
#> 3 A poem            This                           
#> 4 A poem            is                             
#> 5 A poem            the                            
#> 6 A poem            End

Created on 2021-09-26 by the reprex package (v2.0.1)
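
The same idea can be sketched in base R with the delimiter pattern from the question instead of "\n\n". This is only a minimal sketch under assumptions: the sample data frame below is hypothetical (only the column names title and text come from the question), and the cleanup steps with gsub() are left out. It splits every article at once, drops the piece before the first delimiter as bb[-1] does in the question, and repeats each title once per resulting paragraph.

```r
# Hypothetical sample data; only the column names title and text come from the question.
df <- data.frame(
  title = c("Article A", "Article B"),
  text = c(
    "Intro. Mr. Smith spoke first. Ms. Jones replied.",
    "Preface. Mrs. Brown agreed."
  ),
  stringsAsFactors = FALSE
)

# Delimiter from the question: "Mr. ", "Mrs. " or "Ms. " at a word boundary.
pattern <- "\\bM(?:rs?|s)\\.\\s"

# Split every article in one call; perl = TRUE selects the PCRE engine.
pieces <- strsplit(df$text, pattern, perl = TRUE)

# Mirror the question's bb[-1]: drop whatever precedes the first delimiter.
pieces <- lapply(pieces, function(p) p[-1])

# Repeat each title once per paragraph so that every row is one paragraph.
paragraphs <- data.frame(
  title = rep(df$title, lengths(pieces)),
  paragraph = trimws(unlist(pieces)),
  stringsAsFactors = FALSE
)
print(paragraphs)
```

An article that contains no delimiter at all contributes zero rows here, because its only piece is dropped; whether that is desirable depends on the data.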
