Extracting reports based on string detection in a column (R)


I am trying to extract data for each product by detecting specific words in the Comments column, using string detection.

Product <- c("a","a","a","a","a","a","b","b","b","b","c","c","c")
Comments <- c("The product enrolled"," product created","The product reviewed"," probable sale","probable sale","failed",
              "The product enrolled"," probable","The product failed"," product failed",
              "The product enrolled"," probable","The product failed")

sales <- data.frame(Product, Comments)

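I want a report for all products, split on the word "probable" in the comments (using str_detect): the rows before "probable" as one data frame, the "probable" rows as a second, and the rows after "probable" as a third.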

Expected output

Data frame 1: before "probable"

Product  Comments
a        The product enrolled
a        product created
a        The product reviewed
b        The product enrolled
c        The product enrolled

Data frame 2: "probable"

Product  Comments
a        probable sale
a        probable sale
b        probable
c        probable

Data frame 3: after "probable"

Product  Comments
b        The product failed
c        The product failed

Using dplyr (and grepl, since it works just fine here):

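library(dplyr)

# Flag every row whose comment contains "probable"
sales$isprobable <- grepl("probable", sales$Comments)

# Rows before the first "probable" within each product
sales %>%
  group_by(Product) %>%
  filter(!cumany(isprobable)) %>%
  ungroup()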
# # A tibble: 5 x 3
#   Product Comments               isprobable
#   <chr>   <chr>                  <lgl>     
# 1 a       "The product enrolled" FALSE     
# 2 a       " product created"     FALSE     
# 3 a       "The product reviewed" FALSE     
# 4 b       "The product enrolled" FALSE     
# 5 c       "The product enrolled" FALSE    
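
The `filter(!cumany(isprobable))` step relies on `cumany()` switching to TRUE at the first TRUE and staying TRUE afterwards, so negating it keeps only the rows that come before the first match. A minimal sketch on a made-up logical vector (`flags` and `seen` are names invented for this illustration, not part of the answer):

```r
library(dplyr)

# Hypothetical flags: TRUE marks a row whose comment contains "probable"
flags <- c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)

# cumany() is TRUE from the first TRUE onwards
seen <- cumany(flags)
print(seen)
# FALSE FALSE  TRUE  TRUE  TRUE  TRUE

# Negating it selects only the positions before the first TRUE
print(which(!seen))
# 1 2
```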

# Rows that contain "probable"
sales %>%
  filter(isprobable)
#   Product       Comments isprobable
# 1       a  probable sale       TRUE
# 2       a  probable sale       TRUE
# 3       b       probable       TRUE
# 4       c       probable       TRUE

# Rows that immediately follow a "probable" row within each product
sales %>%
  group_by(Product) %>%
  filter(!isprobable & lag(isprobable)) %>%
  ungroup()
# # A tibble: 3 x 3
#   Product Comments           isprobable
#   <chr>   <chr>              <lgl>     
# 1 a       failed             FALSE     
# 2 b       The product failed FALSE     
# 3 c       The product failed FALSE     
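
The question asks for str_detect, and the `lag(isprobable)` filter keeps only the single row directly after a "probable" row (product b's second " product failed" row is dropped). As a hedged alternative, not the answerer's method, the sketch below uses `stringr::str_detect()` and labels every row as before, probable, or after in one pass. It assumes "after" should mean every non-matching row following the first match; the column names `seen` and `stage` and the data frame names `before_df`, `probable_df`, and `after_df` are invented for this illustration.

```r
library(dplyr)
library(stringr)

Product <- c("a","a","a","a","a","a","b","b","b","b","c","c","c")
Comments <- c("The product enrolled"," product created","The product reviewed"," probable sale","probable sale","failed",
              "The product enrolled"," probable","The product failed"," product failed",
              "The product enrolled"," probable","The product failed")
sales <- data.frame(Product, Comments)

staged <- sales %>%
  group_by(Product) %>%
  mutate(
    # fixed() matches "probable" as a literal string, not a regex
    isprobable = str_detect(Comments, fixed("probable")),
    # TRUE from the first "probable" row of each product onwards
    seen = cumany(isprobable),
    stage = case_when(
      isprobable ~ "probable",
      seen       ~ "after",
      TRUE       ~ "before"
    )
  ) %>%
  ungroup()

# One data frame per stage
before_df   <- filter(staged, stage == "before")
probable_df <- filter(staged, stage == "probable")
after_df    <- filter(staged, stage == "after")
```

Under that assumption, `after_df` holds four rows (a's "failed", both of b's failed rows, and c's), which is one more than the `lag()` version returns. If only the row directly after a match is wanted, the `lag()` filter is the better fit.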
