Extract reports based on string detection in a column - R

I am trying to extract data for each product by detecting a specific word in the Comments column using string detection.

Product <- c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c")
Comments <- c("The product enrolled", " product created", "The product reviewed",
              " probable sale", "probable sale", "failed",
              "The product enrolled", " probable", "The product failed", " product failed",
              "The product enrolled", " probable", "The product failed")

sales <- data.frame(Product, Comments)

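Using str_detect, I am trying to extract, for every product, the rows that come before the word "probable" appears in Comments as one data frame, the "probable" rows as a second data frame, and the rows after "probable" as a third.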

Expected output

Data frame 1: before "probable"

Product             Comments
a            The product enrolled
a                product created
a              The product reviewed
b              The product enrolled
c             The product enrolled

Data frame 2: "probable"

a             probable sale
a              probable sale
b             probable
c             probable

Data frame 3: after "probable"

b   The product failed
c   The product failed

Using dplyr (and grepl, since it works well here):

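library(dplyr)

# Flag the rows whose comment contains "probable"
sales$isprobable <- grepl("probable", sales$Comments)

# Data frame 1: rows before the first "probable" within each product
sales %>%
  group_by(Product) %>%
  filter(!cumany(isprobable)) %>%
  ungroup()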
# # A tibble: 5 x 3
#   Product Comments               isprobable
#   <chr>   <chr>                  <lgl>     
# 1 a       "The product enrolled" FALSE     
# 2 a       " product created"     FALSE     
# 3 a       "The product reviewed" FALSE     
# 4 b       "The product enrolled" FALSE     
# 5 c       "The product enrolled" FALSE    

# Data frame 2: the "probable" rows themselves
sales %>%
  filter(isprobable)
#   Product       Comments isprobable
# 1       a  probable sale       TRUE
# 2       a  probable sale       TRUE
# 3       b       probable       TRUE
# 4       c       probable       TRUE
# Data frame 3: the row directly following a "probable" row within each product
sales %>%
  group_by(Product) %>%
  filter(!isprobable & lag(isprobable)) %>%
  ungroup()
# # A tibble: 3 x 3
#   Product Comments           isprobable
#   <chr>   <chr>              <lgl>     
# 1 a       failed             FALSE     
# 2 b       The product failed FALSE     
# 3 c       The product failed FALSE
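
Note that `lag(isprobable)` keeps only the row directly after a "probable" row, so the second "failed" comment for product b is not returned. If every row after the first "probable" is wanted instead, one possible variant (a sketch, not the method above) labels each row with a stage and splits once. It uses `stringr::str_detect()` as mentioned in the question; the names `labelled`, `stage`, and `reports` are made up for illustration.

```r
library(dplyr)
library(stringr)

Product <- c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c")
Comments <- c("The product enrolled", " product created", "The product reviewed",
              " probable sale", "probable sale", "failed",
              "The product enrolled", " probable", "The product failed", " product failed",
              "The product enrolled", " probable", "The product failed")
sales <- data.frame(Product, Comments)

# Label each row as "before", "probable", or "after" within its product.
# cumany() turns TRUE at the first "probable" row and stays TRUE afterwards.
labelled <- sales %>%
  group_by(Product) %>%
  mutate(
    isprobable = str_detect(Comments, "probable"),
    stage = case_when(
      isprobable         ~ "probable",
      cumany(isprobable) ~ "after",
      TRUE               ~ "before"
    )
  ) %>%
  ungroup()

# One data frame per stage, collected in a named list
reports <- split(select(labelled, Product, Comments), labelled$stage)
```

The three data frames are then available as `reports$before`, `reports$probable`, and `reports$after`. With this variant, `reports$after` also contains the " product failed" row for product b, which the `lag()` version skips.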
