R code to search for a word in a paragraph and copy the sentences before and after the keyword sentence



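In RStudio, I followed the approach for R code that searches for a word in a paragraph and copies the sentence into a variable

to identify the sentence that contains the keyword I need (e.g. pollination below).

However, I would also like to extract the sentence before and the sentence after the one that contains the keyword.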

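The desired output for this input is: They range much further north than honey bees, and colonies can be found on Ellesmere Island in northern Canada, only 880 km from the north pole! With the recent popularity of using bumblebees in glasshouse pollination they will probably be found in most parts of the world before long (see below), especially Bombus terrestris which seems to be the most popular species sold for this purpose. Recently there have been proposals to introduce bumblebees into Australia to pollinate crops in glasshouses.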

If the word pollination occurs many times, how can I do this with a loop?

Here is my R code so far:

text <- "Bumblebees are found mainly in northern temperate regions, though there are a few native South American species and New Zealand has some naturalised species that were introduced around 100 years ago to pollinate red clover. They range much further north than honey bees, and colonies can be found on Ellesmere Island in northern Canada, only 880 km from the north pole!
With the recent popularity of using bumblebees in glasshouse pollination they will probably be found in most parts of the world before long (see below), especially Bombus terrestris which seems to be the most popular species sold for this purpose. Recently there have been proposals to introduce bumblebees into Australia to pollinate crops in glasshouses. Now, though I dearly love bumblebees, I do think that this might not be a very good idea. No matter what security measures are taken, mated queens WILL escape eventually and that will probably lead to their establishment in the wild.And yet another non-native invasion of a country that has suffered more than most from such things. This invasion may or may not be benign, but isn't it better to err on the side of caution? Apparently there are already colonies of Bombus terrestris on Tasmania, so I suppose it is now only a matter of time before they reach the mainland."

#end
library(qdap)
sent_detect(text)
##There are NINE sentences in text 
##Output
[1] "Bumblebees are found mainly in northern temperate regions, though there are a few native South American species and New Zealand has some naturalised species that were introduced around 100 years ago to pollinate red clover."            
[2] "They range much further north than honey bees, and colonies can be found on Ellesmere Island in northern Canada, only 880 km from the north pole!"                                                                                          
[3] "With the recent popularity of using bumblebees in glasshouse pollination they will probably be found in most parts of the world before long, especially Bombus terrestris which seems to be the most popular species sold for this purpose."
[4] "Recently there have been proposals to introduce bumblebees into Australia to pollinate crops in glasshouses."                                                                                                                               
[5] "Now, though I dearly love bumblebees, I do think that this might not be a very good idea."                                                                                                                                                  
[6] "No matter what security measures are taken, mated queens WILL escape eventually and that will probably lead to their establishment in the wild."                                                                                            
[7] "And yet another non-native invasion of a country that has suffered more than most from such things."                                                                                                                                        
[8] "This invasion may or may not be benign, but isn't it better to err on the side of caution?"                                                                                                                                                 
[9] "Apparently there are already colonies of Bombus terrestris on Tasmania, so I suppose it is now only a matter of time before they reach the mainland."
#End

Using the quanteda package, I confirmed that the text contains nine sentences, then searched it sentence by sentence:

library(quanteda)
nsentence(text)
# [1] 9
##Searching for word pollination - it finds the first occurrence only
dat <- data.frame(text=sent_detect(text), stringsAsFactors = FALSE)
Search(dat, "pollination")
[1] "With the recent popularity of using bumblebees in glasshouse  pollination they will probably be found in most parts of the world before long, especially Bombus terrestris which seems to be the most popular species sold for this purpose."

#End

You can use base R pattern-matching functions:

d <- sent_detect(text)
# grep the sentence with the keyword (grep() returns the matching indices directly):
n <- grep('pollination', d)
# 3
# get context of +-1
d[(n - 1):(n + 1)]
# [1] "They range much further north than honey bees, and colonies can be found on Ellesmere Island in northern Canada, only 880 km from the north pole!"
# [2] "With the recent popularity of using bumblebees in glasshouse pollination they will probably be found in most parts of the world before long, especially Bombus terrestris which seems to be the most popular species sold for this purpose."
# [3] "Recently there have been proposals to introduce bumblebees into Australia to pollinate crops in glasshouses."
# nice output:
cat(d[(n - 1):(n + 1)])
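# if there are multiple sentences with the keyword:
invisible(lapply(grep('pollination', d), function(n){
    # clamp the window so a match in the first/last sentence does not index past d
    cat(d[max(n - 1, 1):min(n + 1, length(d))], "\n")
}))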
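
If you want the context back as R objects rather than printed text, the same idea can be wrapped in a small function. This is only a sketch: `keyword_context` and its `window` argument are names I made up for illustration (they are not part of qdap or base R), and the toy vector `sents` stands in for the output of `sent_detect(text)`.

```r
# Hypothetical helper (not from qdap or base R): for every sentence that
# matches `keyword`, return it together with `window` sentences on each side.
keyword_context <- function(sentences, keyword, window = 1) {
  hits <- grep(keyword, sentences, ignore.case = TRUE)
  lapply(hits, function(n) {
    lo <- max(n - window, 1)                   # do not run off the start
    hi <- min(n + window, length(sentences))   # do not run off the end
    sentences[lo:hi]
  })
}

# Toy input standing in for sent_detect(text)
sents <- c("Bees range far north.",
           "Glasshouse pollination is popular.",
           "There are proposals for Australia.",
           "Queens will escape.",
           "Pollination matters here too.")

ctx <- keyword_context(sents, "pollination")

# One space-joined string per keyword occurrence
joined <- vapply(ctx, paste, character(1), collapse = " ")
```

Each list element of `ctx` holds one occurrence with its neighbours, so a keyword in the last sentence simply yields a shorter window instead of `NA`.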

Here is a fairly direct way too:

dat[c(inds <- grep("[Pp]ollination", dat[[1]]) + 1, inds - 2),]
## [1] "Recently there have been proposals to introduce bumblebees into Australia to pollinate crops in glasshouses."                                     
## [2] "They range much further north than honey bees, and colonies can be found on Ellesmere Island in northern Canada, only 880 km from the north pole!"
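
Note that the one-liner returns the following sentence first and the preceding one second, and it can index out of range when the keyword is in the first or last row. A possible variant that keeps document order and drops out-of-range rows is sketched below; `dat_demo` is a made-up stand-in for `dat`, and the variable names are only illustrative.

```r
# Toy data frame standing in for `dat` (one sentence per row)
dat_demo <- data.frame(
  text = c("Bees range far north.",
           "Glasshouse pollination is popular.",
           "There are proposals for Australia.",
           "Queens will escape.",
           "Pollination matters here too."),
  stringsAsFactors = FALSE
)

# Rows that contain the keyword
hits <- grep("[Pp]ollination", dat_demo$text)

# Rows directly before and after each hit, in document order
idx <- sort(unique(c(hits - 1, hits + 1)))

# Drop indices that fall outside the data frame
idx <- idx[idx >= 1 & idx <= nrow(dat_demo)]

neighbours <- dat_demo$text[idx]
```

With the toy data the keyword is in rows 2 and 5, so `idx` ends up as rows 1, 3 and 4; row 6 does not exist and is filtered out.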
