Extracting verbs, without their POS tags, from POS-tagged text in R


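I am new to R. I am trying to use the "openNLP" package (note that "udpipe" does not work in my environment). I have a sentence with POS tags mixed in, as shown below.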

"Doing/VBG work/NN as/IN always/RB ./. playing/VBG soccer/NN is/VBZ good/JJ ./. I/PRP do/VBP that/IN"

How can I get the verbs without their POS tags? For this example, the answer I am trying to get is

"Doing" "playing" "is" "do"

Here is the example you asked for:

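# Tagged input: each token is written as word/TAG.
x <- "Doing/VBG work/NN as/IN always/RB ./. playing/VBG soccer/NN is/VBZ good/JJ ./. I/PRP do/VBP that/IN"
# Split the string into word/TAG pairs.
x <- unlist(strsplit(x, split = " "))
# Split each pair on "/" and store it as a one-row data frame.
x <- lapply(x, FUN = function(data) {
  parts <- unlist(strsplit(data, split = "/", fixed = TRUE))
  data.frame(token = parts[1], xpos = parts[2], stringsAsFactors = FALSE)
})
# Stack the rows and keep only the Penn Treebank verb tags.
x <- do.call(rbind, x)
subset(x, xpos %in% c("VB","VBD","VBG","VBN","VBP","VBZ"))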
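
The question asks for a plain character vector of verbs rather than a data frame. The following is a minimal base R sketch of one way to get that. It assumes the pairs are separated by single spaces and that the tag follows the last "/" in each pair. The variable names (`tagged`, `pairs`, `is_verb`, `verbs`) are illustrative and not part of any package.

```r
# Tagged input from the question: each token is written as word/TAG.
tagged <- "Doing/VBG work/NN as/IN always/RB ./. playing/VBG soccer/NN is/VBZ good/JJ ./. I/PRP do/VBP that/IN"

# Split on spaces into word/TAG pairs.
pairs <- strsplit(tagged, split = " ", fixed = TRUE)[[1]]

# Flag pairs whose tag is one of VB, VBD, VBG, VBN, VBP, VBZ.
is_verb <- grepl("/VB[DGNPZ]?$", pairs)

# Drop the trailing "/TAG" so that only the bare word remains.
verbs <- sub("/[^/]*$", "", pairs[is_verb])

print(verbs)
# "Doing" "playing" "is" "do"
```

The regular expression only checks the end of each pair, so a word that itself contains "/" would still be matched on its final tag.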

Using udpipe

library(udpipe)
txt <- c(doc1 = "Doing work as always. playing soccer is good. I do that")
x <- udpipe(txt, object = "english", udpipe_model_repo = "bnosac/udpipe.models.ud", trace = 100)
subset(x, xpos %in% c("VB","VBD","VBG","VBN","VBP","VBZ"))
> subset(x, xpos %in% c("VB","VBD","VBG","VBN","VBP","VBZ"))
doc_id paragraph_id sentence_id                sentence start end term_id token_id   token lemma upos xpos
1    doc1            1           1   Doing work as always.     1   5       1        1   Doing    do VERB  VBG
6    doc1            1           2 playing soccer is good.    23  29       6        1 playing  play VERB  VBG
8    doc1            1           2 playing soccer is good.    38  39       8        3      is    be  AUX  VBZ
12   doc1            1           3               I do that    49  50      12        2      do    do VERB  VBP
feats head_token_id dep_rel deps misc
1                                           VerbForm=Ger             0    root <NA> <NA>
6                                           VerbForm=Ger             4   csubj <NA> <NA>
8  Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin             4     cop <NA> <NA>
12                      Mood=Ind|Tense=Pres|VerbForm=Fin             0    root <NA> <NA>
