I have a vector tags containing question tags, and a vector tst of conversational utterances that end in a question tag:
tags <- c("are you", "are they", "aren't they", "aren't you", "can I",
"can't ya", "can't you", "could he", "could she", "could you",
"could they", "didn't it", "didn't you", "didn't we", "didn't she",
"didn't they", "did he", "did she", "did you", "do I", "do we",
"do you", "do they", "do you know what I mean", "you know what I mean",
"does it", "does he", "does she", "doesn't he", "doesn't she",
"doesn't it", "dunnit", "don't ya", "don't you", "don't they",
"has he", "has it", "hasn't he", "hasn't she", "have I", "have you",
"have they", "haven't they", "haven't you", "haven't we", "huh",
"innit", "is it", "is he", "is she", "is there", "isn't he",
"isn't it", "isn't it sweetheart", "isn't she", "isn't there",
"might'n we", "should you", "shouldn't you", "was it", "wasn't she",
"wasn't he", "was she", "was he", "wasn't it", "weren't they",
"will he", "will she", "will it", "will there", "will they",
"would he", "would she", "would ya", "would you", "wouldn't you",
"wouldn't it", "wouldn't they", "wouldn't she", "wouldn't he",
"wouldn't you", "won't it", "won't you", "won't they", "won't he",
"won't she", "won't we", "you know", "you think", "ain't they",
"don't we", "did i")
tst <- c("It's nice that length isn't it?", # 4 words prior to question tag
"that wee boy sleepwalks, doesn't he?", # 4 words
"well you know?", # 1 word
"Sandy Row's isn't it?", # 2 words <-- should match
"Good this week, innit?", # 3 words <-- should match
"in front of witnesses, don't you") # 4 words
I need to match only those utterances that have exactly 2-3 words before the question tag. I defined this pattern:
patt_tag <- paste0(".*(?:\\S+[\\s,.!?]){2,3}\\b(", paste0(tags, collapse = "|"), ")\\b(\\.|\\?|!|,)?$")
But it also matches utterances that it should not match:
tst[grepl(patt_tag, tst, perl = T)]
[1] "It's nice that length isn't it?" "that wee boy sleepwalks, doesn't he?" "Sandy Row's isn't it?"
[4] "Good this week, innit?" "in front of witnesses, don't you"
The result I'm looking for is:
"Sandy Row's isn't it?" "Good this week, innit?"
Can anyone help?
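Change the leading .* in patt_tag <- paste0(".*(?... to ^, giving patt_tag <- paste0("^(?...
With .* at the start, the regex is free to skip any number of words before the 2-3 words it counts, so longer utterances still match. Anchoring with ^ forces the count to begin at the start of the utterance: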
patt_tag <- paste0("^(?:\\S+[\\s,.!?]){2,3}\\b(", paste0(tags, collapse = "|"), ")\\b(\\.|\\?|!|,)?$")
tst[grepl(patt_tag, tst, perl = T)]
#[1] "Sandy Row's isn't it?" "Good this week, innit?"
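To see the effect of the anchor side by side, here is a minimal sketch that builds both variants of the pattern from one shared body. It assumes a shortened, hypothetical tag list (tags_small) that covers only the six test utterances; the names tags_small, body, unanchored and anchored are illustrative and not part of the original code.

```r
# Hypothetical shortened tag list, just enough for the six test utterances
tags_small <- c("isn't it", "doesn't he", "innit", "don't you", "you know")

tst <- c("It's nice that length isn't it?",       # 4 words before the tag
         "that wee boy sleepwalks, doesn't he?",  # 4 words
         "well you know?",                        # 1 word
         "Sandy Row's isn't it?",                 # 2 words
         "Good this week, innit?",                # 3 words
         "in front of witnesses, don't you")      # 4 words

# Shared part of the pattern: 2-3 "word + separator" groups, then a tag,
# then optional final punctuation and the end of the string
body <- paste0("(?:\\S+[\\s,.!?]){2,3}\\b(",
               paste0(tags_small, collapse = "|"),
               ")\\b(\\.|\\?|!|,)?$")

unanchored <- paste0(".*", body)  # leading .* may swallow extra words
anchored   <- paste0("^", body)   # counting must start at the first word

res_unanchored <- grepl(unanchored, tst, perl = TRUE)
res_anchored   <- grepl(anchored, tst, perl = TRUE)

tst[res_unanchored]  # also returns the 4-word utterances
tst[res_anchored]    # only the utterances with 2-3 words before the tag
```

Under these assumptions, the unanchored pattern returns every utterance except "well you know?", while the anchored one returns only "Sandy Row's isn't it?" and "Good this week, innit?".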