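I'm trying to work out how to make a function use the preceding clue word to choose the best candidate. This has to happen inside a set of strings; I've tried many times but haven't managed it.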
The basic idea: given a string like "-----clueword(candidate1|candidate2…)----", I want a function that picks the most promising candidate based on the data.
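Whatever does the choosing needs two pieces from each string: the clue word and the list of alternatives inside the parentheses. A minimal base-R sketch of pulling both out with one regular expression, assuming the clue word is a single run of word characters directly before `(` with at most one space in between (`parse_clue` is a hypothetical helper name, and only the first group in the string is handled):

```r
# Hypothetical helper: split "clueword(cand1|cand2|...)" into its two parts.
# Assumes at most one space between the clue word and the opening parenthesis.
parse_clue <- function(s) {
  m <- regmatches(s, regexec("(\\w+) ?\\(([^)]+)\\)", s))[[1]]
  if (length(m) == 0) return(NULL)           # no "(...|...)" group found
  list(clue       = m[2],                    # word just before "("
       candidates = strsplit(m[3], "|", fixed = TRUE)[[1]])
}

parse_clue('I have a (house|water|paper) and car')
# $clue is "a"; $candidates is c("house", "water", "paper")
```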
clue = c( 'a', 'to', 'a', 'to', 'to')
word = c('house','school','paper','water','schooling')
cooccur = c(100, 90, 83, 70, 61)
data = data.frame(clue,word,cooccur)
Suppose there are two strings:
S1 = 'I have a (house|water|paper) and car'
S2 = 'I need to go to (school|schooling) right now'
The clue word "a" co-occurs most often with "house", and "to" with "school". So with THEFUNCTION the results should be:
S1
[1] 'I have a (house) and car'
S2
[1] 'I need to go to (school) right now'
You don't need to worry about removing the less promising candidates; this code takes care of that:
library(gsubfn)
gsubfn("\\(([^)]+)", ~paste0("(", paste(THEFUNCTION(unlist(x)), collapse="|")), S1)
I know I could use which.max(), but tying it to the clue word isn't straightforward. Is there a way to get past this?
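On the which.max() part: one approach is to filter `data` down to the rows whose `clue` equals the clue word and whose `word` is one of the candidates, and only then take which.max() over `cooccur`. A minimal sketch, where `pick_best` is a hypothetical name and the data frame is rebuilt from the question with `stringsAsFactors = FALSE` so the chosen word comes back as a plain string:

```r
clue    <- c('a', 'to', 'a', 'to', 'to')
word    <- c('house', 'school', 'paper', 'water', 'schooling')
cooccur <- c(100, 90, 83, 70, 61)
data    <- data.frame(clue, word, cooccur, stringsAsFactors = FALSE)

# Hypothetical helper: among `candidates`, return the one that co-occurs
# most often with `clueword` according to `data`.
pick_best <- function(clueword, candidates, data) {
  # keep only rows for this clue word whose word is one of the candidates
  hits <- data[data$clue == clueword & data$word %in% candidates, ]
  if (nrow(hits) == 0) return(candidates)   # no evidence: keep every candidate
  hits$word[which.max(hits$cooccur)]        # which.max picks the first maximum
}

pick_best('a',  c('house', 'water', 'paper'), data)   # "house"
pick_best('to', c('school', 'schooling'), data)       # "school"
```

Filtering on the clue first is what makes "water" (which only co-occurs with "to") drop out for S1 even though it is one of the candidates.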
This works:
THEFUNCTION <- function(x) { # dummy function, to be replaced by one that selects w.r.t. co-occurrence frequency
# this function receives inputs without parentheses: e.g., 'house|water|paper'
ifelse(grepl('house', x), 'house', 'school')
}
S1 = 'I have a (house|water|paper) and car'
S2 = 'I need to go to (school|schooling) right now'
library(gsubfn)
gsubfn("\\(([^)]+)\\)", ~paste0("(", paste(THEFUNCTION(unlist(x)), collapse="|"), ")"), S1)
#[1] "I have a (house) and car"
gsubfn("\\(([^)]+)\\)", ~paste0("(", paste(THEFUNCTION(unlist(x)), collapse="|"), ")"), S2)
#[1] "I need to go to (school) right now"
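With the pattern above, THEFUNCTION only ever receives the text inside the parentheses, so it cannot see the clue word. One way around this is to capture the clue word in the same regular expression and rewrite each whole `clueword (a|b|c)` match. The sketch below does that in base R (gregexpr/regexec/regmatches) instead of gsubfn; `pick_best` and `resolve` are hypothetical names, the data are the question's, and it assumes the clue is the single word immediately before `(` with at most one space between them:

```r
clue    <- c('a', 'to', 'a', 'to', 'to')
word    <- c('house', 'school', 'paper', 'water', 'schooling')
cooccur <- c(100, 90, 83, 70, 61)
data    <- data.frame(clue, word, cooccur, stringsAsFactors = FALSE)

# Hypothetical chooser: the candidate with the highest co-occurrence wins.
pick_best <- function(clueword, candidates, data) {
  hits <- data[data$clue == clueword & data$word %in% candidates, ]
  if (nrow(hits) == 0) return(candidates)   # no evidence: keep all
  hits$word[which.max(hits$cooccur)]
}

# Hypothetical driver: rewrite every "clueword (c1|c2|...)" in each string.
resolve <- function(s, data) {
  # group 1 = clue word, group 2 = optional space, group 3 = candidates
  pat <- "(\\w+)( ?)\\(([^)]+)\\)"
  m <- gregexpr(pat, s)
  regmatches(s, m) <- lapply(regmatches(s, m), function(found) {
    vapply(found, function(one) {
      parts <- regmatches(one, regexec(pat, one))[[1]]
      cands <- strsplit(parts[4], "|", fixed = TRUE)[[1]]
      best  <- pick_best(parts[2], cands, data)
      # rebuild the match, keeping the original spacing before "("
      paste0(parts[2], parts[3], "(", paste(best, collapse = "|"), ")")
    }, character(1), USE.NAMES = FALSE)
  })
  s
}

S1 <- 'I have a (house|water|paper) and car'
S2 <- 'I need to go to (school|schooling) right now'
resolve(c(S1, S2), data)
# expected: "I have a (house) and car" and "I need to go to (school) right now"
```

The same idea should carry over to gsubfn: a replacement function with two arguments and a pattern with two capture groups receives the clue and the candidate list separately, so `pick_best` can be called from there. The base-R version just avoids the extra package.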