I am trying to match sentences within a paragraph and replace them.
Here is the data frame:
fulltext = c(rep('<span style="font-family:Calibri"><span style="font-size:18px">__ - Now</span>rnrn<strong><span style="font-size:24px">X - Soon</span></strong>rnrn<span style="font-size:18px">__ - N</span></span><span style="font-family:Calibri"><span style="font-size:18px">ext Scheduled Maintenance or Inspection</span></span>', 3),
'<span style="font-size:20px"><strong><span style="font-family:"Calibri",sans-serif">What is Triggering this Expert Alert?</span></strong></span>')
cleantext = c("__ - Now", "X - Soon", "ext Scheduled Maintenance or Inspection", "What is Triggering this Expert Alert?")
replacetext = c("__ - Nu", "X - Binnenkort", "ext Gepland onderhoud of inspectie", "Wat veroorzaakt deze expertwaarschuwing?")
data5 = data.frame(fulltext, cleantext, replacetext)
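This is what I want to do:
- Take each sentence from cleantext
- Match it against fulltext
- Replace the cleantext match in fulltext with the corresponding replacetext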
For example:
Given the full paragraph above, I want to replace the bold sentence with "Wat veroorzaakt deze expertwaarschuwing?".
The output should look like this:
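This is what I have tried so far. I have tried a couple of approaches:
- Using string replacement
- Adding ^ and $ to the start and end of each sentence, then using gsub to match it as a regex pattern. But I think that only works for single words. Here is my attempt, which did not work.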
data5$cleantext2 = paste0("^", data5$cleantext, "$")
gsub(data5$cleantext2[1], data5$replacetext[1], data5$fulltext[1])
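No loop is needed. Also, your ^ and $ anchors will not work, because the text you want to replace sits in the middle of the string. You can use fixed patterns (fixed = TRUE) to reduce mismatches.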
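To make the anchor problem concrete, here is a minimal sketch using a shortened, hypothetical string (`html`, `target`, and `replacement` are illustrative names, not part of the original data). It compares an anchored regex, a plain regex, and a fixed match on the same input.

```r
# Hypothetical miniature of one fulltext entry, for illustration only
html <- "<strong>What is Triggering this Expert Alert?</strong>"
target <- "What is Triggering this Expert Alert?"
replacement <- "Wat veroorzaakt deze expertwaarschuwing?"

# Anchored regex: ^ and $ require the whole string to equal the pattern,
# so nothing matches here and the input comes back unchanged
anchored <- gsub(paste0("^", target, "$"), replacement, html)

# Plain regex: the trailing "?" is read as a quantifier on "t",
# so the literal "?" in the text is left behind after the replacement
as_regex <- gsub(target, replacement, html)

# Fixed matching: the target is treated as a literal substring
literal <- gsub(target, replacement, html, fixed = TRUE)

print(anchored)
print(as_regex)
print(literal)
```

Only the `fixed = TRUE` call produces the intended result; the plain regex version ends up with a doubled question mark.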
Since you want to apply every pattern/replacement pair to each element of fulltext (not just 1-to-1), I think you can Reduce it.
# Apply each (pattern, replacement) pair in turn to the whole fulltext vector
Reduce(function(s, ptn) gsub(ptn[1], ptn[2], s, fixed = TRUE),
       Map(c, cleantext, replacetext),
       init = fulltext)
# [1] "<span style="font-family:Calibri"><span style="font-size:18px">__ - Nu</span>rnrn<strong><span style="font-size:24px">X - Binnenkort</span></strong>rnrn<span style="font-size:18px">__ - N</span></span><span style="font-family:Calibri"><span style="font-size:18px">ext Gepland onderhoud of inspectie</span></span>"
# [2] "<span style="font-family:Calibri"><span style="font-size:18px">__ - Nu</span>rnrn<strong><span style="font-size:24px">X - Binnenkort</span></strong>rnrn<span style="font-size:18px">__ - N</span></span><span style="font-family:Calibri"><span style="font-size:18px">ext Gepland onderhoud of inspectie</span></span>"
# [3] "<span style="font-family:Calibri"><span style="font-size:18px">__ - Nu</span>rnrn<strong><span style="font-size:24px">X - Binnenkort</span></strong>rnrn<span style="font-size:18px">__ - N</span></span><span style="font-family:Calibri"><span style="font-size:18px">ext Gepland onderhoud of inspectie</span></span>"
# [4] "<span style="font-size:20px"><strong><span style="font-family:"Calibri",sans-serif">Wat veroorzaakt deze expertwaarschuwing?</span></strong></span>"
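For intuition about what the Reduce call does, the sketch below spells out the same fold as an explicit loop and checks that both give the same result. The inputs are shortened, hypothetical stand-ins for the original vectors, and `result` and `reduced` are names introduced here for illustration.

```r
# Hypothetical shortened inputs, not the original data
fulltext <- c("<span>__ - Now</span><strong>X - Soon</strong>",
              "<strong>What is Triggering this Expert Alert?</strong>")
cleantext <- c("__ - Now", "X - Soon",
               "What is Triggering this Expert Alert?")
replacetext <- c("__ - Nu", "X - Binnenkort",
                 "Wat veroorzaakt deze expertwaarschuwing?")

# Explicit version: each pair is applied to the output of the previous step
result <- fulltext
for (i in seq_along(cleantext)) {
  result <- gsub(cleantext[i], replacetext[i], result, fixed = TRUE)
}

# Reduce version: the same fold, starting from init = fulltext
reduced <- Reduce(function(s, ptn) gsub(ptn[1], ptn[2], s, fixed = TRUE),
                  Map(c, cleantext, replacetext),
                  init = fulltext)

# Both approaches should agree
stopifnot(identical(result, reduced))
print(reduced)
```

Because the pairs are applied one after another, the order can matter: text inserted by an earlier replacement can be matched by a later pattern. If the result needs to live in the data frame, it could be assigned to a new column, for example `data5$newtext`, which is a hypothetical column name.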