Replace exact sentences within a paragraph in R



I am trying to match sentences inside a paragraph and replace them.

Below is the data frame:

# Outer strings use single quotes so the double quotes inside the HTML do not end the string
fulltext = c(rep('<span style="font-family:Calibri"><span style="font-size:18px">__ - Now</span>rnrn<strong><span style="font-size:24px">X - Soon</span></strong>rnrn<span style="font-size:18px">__ - N</span></span><span style="font-family:Calibri"><span style="font-size:18px">ext Scheduled Maintenance or Inspection</span></span>', 3),
'<span style="font-size:20px"><strong><span style="font-family:&quot;Calibri&quot;,sans-serif">What is Triggering this Expert Alert?</span></strong></span>')
cleantext = c("__ - Now", "X - Soon", "ext Scheduled Maintenance or Inspection", "What is Triggering this Expert Alert?")
replacetext = c("__ - Nu", "X - Binnenkort", "ext Gepland onderhoud of inspectie", "Wat veroorzaakt deze expertwaarschuwing?")
data5 = data.frame(fulltext, cleantext, replacetext)

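This is what I want to do:

  1. Take each sentence from cleantext
  2. Match it against fulltext
  3. Replace the cleantext match inside fulltext with the corresponding replacetext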

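For example: What is Triggering this Expert Alert?

The above is the full paragraph, and I want to replace the bold sentence with Wat veroorzaakt deze expertwaarschuwing?

The output should look like this: Wat veroorzaakt deze expertwaarschuwing?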

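This is what I have tried so far. I have tried a couple of approaches.

  1. Using string replace
  2. Adding ^ and $ to the start and end of the sentence and then using gsub to match it as a regex pattern. But I think this only works for single words. Below is my attempt, which did not work.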

data5$cleantext2 = paste0("^", data5$cleantext, "$")
gsub(data5$cleantext2[1], data5$replacetext[1], data5$fulltext[1])

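No loop is needed. Also, your ^ and $ will not work, because the text you want to replace sits in the middle of the string, and those anchors only match at its start and end. You can use fixed patterns (fixed = TRUE) to reduce mismatches.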
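To make the difference concrete, here is a minimal sketch on a shortened string. The names `txt`, `pat`, and `new` are made up for illustration and are not part of the question's data. It compares the anchored regex, an unanchored regex, and a fixed pattern.

```r
# Shortened stand-in for one element of fulltext (hypothetical example data)
txt <- "<strong>What is Triggering this Expert Alert?</strong>"
pat <- "What is Triggering this Expert Alert?"
new <- "Wat veroorzaakt deze expertwaarschuwing?"

# Anchored regex: only matches if the sentence is the entire string,
# so the text is returned unchanged
anchored <- gsub(paste0("^", pat, "$"), new, txt)

# Unanchored regex: "t?" is read as "optional t", so the literal "?"
# in the text is not consumed and stays behind after the replacement
as_regex <- gsub(pat, new, txt)

# Fixed pattern: the sentence is matched literally, anywhere in the string
as_fixed <- gsub(pat, new, txt, fixed = TRUE)
```

Only `as_fixed` gives the intended result here. The unanchored regex version ends up with a doubled question mark, which is the kind of mismatch fixed = TRUE avoids.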

Since you want to apply every pattern/replacement pair to each element of fulltext (not just 1-to-1), I think you can Reduce it.

Reduce(function(s, ptn) gsub(ptn[1], ptn[2], s, fixed = TRUE), 
Map(c, cleantext, replacetext),
init = fulltext)
# [1] "<span style="font-family:Calibri"><span style="font-size:18px">__ - Nu</span>rnrn<strong><span style="font-size:24px">X - Binnenkort</span></strong>rnrn<span style="font-size:18px">__ - N</span></span><span style="font-family:Calibri"><span style="font-size:18px">ext Gepland onderhoud of inspectie</span></span>"
# [2] "<span style="font-family:Calibri"><span style="font-size:18px">__ - Nu</span>rnrn<strong><span style="font-size:24px">X - Binnenkort</span></strong>rnrn<span style="font-size:18px">__ - N</span></span><span style="font-family:Calibri"><span style="font-size:18px">ext Gepland onderhoud of inspectie</span></span>"
# [3] "<span style="font-family:Calibri"><span style="font-size:18px">__ - Nu</span>rnrn<strong><span style="font-size:24px">X - Binnenkort</span></strong>rnrn<span style="font-size:18px">__ - N</span></span><span style="font-family:Calibri"><span style="font-size:18px">ext Gepland onderhoud of inspectie</span></span>"
# [4] "<span style="font-size:20px"><strong><span style="font-family:&quot;Calibri&quot;,sans-serif">Wat veroorzaakt deze expertwaarschuwing?</span></strong></span>"                                                                                                                                                                           
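If Reduce is unfamiliar, the call above is roughly equivalent to an explicit loop: each pattern/replacement pair is applied in turn to the whole vector, and the result is passed on to the next pair. The sketch below uses shortened, made-up strings rather than the full HTML, and `result` and `reduced` are hypothetical names used only for the comparison.

```r
# Shortened stand-ins for the question's vectors (hypothetical example values)
fulltext    <- c("<span>__ - Now</span><strong>X - Soon</strong>",
                 "<strong>What is Triggering this Expert Alert?</strong>")
cleantext   <- c("__ - Now", "X - Soon", "What is Triggering this Expert Alert?")
replacetext <- c("__ - Nu", "X - Binnenkort", "Wat veroorzaakt deze expertwaarschuwing?")

# Loop form: apply every pair, in order, to every element of fulltext
result <- fulltext
for (i in seq_along(cleantext)) {
  result <- gsub(cleantext[i], replacetext[i], result, fixed = TRUE)
}

# Reduce form, as in the answer, for comparison
reduced <- Reduce(function(s, ptn) gsub(ptn[1], ptn[2], s, fixed = TRUE),
                  Map(c, cleantext, replacetext),
                  fulltext)
```

Because the pairs are applied one after another, a later pattern can also match text introduced by an earlier replacement, so the order of the pairs may matter for some inputs.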
