Replace whole words from a predefined list



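My current code reviews the equipment description field, with the aim of standardizing the entries: whatever is found in COL A is replaced with COL B. I want to post the result back to a new, clean description column (that part works fine, no drama there). I don't need any message boxes or the like, and this may process 100,000+ descriptions at a time, so I'm looking for efficient code.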

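However, when it applies the Replace function, it replaces parts of words rather than distinct whole words, no matter how I sort the words on the "Dictionary" tab. Note that 99 times out of 100 the Col A entries have no leading or trailing spaces, but there are a very few cases that do.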
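
A minimal Python sketch of one way to meet both requirements (speed over 100,000+ descriptions, and no dependence on how the Dictionary tab is sorted). It assumes the Col A/Col B pairs have already been read into a list. The sketch swaps the row-by-row loop for a single precompiled pattern. The keys are stripped of stray spaces and joined into one whole-word alternation with the longest key first, and each match is looked up in the dictionary. build_cleaner is a hypothetical name, not part of any library.

```python
import re


def build_cleaner(pairs):
    """Compile all (Col A, Col B) dictionary rows into one whole-word regex.

    Hypothetical helper: Col A keys are stripped of stray leading/trailing
    spaces, and longer keys are tried first so a multi-word entry such as
    "PRESSURE   GAUGE" wins over the single word "PRESSURE" it starts with.
    """
    mapping = {}
    for raw_key, clean in pairs:
        key = raw_key.strip()
        if key:  # ignore blank dictionary rows
            mapping[key] = clean.strip()
    if not mapping:
        return lambda text: text  # empty dictionary: nothing to replace

    # Alternation tries branches left to right, so sort longest first.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")

    # One scan per description; each match is looked up in the dictionary.
    return lambda text: pattern.sub(lambda m: mapping[m.group(0)], text)


clean = build_cleaner([
    ("COMPRESSSOR", "COMPRESSOR"),
    ("PRESSURE", "PRESSURE"),
    ("PRESSURE   GAUGE", "PRESSURE GAUGE"),
    ("PRESS", "PRESSURE"),
    ("AIR COMPRESSOR", "AIR COMPRESSOR"),
])
print(clean("PRESS CTRL VV"))               # PRESSURE CTRL VV
print(clean("PRESSURE GAUGE FLAME FRONT"))  # PRESSURE GAUGE FLAME FRONT
```

Because each description is scanned only once, text produced by one entry is never re-matched by a later entry. The row-by-row approach can re-match in that way.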

Description examples:

AIR COMPRESSOR
LEVEL GAUGE OIL SEPARATOR GAS COMPRESSOR
PRESS CTRL VV
PRESSURE GAUGE FLAME FRONT

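PRESS is replaced with PRESSURE even where it is only part of a word (such as the PRESS in PRESSURE). The dictionary looks like this, for example: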

COL A:              COL B:                    
COMPRESSSOR              COMPRESSOR                
PRESSURE                 PRESSURE                  
PRESSURE   GAUGE         PRESSURE GAUGE
PRESS                    PRESSURE
AIR COMPRESSOR           AIR COMPRESSOR
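
A short Python illustration of the symptom, using two of the sample descriptions and the PRESS to PRESSURE entry. Plain substring replacement behaves like VBA's Replace and corrupts PRESSURE. A \b word-boundary pattern leaves PRESSURE alone.

```python
import re

descriptions = ["PRESS CTRL VV", "PRESSURE GAUGE FLAME FRONT"]

# Plain substring replacement (like VBA's Replace) also rewrites
# the PRESS inside PRESSURE.
substring = [d.replace("PRESS", "PRESSURE") for d in descriptions]

# \b is a word boundary, so only a standalone PRESS is rewritten.
whole_word = [re.sub(r"\bPRESS\b", "PRESSURE", d) for d in descriptions]

print(substring)   # ['PRESSURE CTRL VV', 'PRESSUREURE GAUGE FLAME FRONT']
print(whole_word)  # ['PRESSURE CTRL VV', 'PRESSURE GAUGE FLAME FRONT']
```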

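I think I'm very close to getting this working, but I can't figure out how to adjust it so that it runs and replaces only whole words. I suspect it's the order I have things in, but I'm not 100% sure, or perhaps something is missing.
I would really appreciate your help with this.

Thanks, Wendy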

Function CleanUntil(original As String, targetReduction As Integer)
Dim newString As String
newString = original
Dim targetLength As Integer
targetLength = Len(original) - targetReduction
Dim rowCounter As Integer
rowCounter = 2
Dim CleanSheet As Worksheet
Set CleanSheet = ActiveWorkbook.Sheets("Dictionary")
Dim word As String
Dim cleanword As String
' Coding for replacement of WHOLE words - with a regular expression using a pattern with the \b marker (for the word boundary) before and after word
Dim RgExp As Object
Set RgExp = CreateObject("VBScript.RegExp")
With RgExp
.Global = True
'.IgnoreCase = True     'True if search is case insensitive.  False otherwise
End With
'Loop through each word until we reach the target length (or other value noted), or run out of clean words to apply
'While Len(newString) > 1     (this line will do ALL descriptions - confirmed)
'While Len(newString) > targetLength   (this line will only do to target length)
While Len(newString) > 1
word = CleanSheet.Cells(rowCounter, 1).Value
cleanword = CleanSheet.Cells(rowCounter, 2).Value
RgExp.Pattern = "\b" & word & "\b"
If (word = "") Then
CleanUntil = newString
Exit Function
End If
' TODO: Make sure it is replacing whole words and not just portions of words
'    newString = Replace(newString, word, cleanword)     ' This line works if no RgExp applied, but finds part words.
newString = RgExp.Replace(newString, word, cleanword)
rowCounter = rowCounter + 1
Wend
' Once the find/replace loop has finished, release the RgExp object.
Set RgExp = Nothing
' Finally return the cleaned string as clean as we could get it, based on dictionary
CleanUntil = newString
End Function
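
For reference, here is a Python sketch of what the dictionary loop in CleanUntil is meant to do. It assumes the Dictionary sheet rows are passed in as (Col A, Col B) pairs. It also assumes the entries begin and end with a letter or digit, because \b only sits next to word characters. The sketch escapes each word, wraps it in \b markers, and calls the replacement with just the text and the new value. It leaves out the length checks. clean_until and dictionary_rows are illustrative names only.

```python
import re


def clean_until(original, dictionary_rows):
    """Apply each (Col A, Col B) pair in order, replacing whole words only.

    Hypothetical port of the CleanUntil loop; stops at the first blank Col A.
    """
    new_string = original
    for word, clean_word in dictionary_rows:
        if word == "":  # a blank Col A cell marks the end of the dictionary
            break
        # re.escape keeps characters such as "." or "(" literal.
        pattern = r"\b" + re.escape(word) + r"\b"
        # A function replacement inserts clean_word literally, even if it
        # contains backslashes.
        new_string = re.sub(pattern, lambda m: clean_word, new_string)
    return new_string


rows = [("COMPRESSSOR", "COMPRESSOR"), ("PRESS", "PRESSURE"), ("", "")]
print(clean_until("AIR COMPRESSSOR", rows))             # AIR COMPRESSOR
print(clean_until("PRESSURE GAUGE FLAME FRONT", rows))  # PRESSURE GAUGE FLAME FRONT
```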

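Note: I strongly recommend adding a reference to the Microsoft VBScript Regular Expressions 5.5 library (via Tools -> References...). This gives you strong typing and IntelliSense on the RegExp object: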

Dim RgExp As New RegExp

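If I understand correctly, the regular expression finds the entries that need replacing; it only matches where the value from Col A appears as a whole word.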

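However, when you try to replace using VBA's Replace function, it also replaces partial words in the text. Using the RegExp.Replace method has no effect: the string always stays unchanged.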

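The cause is the call, not the engine. RegExp.Replace takes exactly two arguments, the string to search and the replacement text, and it replaces each complete match. Calling it with three arguments, as in RgExp.Replace(newString, word, cleanword), does not work. Wrapping the word in a capture group () does no harm. Because \b matches a position rather than a character, the match is just the word itself either way: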

RgExp.Pattern = "\b(" & word & ")\b"
' ...
newString = RgExp.Replace(newString, cleanword)
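
For comparison, here is a minimal Python check of the same two patterns. It uses Python's re module rather than VBScript, so treat it as an analogy. The replacement covers the whole match. Because \b consumes no characters, the grouped and ungrouped patterns give the same result.

```python
import re

text = "PRESS CTRL VV"

# re.sub replaces the entire match. \b is zero-width, so the match is
# exactly "PRESS" whether or not it is wrapped in a capture group.
with_group = re.sub(r"\b(PRESS)\b", "PRESSURE", text)
without_group = re.sub(r"\bPRESS\b", "PRESSURE", text)

print(with_group)                   # PRESSURE CTRL VV
print(with_group == without_group)  # True
```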

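If you also want a hyphen to count as part of a word, use a negated character class that excludes both word characters and hyphens. A character class consumes the character it matches, so handle each side differently. Capture the leading character and check the trailing one with a lookahead, which consumes nothing. Then write the captured character back in the replacement:

RgExp.Pattern = "(^|[^\w-])" & word & "(?=[^\w-]|$)"
newString = RgExp.Replace(newString, "$1" & cleanword)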
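
The same hyphen-aware idea in Python, as a hypothetical helper named replace_whole_word. Python's re writes the captured separator back with m.group(1), where VBScript uses $1 in the replacement string. The lookahead is what lets two consecutive occurrences of the word both match.

```python
import re


def replace_whole_word(text, word, clean_word):
    """Hypothetical helper: replace `word` only where it does not touch a
    letter, digit, underscore or hyphen on either side."""
    # Group 1 captures the preceding separator (or the empty start of the
    # string) so it can be written back; the lookahead checks the following
    # character without consuming it, so back-to-back words still match.
    pattern = r"(^|[^\w-])" + re.escape(word) + r"(?=[^\w-]|$)"
    return re.sub(pattern, lambda m: m.group(1) + clean_word, text)


print(replace_whole_word("PRESS CTRL VV", "PRESS", "PRESSURE"))    # PRESSURE CTRL VV
print(replace_whole_word("PRESS-FIT PRESS", "PRESS", "PRESSURE"))  # PRESS-FIT PRESSURE
```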

References:

  • Replace method
  • Introduction to the VBScript regular expressions library
