Replace part of "bad words" with asterisks, ignoring case but preserving the original case



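I have a bad-word replacement script in VB.NET that has caused a lot of problems. After a lot of trial and error, the current code works, but it doesn't filter out words that contain capital letters.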

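' Requires "Imports System.Text.RegularExpressions" at the top of the file
Private Function CheckForBadWords(ByVal InputString As String) As String
    Dim r As Regex
    Dim element As String
    Dim eLength As Integer
    Dim x As Integer
    Dim AttachtoEnd As String = ""
    ' alWordList is the ArrayList of bad words (declared elsewhere)
    For Each element In alWordList
        ' "\b" anchors the match at the start of a word
        r = New Regex("\b" & element)
        eLength = element.Length
        ' One "*" for every character after the first three
        For x = 3 To eLength - 1
            AttachtoEnd = AttachtoEnd & "*"
        Next
        InputString = r.Replace(InputString, Left(element, 3) & AttachtoEnd)
        AttachtoEnd = ""
    Next
    Return InputString
End Function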

How can I make it check words with capital letters? For example, phuck gets checked, whereas Phuck or PHUCK do not.

I tried following this tutorial, but it's in C# and I barely know VB.NET: http://www.dreamincode.net/forums/topic/67129-creating-a-bad-word-filter-functionality-in-aspnet-wc%23/

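To add more detail: with some help and a lot of tweaking, this seems to work, but bugs remain, especially with single quotes and double quotes or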

Private Function CheckForBadWords(ByVal InputString As String) As String
    Dim starPosition As Integer = 0
    Dim element As String
    Dim eLength As Integer
    Dim x As Integer
    Dim AttachtoEnd As String = ""
    Dim strArray = InputString.Split(" "c)
    ' Punctuation stripped from the start/end of each word before comparing
    Dim specialChars As New List(Of String)(New String() {"@", "!", ".", ",", "(", ")", "/", "#", "$", "&", "+", "-", "_", "=", ":", "'", "*", "^", "`", "<", ">", "[", "]", "{", "}", "\", "|", ControlChars.Quote})
    Dim firstChars As String = ""
    Dim LastChars As String = ""
    InputString = String.Empty
    For Each item As String In strArray
        Dim str As String = item
        firstChars = String.Empty
        LastChars = String.Empty
        ' Collect leading special characters
        For Each ch As Char In str
            If Not specialChars.Contains(ch) Then
                Exit For
            Else
                firstChars &= ch
            End If
        Next
        For Each spChar As Char In firstChars.ToCharArray()
            str = str.Trim(spChar)
        Next
        ' Collect trailing special characters
        For i As Integer = str.Length - 1 To 0 Step -1
            If Not specialChars.Contains(str(i)) Then
                Exit For
            Else
                LastChars = str(i) & LastChars
            End If
        Next
        For Each spChar As String In specialChars
            str = str.Trim(spChar)
        Next
        If Not String.IsNullOrWhiteSpace(str) Then
            For Each element In alWordList
                ' Case-insensitive comparison with each listed word
                If element.ToLower = str.ToLower Then
                    str = str.Trim()
                    eLength = element.Length
                    For x = 3 To eLength - 1
                        AttachtoEnd = AttachtoEnd & "*"
                        starPosition += 1
                    Next
                    str = str.Substring(0, str.Length - starPosition) & AttachtoEnd
                End If
                AttachtoEnd = ""
                starPosition = 0
            Next
        End If
        InputString &= firstChars & str & LastChars & " "
    Next
    Return InputString
End Function

So now I think it would be better to go back to the regex, which worked very well; it just needs to handle capital letters too.

One last point: the words to check come in as an ArrayList.

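If you want to replace every "bad word" in a string, keeping the first 3 letters and replacing the rest with asterisks (phuck becomes phu**), and you want the comparison to be case-insensitive, there is no built-in method. You could use

  • Regex.Replace with RegexOptions.IgnoreCase
  • Microsoft.VisualBasic.Strings.Replace with CompareMethod.Text

but both have the drawback that they replace the old value with a new value that does not preserve the old case. If the word is PHUCK and the "bad word" in your list is Phuck, it will be replaced with Phu** instead of PHU**.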
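To make the drawback concrete, here is a small sketch using the first option. MaskWithFixedReplacement is a hypothetical helper, not part of any answer here, and it assumes every listed word has at least 3 characters. Because the replacement string is built once from the list entry, every match receives the list entry's casing:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CaseLossDemo
    ' Hypothetical helper: masks every case-insensitive occurrence of badWord
    ' with ONE replacement string built from badWord itself.
    ' Assumes badWord has at least 3 characters.
    Function MaskWithFixedReplacement(ByVal input As String, ByVal badWord As String) As String
        Dim replacement As String = badWord.Substring(0, 3) & New String("*"c, badWord.Length - 3)
        ' Regex.Escape makes the word match literally; IgnoreCase finds PHUCK, phuck, ...
        Return Regex.Replace(input, Regex.Escape(badWord), replacement, RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        ' Both matches get the casing of the list entry "Phuck", not of the input
        Console.WriteLine(MaskWithFixedReplacement("PHUCK and phuck", "Phuck")) ' Phu** and Phu**
    End Sub
End Module
```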

Since you've commented that this matters, the only way is to write a custom method:

Imports System.Collections.Generic
Imports System.Runtime.CompilerServices
Imports System.Text

Module StringExtensions
    ' Masks every occurrence of each bad word, keeping the first
    ' showClearTextLength characters exactly as they appear in str.
    <Extension()>
    Public Function ReplaceBadWords(ByVal str As String, ByVal badWords As IEnumerable(Of String), ByVal comparison As StringComparison, Optional ByVal showClearTextLength As Integer = 3, Optional ByVal obfuscateChar As Char = "*"c) As String
        Dim sb As StringBuilder = New StringBuilder(str)
        For Each badWord As String In badWords
            ' IndexOf("") always matches, so an empty entry would loop forever
            If String.IsNullOrEmpty(badWord) Then Continue For
            ' Search the original string; write the masked text into sb
            Dim index As Integer = str.IndexOf(badWord, comparison)
            While index <> -1
                ' Taken from the input, so it keeps the input's case
                Dim oldValue As String = str.Substring(index, badWord.Length)
                Dim newValue As String
                If badWord.Length > showClearTextLength Then
                    newValue = oldValue.Remove(showClearTextLength) & New String(obfuscateChar, oldValue.Length - showClearTextLength)
                Else
                    newValue = New String(obfuscateChar, oldValue.Length)
                End If
                For i As Integer = index To index + newValue.Length - 1
                    sb(i) = newValue(i - index)
                Next
                index += newValue.Length
                index = str.IndexOf(badWord, index, comparison)
            End While
        Next
        Return sb.ToString()
    End Function
End Module

Using your (silly) sample:

Dim replaced = "phuck will get check where as Phuck or PHUCK".ReplaceBadWords({"Phuck", "ILL"}, StringComparison.CurrentCultureIgnoreCase)

Result:

phu** w*** get check where as Phu** or PHU**
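The question's word list is a non-generic ArrayList, while ReplaceBadWords expects an IEnumerable(Of String). A minimal sketch of the conversion, assuming every element of the list really is a String (Cast(Of String) throws InvalidCastException otherwise); AsWordList is a hypothetical helper name:

```vbnet
Imports System
Imports System.Collections
Imports System.Collections.Generic
Imports System.Linq

Module WordListConversion
    ' Hypothetical helper: exposes a non-generic ArrayList as IEnumerable(Of String).
    ' Enumerable.Cast throws InvalidCastException if an element is not a String.
    Function AsWordList(ByVal list As ArrayList) As IEnumerable(Of String)
        Return list.Cast(Of String)()
    End Function

    Sub Main()
        Dim alWordList As New ArrayList From {"Phuck", "ILL"}
        Console.WriteLine(String.Join(",", AsWordList(alWordList))) ' Phuck,ILL
    End Sub
End Module
```

With the StringExtensions module in scope, the call would then look like InputString.ReplaceBadWords(alWordList.Cast(Of String)(), StringComparison.CurrentCultureIgnoreCase).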

If you have a large number of "bad words", here is a parallel version:

' Place inside Module StringExtensions; also requires Imports System.Threading.Tasks
<Extension()>
Public Function ReplaceBadWordsParallel(ByVal str As String, ByVal badWords As IEnumerable(Of String), ByVal comparison As StringComparison, Optional ByVal showClearTextLength As Integer = 3, Optional ByVal obfuscateChar As Char = "*"c) As String
    ' A Char array instead of a StringBuilder: StringBuilder instance members are not
    ' guaranteed to be thread-safe. If two bad words overlap, which one wins is undefined.
    Dim chars As Char() = str.ToCharArray()
    Parallel.ForEach(badWords,
        Sub(badWord)
            If String.IsNullOrEmpty(badWord) Then Return
            Dim index As Integer = str.IndexOf(badWord, comparison)
            While index <> -1
                Dim oldValue As String = str.Substring(index, badWord.Length)
                Dim newValue As String
                If badWord.Length > showClearTextLength Then
                    newValue = oldValue.Remove(showClearTextLength) & New String(obfuscateChar, oldValue.Length - showClearTextLength)
                Else
                    newValue = New String(obfuscateChar, oldValue.Length)
                End If
                For i As Integer = index To index + newValue.Length - 1
                    chars(i) = newValue(i - index)
                Next
                index += newValue.Length
                index = str.IndexOf(badWord, index, comparison)
            End While
        End Sub)
    Return New String(chars)
End Function

Note that I haven't verified whether the parallel version is thread-safe.


C# version (in case anyone is interested):

using System;
using System.Collections.Generic;
using System.Text;

public static class StringExtensions
{
    public static string ReplaceBadWords(this string str, IEnumerable<string> badWords, StringComparison comparison, int showClearTextLength = 3, char obfuscateChar = '*')
    {
        StringBuilder sb = new StringBuilder(str);
        foreach (string badWord in badWords)
        {
            // IndexOf("") always matches, so an empty entry would loop forever
            if (string.IsNullOrEmpty(badWord)) continue;
            int index = str.IndexOf(badWord, comparison);
            while (index != -1)
            {
                // Taken from the input, so it keeps the input's case
                string oldValue = str.Substring(index, badWord.Length);
                string newValue;
                if (badWord.Length > showClearTextLength)
                {
                    newValue = oldValue.Remove(showClearTextLength) + new string(obfuscateChar, oldValue.Length - showClearTextLength);
                }
                else
                {
                    newValue = new string(obfuscateChar, oldValue.Length);
                }
                for (int i = index; i < index + newValue.Length; i++)
                    sb[i] = newValue[i - index];
                index += newValue.Length;
                index = str.IndexOf(badWord, index, comparison);
            }
        }
        return sb.ToString();
    }
}

If your initial code works, simply make the regex case-insensitive:

r = New Regex("\b" & element, RegexOptions.IgnoreCase)

Case-insensitive means the regex doesn't care whether letters are uppercase or lowercase.

See the documentation on regular expression options for details.
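One catch: if the replacement string is still built as Left(element, 3) & AttachtoEnd, IgnoreCase will find PHUCK but replace it with the list entry's casing, which is the drawback described in the other answer. A sketch, assuming a MatchEvaluator is acceptable, that builds each replacement from the matched text (m.Value) so the input's own casing survives. MaskBadWords and clearLength are hypothetical names; the words parameter accepts the question's non-generic ArrayList:

```vbnet
Imports System
Imports System.Collections
Imports System.Text.RegularExpressions

Module RegexBadWordFilter
    ' Hypothetical sketch: case-insensitive regex masking where each replacement
    ' is computed from the actual match, so the original casing is preserved.
    Function MaskBadWords(ByVal input As String, ByVal words As IEnumerable, Optional ByVal clearLength As Integer = 3) As String
        For Each w As String In words
            ' "\b" anchors at the start of a word, as in the question's regex;
            ' Regex.Escape makes characters such as "." or "*" match literally
            Dim r As New Regex("\b" & Regex.Escape(w), RegexOptions.IgnoreCase)
            input = r.Replace(input,
                Function(m As Match)
                    ' Words longer than clearLength keep their first clearLength
                    ' characters; shorter words are masked completely
                    Dim keep As Integer = If(m.Length > clearLength, clearLength, 0)
                    Return m.Value.Substring(0, keep) & New String("*"c, m.Length - keep)
                End Function)
        Next
        Return input
    End Function

    Sub Main()
        Dim alWordList As New ArrayList From {"Phuck", "ILL"}
        Console.WriteLine(MaskBadWords("phuck will get check where as Phuck or PHUCK", alWordList))
        ' phu** will get check where as Phu** or PHU**
    End Sub
End Module
```

Because only a leading "\b" is used, "ILL" no longer matches inside "will", but "phucking" would still become "phu**ing"; appending "\b" to the pattern restricts matches to whole words.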
