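I want to write a function that replaces parts of a string using regular expressions. However, it does not do what it should, and I can't tell what is wrong.
I'm using Python 3.4.3 on Windows 10.
This is code from the NLTK cookbook.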
import re

replacement_patterns = [
    (r'won\'t', 'will not'),
    (r'can\'t', 'cannot'),
    (r'i\'m', 'i am'),
    (r'ain\'t', 'is not'),
    (r'(\w+)\'ll', '\g<1> will'),
    (r'(\w+)n\'t', '\g<1> not'),
    (r'(\w+)\'ve', '\g<1> have'),
    (r'(\w+)\'s', '\g<1> is'),
    (r'(\w+)\'re', '\g<1> are'),
    (r'(\w+)\'d', '\g<1> would')
]

class RegexpReplacer(object):
    def __init__(self, patterns=replacement_patterns):
        self.patterns = [(re.compile(regex), repl) for (regex, repl) in patterns]
        print("init")
        print(self.patterns)

    def replace(self, text):
        print("In replace")
        s = text
        print(self.patterns)
        for (pattern, repl) in self.patterns:
            s = re.sub(pattern, repl, s)
            print(s)
            return s

if __name__ == "__main__":
    print("RegEx replacers")
    replacer = RegexpReplacer()
    result = replacer.replace("can't is a contraction")
    print(result)
    result = replacer.replace("I should've done that thing I didn't do")
    print(result)
You have an indentation problem in your replace function:
class RegexpReplacer(object):
    def replace(self, text):
        print("In replace")
        s = text
        print(self.patterns)
        for (pattern, repl) in self.patterns:
            s = re.sub(pattern, repl, s)
            print(s)
        return s  # here is the problem: return must sit outside the for loop
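To make the effect of the misplaced return visible, here is a minimal sketch that is not taken from the original code: the two-entry pattern list and the function names replace_early_return and replace_all are made up for illustration. With the return inside the loop, the function exits after the first pattern has been applied.

```python
import re

# Two illustrative patterns (a hypothetical subset of the question's list).
patterns = [
    (re.compile(r"can't"), "cannot"),
    (re.compile(r"(\w+)n't"), r"\g<1> not"),
]

def replace_early_return(text):
    # `return` is inside the loop body, so only the first pattern is ever applied.
    for pattern, repl in patterns:
        text = pattern.sub(repl, text)
        return text

def replace_all(text):
    # `return` comes after the loop, so every pattern gets its turn.
    for pattern, repl in patterns:
        text = pattern.sub(repl, text)
    return text

sentence = "I can't say I didn't try"
early = replace_early_return(sentence)
full = replace_all(sentence)
print(early)  # I cannot say I didn't try
print(full)   # I cannot say I did not try
```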
A small suggestion about your function: remove the print lines to make it cleaner and simpler.
class RegexpReplacer(object):
    def replace(self, text):
        for (pattern, repl) in self.patterns:
            text = re.sub(pattern, repl, text)
        return text
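For reference, a self-contained version of the cleaned-up class might look like the sketch below. It assumes a shortened pattern list (four entries instead of the question's ten) and calls the compiled pattern's own sub method, which should behave the same as passing the compiled pattern to re.sub. The replacement strings are written as raw strings here so that \g<1> is not read as a string escape.

```python
import re

# Shortened pattern list for illustration; the question's full list works the same way.
replacement_patterns = [
    (r"won't", "will not"),
    (r"can't", "cannot"),
    (r"(\w+)n't", r"\g<1> not"),
    (r"(\w+)'ve", r"\g<1> have"),
]

class RegexpReplacer(object):
    def __init__(self, patterns=replacement_patterns):
        # Compile every regex once, up front.
        self.patterns = [(re.compile(regex), repl) for (regex, repl) in patterns]

    def replace(self, text):
        # Apply each pattern in order, feeding the result into the next one.
        for (pattern, repl) in self.patterns:
            text = pattern.sub(repl, text)
        return text

replacer = RegexpReplacer()
first = replacer.replace("can't is a contraction")
second = replacer.replace("I should've done that thing I didn't do")
print(first)   # cannot is a contraction
print(second)  # I should have done that thing I did not do
```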
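In addition to the accepted answer, your code has another problem: you are using escape sequences in raw strings. For example,
r'won\'t'
is a raw string (note the r prefix), which does not expand escape sequences, so your string is actually
won\'t
Use double quotes instead:
r"won't"
This mistake doesn't bite you right now, because \' has no special meaning in a regular expression and is simply treated as ', but it will at other times. For example,
r'\\'
is a string of length 2.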
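A quick way to check this yourself is to compare string lengths in an interpreter. The variable names below are only illustrative.

```python
import re

single = r'won\'t'  # raw string: the backslash stays in the string
double = r"won't"   # double quotes: no backslash needed

print(len(single))  # 6 characters: w o n \ ' t
print(len(double))  # 5 characters: w o n ' t
print(len(r'\\'))   # 2 characters: two backslashes

# As regex patterns both match the same text, because \' in a regex
# just means a literal apostrophe.
from_single = re.sub(single, "will not", "I won't go")
from_double = re.sub(double, "will not", "I won't go")
print(from_single)  # I will not go
print(from_double)  # I will not go
```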