Need help removing punctuation and replacing numbers for an NLP task



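For example, I have a list of strings:

sentence = ['cracked $300 million', "she's resolutely, smitten ", "that's creative [r]", 'the market ( knowledge check : prices up!']

I want to remove the punctuation and replace the numbers with the '£' symbol. I have tried this, but when I try to run both steps together, only one of the replacements is applied. My code is below: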

import re
s = ([re.sub(r'[!":$()[]\',]', ' ', word) for word in sentence])
s = [([re.sub(r'\d+', '£', word) for word in s])]
print(s)

I think the problem may be in the square brackets? Many thanks.

If you want to replace certain specific punctuation characters with a space and any chunk of digits with a £ sign, you can use

import re
rx = re.compile(r'''[][!":$()',]|(\d+)''')
sentence = ['cracked $300 million', "she's resolutely, smitten ", "that's creative [r]", 'the market ( knowledge check : prices up!']
s = [rx.sub(lambda x: '£' if x.group(1) else ' ', word) for word in sentence] 
print(s) # => ['cracked  £ million', 'she s resolutely  smitten ', 'that s creative  r ', 'the market   knowledge check   prices up ']

See the Python demo.

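Note the position of ][ in the character class: when ] comes first in the class, it does not need to be escaped, and [ never needs to be escaped inside a character class. I also used a triple-quoted string literal, so you can write " and ' as they are, without extra escaping.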
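The two points above can be checked in isolation. This is only a minimal sketch; the names `brackets`, `quotes`, `found` and `cleaned`, and the test strings, are made up for illustration and are not part of the original answer.

```python
import re

# ']' placed first inside the class is taken literally;
# '[' is always literal inside a character class.
brackets = re.compile(r'[][]')
found = brackets.findall('that is creative [r]')
print(found)  # ['[', ']']

# In a triple-quoted raw string both quote characters can appear unescaped.
quotes = re.compile(r'''["']''')
cleaned = quotes.sub(' ', '''she's "resolutely" smitten''')
print(cleaned)  # she s  resolutely  smitten
```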

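So here, [][!":$()',]|(\d+) matches any one of ], [, !, ", :, $, (, ), ' or a comma, or it matches one or more digits and captures them into Group 1. If Group 1 matched, the replacement is the pound sign (£); otherwise it is a space.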

Sorry, I did not see the second part of your request, but you can try this to handle both digits and punctuation:

import string

sentence = ['cracked $300 million', "she's resolutely, smitten ", "that's creative [r]",
            'the market ( knowledge check : prices up!']

def replaceDigitAndPunctuation(newSentence):
    # Build the result one character at a time
    new_word = ""
    for char in newSentence:
        if char in string.digits:
            new_word += "£"   # every single digit becomes its own £
        elif char in string.punctuation:
            pass              # punctuation is dropped, not replaced with a space
        else:
            new_word += char
    return new_word

for i in range(len(sentence)):
    sentence[i] = replaceDigitAndPunctuation(sentence[i])
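As a possible alternative to the character loop (not something the answer above uses), the same per-character behaviour can be sketched with `str.maketrans` and `str.translate`. The names `table` and `translated` are made up here. As with the loop, each digit becomes its own £ and punctuation is removed rather than replaced with a space.

```python
import string

# Map every digit to '£' and delete every character in string.punctuation.
table = str.maketrans(string.digits, '£' * len(string.digits), string.punctuation)

sentence = ['cracked $300 million', "she's resolutely, smitten ", "that's creative [r]",
            'the market ( knowledge check : prices up!']

translated = [text.translate(table) for text in sentence]
print(translated)
# ['cracked £££ million', 'shes resolutely smitten ', 'thats creative r',
#  'the market  knowledge check  prices up']
```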

Using your input and pattern:

>>> ([re.sub(r'[!":$()[]\',]',' ', word) for word in sentence])
['cracked $300 million', "she's resolutely, smitten ", "that's creative [r]", 'the market ( knowledge check : prices up!']
>>> 

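The reason is that [!":$()[] is treated as a complete character class, and \',] after it is a literal pattern, i.e. the engine is looking for one character from the class followed by exactly ',]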
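To see this concretely, here is a small sketch. The second test string is hypothetical, constructed only so that it contains a class character followed by the literal tail; the names `pattern`, `no_match` and `m` are also made up.

```python
import re

# The asker's pattern: the first unescaped ']' closes the class early.
pattern = re.compile(r'[!":$()[]\',]')

# The sample strings never contain the three-character tail ',] so nothing matches.
no_match = pattern.search("she's resolutely, smitten ")
print(no_match)  # None

# A made-up string with '(' followed by the literal ',] does match.
m = pattern.search("oops (',] here")
print(m.group())  # (',]
```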

Escape the closing bracket inside the class:

\]

>>> ([re.sub(r'[!":$()[\]\',]',' ', word) for word in sentence])
['cracked  300 million', 'she s resolutely  smitten ', 'that s creative  r ', 'the market   knowledge check   prices up ']
>>> 

Edit: if you are trying to stack multiple actions in a single list comprehension, put the actions in a function and call that function instead:

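def process_word(word):
    # Replace the listed punctuation characters with a space
    word = re.sub(r'[!":$()[\]\',]', ' ', word)
    # Replace each run of digits with a single £
    word = re.sub(r'\d+', '£', word)
    return word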

Result:

>>> [process_word(word) for word in sentence]
['cracked  £ million', 'she s resolutely  smitten ', 'that s creative  r ', 'the market   knowledge check   prices up ']
