Remove email addresses from a given txt file - Python



How can I remove all email addresses from a given file?

Sample file mail.txt:

Hello from me
how are you?
shubhamg199630@gmail.com to priya@yahoo.com hi@yahoo.co.in about the meeting 
sharp @2PM.okay see you  bye@yahoo.co.in olad-hola

Expected output:

Hello from me
how are you?
to about the meeting 
sharp @2PM.okay see you olad-hola

You can use re.sub() from the re module to replace the addresses with an empty string.

import re

with open("mail.txt", "r") as f:
    text = f.read()
# \S+@\S+ matches a run of non-whitespace characters on both sides of an "@".
clean_text = re.sub(r"\S+@\S+", "", text)
print(clean_text)

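This uses a simplified regex that also matches invalid email addresses, so it removes anything that merely looks like an email address. Since you are not trying to validate the addresses, that is not a problem.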
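Removing the matches leaves the surrounding spaces behind, so the result will not be spaced exactly like the expected output in the question. A minimal sketch of one way to tidy that up follows; the names `EMAIL_LIKE` and `strip_emails` are hypothetical, and collapsing whitespace line by line is an assumption about what the asker wants, not part of the answer itself.

```python
import re

# Simplified pattern: non-whitespace, an "@", then more non-whitespace.
EMAIL_LIKE = re.compile(r"\S+@\S+")


def strip_emails(text: str) -> str:
    """Remove email-like tokens and tidy the spaces they leave behind."""
    cleaned_lines = []
    for line in text.splitlines():
        without_emails = EMAIL_LIKE.sub("", line)
        # Collapse runs of spaces/tabs and trim the ends of the line.
        cleaned_lines.append(re.sub(r"[ \t]+", " ", without_emails).strip())
    return "\n".join(cleaned_lines)


sample = (
    "Hello from me\n"
    "how are you?\n"
    "shubhamg199630@gmail.com to priya@yahoo.com hi@yahoo.co.in about the meeting \n"
    "sharp @2PM.okay see you  bye@yahoo.co.in olad-hola"
)
print(strip_emails(sample))
```

Working line by line keeps the original line breaks, which a single whitespace collapse over the whole text would destroy.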

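Take a look at this question to find a regex that works for email addresses. You can then use the re module from the standard library to replace every match of that regex with an empty string.

Using the regex from the accepted answer to the linked question, we have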

import re

with open("sample.txt") as f:
    content = f.read()

# RFC 5322-style pattern; a raw triple-quoted string keeps the backslashes and quotes intact.
pattern = r"""(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])"""
replaced_content = re.sub(pattern, "<removed>", content)
print()
print(replaced_content)

This prints the text

Hello from me
how are you?
<removed> to <removed> <removed> about the meeting 
sharp @2PM.okay see you  <removed> olad-hola
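One detail worth noting: the character classes in the pattern above list only lowercase letters, so an address written with capitals would not be matched in full unless the `re.IGNORECASE` flag is passed. The sketch below illustrates the difference using a much shorter pattern, `SIMPLE_EMAIL`, which is a hypothetical stand-in chosen for readability and is not the expression from the linked answer.

```python
import re

# Hypothetical, deliberately short pattern for illustration only;
# it is NOT the RFC-style expression from the linked answer.
SIMPLE_EMAIL = r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}"

text = "Write to Bob@Example.COM or bob@example.com"

# Without a flag, the lowercase-only classes skip the mixed-case address.
case_sensitive = re.sub(SIMPLE_EMAIL, "<removed>", text)

# re.IGNORECASE lets the same classes match upper-case letters too.
case_insensitive = re.sub(SIMPLE_EMAIL, "<removed>", text, flags=re.IGNORECASE)

print(case_sensitive)    # Write to Bob@Example.COM or <removed>
print(case_insensitive)  # Write to <removed> or <removed>
```

The same `flags=re.IGNORECASE` argument can be passed to `re.sub` with the longer pattern if the file may contain capitalised addresses.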

You can do it like this:

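a = '''Hello from me
how are you?
shubhamg199630@gmail.com to priya@yahoo.com hi@yahoo.co.in about the meeting
sharp @2PM.okay see you  bye@yahoo.co.in olad-hola'''

for word in a.split():
    # Drop every whitespace-separated word that contains "@".
    # Note: this also removes "@2PM.okay", which is not an email address.
    if '@' in word:
        a = a.replace(word, '').replace('    ', '\n')
print(a)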
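Because the loop above tests only for the presence of "@", it would also drop a word such as `@2PM.okay`, which the expected output keeps. A possible refinement, sketched here with the hypothetical helpers `looks_like_email` and `remove_email_words`, is to require text on both sides of the "@" and to rebuild each line from the words that remain. This is still a heuristic and makes no attempt to validate addresses.

```python
def looks_like_email(token: str) -> bool:
    """Heuristic: there is text both before and after the first "@"."""
    local, _sep, domain = token.partition("@")
    # partition() returns an empty domain when the token has no "@".
    return bool(local and domain)


def remove_email_words(text: str) -> str:
    """Drop email-like words line by line, keeping the line breaks."""
    cleaned_lines = []
    for line in text.splitlines():
        kept = [word for word in line.split() if not looks_like_email(word)]
        cleaned_lines.append(" ".join(kept))
    return "\n".join(cleaned_lines)


sample = (
    "Hello from me\n"
    "how are you?\n"
    "shubhamg199630@gmail.com to priya@yahoo.com hi@yahoo.co.in about the meeting\n"
    "sharp @2PM.okay see you  bye@yahoo.co.in olad-hola"
)
print(remove_email_words(sample))
```

Joining the kept words with a single space also normalises the gaps left by removed addresses.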
