How can I remove all email addresses from a given file?
Sample file mail.txt:
Hello from me
how are you?
shubhamg199630@gmail.com to priya@yahoo.com hi@yahoo.co.in about the meeting
sharp @2PM.okay see you bye@yahoo.co.in olad-hola
Expected output:
Hello from me
how are you?
to about the meeting
sharp @2PM.okay see you olad-hola
You can use re.sub() from the re module to replace the addresses with an empty string.
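import re

with open("mail.txt", "r") as f:
    text = f.read()

# \S+ is a run of non-whitespace characters; one such run is required on each side of the "@"
clean_text = re.sub(r"\S+@\S+", "", text)
print(clean_text)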
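This uses a simplified regular expression that also matches invalid email addresses, so it removes everything that looks like an email address. Since you do not want to validate the addresses, that is not a problem.
Have a look at this question to find a regular expression that can be used for email addresses. You can then use the re module from the standard library to replace every match of that regular expression with an empty string.
Using the regular expression from the accepted answer to the linked question, we have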
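Removing the addresses this way leaves the surrounding spaces behind, so the result has runs of spaces where the expected output has one. A minimal sketch of one way to tidy that up, assuming that collapsing spaces and tabs within each line is acceptable; the names EMAIL_LIKE and strip_emails are made up for this example:

```python
import re

# Same simplified pattern as above: non-whitespace, "@", non-whitespace.
EMAIL_LIKE = re.compile(r"\S+@\S+")

def strip_emails(text):
    """Remove email-like tokens, then tidy the spaces they leave behind."""
    without_emails = EMAIL_LIKE.sub("", text)
    # Collapse runs of spaces/tabs and trim each line; line breaks are kept.
    lines = [
        re.sub(r"[ \t]+", " ", line).strip()
        for line in without_emails.split("\n")
    ]
    return "\n".join(lines)

sample = (
    "Hello from me\n"
    "how are you?\n"
    "shubhamg199630@gmail.com to priya@yahoo.com hi@yahoo.co.in about the meeting\n"
    "sharp @2PM.okay see you bye@yahoo.co.in olad-hola"
)
print(strip_emails(sample))
```

On the sample text this should give the four lines from the expected output, with "@2PM.okay" kept because nothing but a space comes before its "@".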
import re

with open("sample.txt") as f:
    content = f.read()

# Raw triple-quoted string, so the backslashes and the double quotes inside the pattern need no extra escaping
pattern = r"""(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])"""

# Replace every match with a visible placeholder
replaced_content = re.sub(pattern, "<removed>", content)
print(replaced_content)
This prints the text
Hello from me
how are you?
<removed> to <removed> <removed> about the meeting
sharp @2PM.okay see you <removed> olad-hola
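One thing to watch for: the character classes in that pattern only list lowercase letters, so an address containing uppercase letters would not be matched unless the re.IGNORECASE flag is passed. A small sketch of the difference, using a much shorter stand-in pattern (an assumption made for brevity, not the pattern from the linked answer) and a made-up input string:

```python
import re

# Shortened stand-in pattern (hypothetical, for illustration only).
# Like the long pattern, it lists only lowercase letters.
short_pattern = r"[a-z0-9._%+-]+@(?:[a-z0-9-]+\.)+[a-z0-9-]+"

text = "Contact Priya@Yahoo.com or hi@yahoo.co.in"

# Without the flag, the mixed-case address is left in place.
case_sensitive = re.sub(short_pattern, "<removed>", text)

# With re.IGNORECASE, both addresses are replaced.
case_insensitive = re.sub(short_pattern, "<removed>", text, flags=re.IGNORECASE)

print(case_sensitive)
print(case_insensitive)
```

The same flags=re.IGNORECASE argument can be passed to re.sub() with the long pattern if the file may contain uppercase addresses.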
You can do it like this:
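# Triple quotes let the string span several lines
a = '''Hello from me
how are you?
shubhamg199630@gmail.com to priya@yahoo.com hi@yahoo.co.in about the meeting
sharp @2PM.okay see you bye@yahoo.co.in olad-hola'''

for word in a.split():
    # Treat a word as an address only if "@" has text on both sides,
    # so that "@2PM.okay" is left alone
    if 0 < word.find('@') < len(word) - 1:
        a = a.replace(word, '')
print(a)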
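A variation on the same idea, sketched here as one possible refinement rather than the answer's own method: filter the words line by line and join the survivors back together. This keeps the line breaks, avoids leftover double spaces, and skips words such as "@2PM.okay" where the "@" is the first character. The function name drop_email_words is made up for this example.

```python
def drop_email_words(text):
    """Drop every whitespace-separated word that has text on both sides of an "@"."""
    cleaned_lines = []
    for line in text.splitlines():
        kept = [
            word for word in line.split()
            # find() returns -1 when there is no "@" and 0 when the word starts with it
            if not (0 < word.find("@") < len(word) - 1)
        ]
        cleaned_lines.append(" ".join(kept))
    return "\n".join(cleaned_lines)

sample = """Hello from me
how are you?
shubhamg199630@gmail.com to priya@yahoo.com hi@yahoo.co.in about the meeting
sharp @2PM.okay see you bye@yahoo.co.in olad-hola"""

print(drop_email_words(sample))
```

Like the other approaches, this does no validation: any word with an "@" in the middle is treated as an address.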