我试图通过搜索Google找到我发现的脚本。正在与我收到的以前的电子邮件完美合作,因为它直接提取了"从"字段,但我没有遇到错误。
这是我的代码的样子:
#!/usr/bin/python
import imaplib
import sys
import email
import re
#FOLDER=sys.argv[1]
FOLDER='folder'
LOGIN='login@gmail.com'
PASSWORD='password'
IMAP_HOST = 'imap.gmail.com' # Change this according to your provider
email_list = []
email_unique = []
mail = imaplib.IMAP4_SSL(IMAP_HOST)
mail.login(LOGIN, PASSWORD)
mail.select(FOLDER)
result, data = mail.search(None, 'ALL')
ids = data[0]
id_list = ids.split()
for i in id_list:
typ, data = mail.fetch(i,'(RFC822)')
for response_part in data:
if isinstance(response_part, tuple):
msg = email.message_from_string(response_part[1])
sender = msg['reply-to'].split()[0]
address = re.sub(r'[<>]','',sender)
# Ignore any occurences of own email address and add to list
if not re.search(r'' + re.escape(LOGIN),address) and not address in email_list:
email_list.append(address)
print address
而不是用字符串拆分和切片弄乱,正确的方法是在标准库中使用email.utils软件包中使用 parseaddr
。它正确处理电子邮件标题中的各种法律地址格式。
一些示例:
>>> from email.utils import parseaddr
>>> parseaddr("sally@foo.com")
('', 'sally@foo.com')
>>> parseaddr("<sally@foo.com>")
('', 'sally@foo.com')
>>> parseaddr("Sally <sally@foo.com>")
('Sally', 'sally@foo.com')
>>> parseaddr("Sally Smith <sally@foo.com>")
('Sally Smith', 'sally@foo.com')
>>>
另外,您不应该假设电子邮件具有回复标题。许多人没有。