This string is the subject of an email, which I fetched with imaplib. Its type is "str".
Thanks!
# -*- coding: utf-8 -*-
import imaplib
from email.parser import HeaderParser
# Gmail only accepts IMAP over SSL, so use IMAP4_SSL rather than IMAP4
conn = imaplib.IMAP4_SSL('imap.gmail.com')
conn.login('myuser', 'my_pass')
conn.select()
conn.search(None, 'ALL') # returns a nice list of messages...
# fetch() returns a (status, data) tuple; the raw header text is data[1][0][1]
data = conn.fetch(1, '(BODY[HEADER])')
header_data = data[1][0][1]
parser = HeaderParser()
msg = parser.parsestr(header_data)
print repr(msg['subject'].decode('utf-8'))
Result: u'=?UTF-8?B?V2VsY29tZSB0byBBdGxhc01haWw=?='
Use the decode_header and make_header functions from the email.header module to process the header, then convert the resulting header object to unicode:
from email.header import make_header, decode_header
header = make_header(decode_header(msg['subject']))
unicode_header = unicode(header)
print repr(unicode_header) # prints: u'Welcome to AtlasMail'
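For readers on Python 3, the same two functions can be used in much the same way. The following is a minimal sketch, assuming Python 3, where str() replaces unicode(); the raw_subject variable is a hypothetical stand-in for msg['subject'] so that the snippet runs without an IMAP connection.

```python
from email.header import decode_header, make_header

# Hypothetical stand-in for msg['subject'], copied from the question's output
raw_subject = "=?UTF-8?B?V2VsY29tZSB0byBBdGxhc01haWw=?="

# decode_header() splits the value into (bytes, charset) pairs;
# make_header() reassembles them into a Header object
header = make_header(decode_header(raw_subject))

# On Python 3, str() takes the place of unicode()
subject = str(header)
print(subject)  # prints: Welcome to AtlasMail
```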
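The encoding of non-ASCII characters in email subjects is described in RFC 1342 (since superseded by RFC 2047). An encoded subject has the form =?charset?encoding?text?=, and as you can see, in this case your UTF-8 bytes are base64-encoded (the "B" after the charset).
So, you can do this: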
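import base64, quopri
try:
    # A single encoded word looks like =?charset?encoding?text?=, so splitting
    # on "?" yields five parts; the first and last ("=" and "=") are discarded
    _, encoding, enc_type, subject, _ = msg["subject"].split("?")
except ValueError:
    # Not a single encoded word: treat the subject as plain UTF-8 text
    subject = msg["subject"].decode("utf-8")
    enc_type = "N/A"
if enc_type.upper() == "B":
    subject = base64.b64decode(subject).decode(encoding.lower())
elif enc_type.upper() == "Q":
    # header=True makes quopri decode "_" as a space, as the Q encoding requires
    subject = quopri.decodestring(subject, header=True).decode(encoding.lower())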
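The manual approach can also be wrapped in a small function. This is only a sketch, assuming Python 3; the ENCODED_WORD pattern and the decode_encoded_word name are made up for illustration and are not part of any library. It handles exactly one encoded word, so for real mail with several encoded words in one subject, decode_header from email.header is likely the safer choice.

```python
import base64
import quopri
import re

# Hypothetical pattern for one encoded word: =?charset?encoding?text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")


def decode_encoded_word(word):
    """Decode a single encoded word; return the input unchanged if it does not match."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        return word
    charset, enc_type, payload = match.groups()
    payload_bytes = payload.encode("ascii")
    if enc_type.upper() == "B":
        raw = base64.b64decode(payload_bytes)
    else:
        # header=True makes quopri decode "_" as a space, as the Q encoding requires
        raw = quopri.decodestring(payload_bytes, header=True)
    return raw.decode(charset)


print(decode_encoded_word("=?UTF-8?B?V2VsY29tZSB0byBBdGxhc01haWw=?="))
```

Plain subjects pass through untouched, which mirrors the ValueError fallback in the snippet above.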