Line breaks in IMAP - =\r\n - how do I decode them?



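I am trying to build an email scraper that goes through certain emails looking for values to store in a CSV file. I have tried many ways to solve this, but so far without success.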

import imaplib

# Function to get the email content, i.e. its body part
def get_body(msg):
    if msg.is_multipart():
        return get_body(msg.get_payload(decode=True)).decode()
    else:
        return msg.get_payload(decode=True).decode()

# Function to search for a key/value pair
def search(key, value, con):
    result, data = con.search(None, key, '"{}"'.format(value))
    return data

# Function to get the list of emails under this label
def get_emails(result_bytes):
    print("get email")
    msgs = []  # all the email data is pushed into a list
    for num in result_bytes[0].split():
        typ, data = con.fetch(num, '(RFC822)')
        msgs.append(data)
    return msgs

# this is done to make an SSL connection with Gmail
con = imaplib.IMAP4_SSL(imap_url)
con.login(user, password)
con.select('Inbox')
msg_ids = get_emails(search('SUBJECT', 'TESTTITELPYTHON', con))
for msg in msg_ids[::-1]:
    for sent in msg:
        if type(sent) is tuple:
            print(msg)
            # encoding set as utf-8
            content = sent[1], 'utf-8'
            data = str(content)

            # Handling errors related to UnicodeEncodeError
            try:
                indexstart = data.find("span")
                data2 = data[indexstart + 5: len(data)]
                indexend = data2.find("</div>")

                # printing the required content which we need
                # to extract from our email, i.e. our body

                waarde = data2[0: indexend]
                test_naam_1 = waarde.split("Naam: ", 1)[1]
                echte_naam = test_naam_1.split("Email: ", -1)[0]

                email_test = waarde.split("Email: ", 1)[1]
                echte_email = email_test.split("Tel nr.: ", -1)[0]

                tel_test = waarde.split("Tel nr.: ", 1)[1]
                echte_tel = tel_test.split("Onderwerp: ", -1)[0]

                subj_test = waarde.split("Onderwerp: ", 1)[1]
                echte_subj = subj_test.split("Bericht: ", -1)[0]

                print("---ADRESGEGEVENS---")
                print("---Naam: " + echte_naam + "---")
                print("---Naam: " + echte_email + "---")
                print("---Naam: " + echte_tel + "---")
                print("---Naam: " + echte_subj + "---")
            except UnicodeEncodeError:
                pass

In my results I still get these ugly line breaks, which look like this in my markup:

[(b'12638 (RFC822 {1973}', b'MIME-Version: 1.0\r\nDate: Mon, 25 Oct 2021 16:41:46 +0200\r\nMessage-ID: <CAJDn=xsVynQqp7BwYoGZB=v21-AAR5=xcMkQ8D2kXE7ZpYFNNQ@mail.example.com>\r\nSubject: TESTTITELPYTHON\r\nFrom: Patrick Merkx <patrick@example.nl>\r\nTo: Patrick Merkx <patrick@example.nl>\r\nContent-Type: multipart/alternative; boundary="00000000000042e6ae05cf2e5c7e"\r\n\r\n--00000000000042e6ae05cf2e5c7e\r\nContent-Type: text/plain; charset="UTF-8"\r\n\r\nContactformulier ingevuld door:\r\nNaam: Patrick Merkx\r\nEmail: merkx.patrick@example.com\r\nTel nr.: 0611381219\r\n\r\nOnderwerp: Nog een test\r\n\r\nBericht:\r\nBericht\r\n\r\n--00000000000042e6ae05cf2e5c7e\r\nContent-Type: text/html; charset="UTF-8"\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\n<div dir=3D"ltr"><div><div dir=3D"ltr" class=3D"gmail_signature" data-smart=\r\nmail=3D"gmail_signature"><div dir=3D"ltr"><div><div dir=3D"ltr"><div><div d=\r\nir=3D"ltr"><div style=3D"font-stretch:normal;font-size:13.33px;line-height:=\r\n19.99px;background:none;border:0px rgb(34,34,34);width:600px;overflow:visib=\r\nle;min-height:0px;outline-width:0px"><span class=3D"gmail-il" style=3D"font=\r\n-size:small">Contactformulier</span><span style=3D"font-size:small">=C2=A0i=\r\nngevuld door:</span><br style=3D"font-size:small"><span style=3D"font-size:=\r\nsmall">Naam: Patrick Merkx</span><br style=3D"font-size:small"><span style=\r\n=3D"font-size:small">Email:=C2=A0</span><a href=3D"mailto:merkx.patrick@gma=\r\nil.com" target=3D"_blank" style=3D"font-size:small">merkx.patrick@example.com=\r\n</a><br style=3D"font-size:small"><span style=3D"font-size:small">Tel nr.: =\r\n0611381219</span><br style=3D"font-size:small"><br style=3D"font-size:small=\r\n"><span style=3D"font-size:small">Onderwerp: Nog een test</span><br style=\r\n=3D"font-size:small"><br style=3D"font-size:small"><span style=3D"font-size=\r\n:small">Bericht:</span><br style=3D"font-size:small"><span style=3D"font-si=\r\nze:small">Bericht</span><br></div></div></div></div></div></div></div></div=\r\n></div>\r\n\r\n--00000000000042e6ae05cf2e5c7e--'), b')']
class=3D"gmail-il" style=3D"font=\r\n-size:small">Contactformulier</span><span style=3D"font-size:small">=C2=A0i=\r\nngevuld door:</span><br style=3D"font-size:small"><span style=3D"font-size:=\r\nsmall">Naam: Patrick Merkx</span><br style=3D"font-size:small"><span style=\r\n=3D"font-size:small">Email:=C2=A0</span><a href=3D"mailto:merkx.patrick@gma=\r\nil.com" target=3D"_blank" style=3D"font-size:small">merkx.patrick@gmail.com=\r\n</a><br style=3D"font-size:small"><span style=3D"font-size:small">Tel nr.: =\r\n0611381219</span><br style=3D"font-size:small"><br style=3D"font-size:small=\r\n"><span style=3D"font-size:small">Onderwerp: Nog een test</span><br style=\r\n=3D"font-size:small"><br style=3D"font-size:small"><span style=3D"font-size=\r\n:small">Bericht:</span><br style=3D"font-size:small"><span style=3D"font-si=\r\nze:small">Bericht</span><br>

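I have also tried stripping the body tags, decoding, and several other solutions, but unfortunately nothing has worked. So far I cannot seem to remove these line breaks with any method I know.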

What am I doing wrong?

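You are looking at a MIME part with Content-Transfer-Encoding: quoted-printable. The proper way to decode it is to traverse the MIME structure and interpret each part according to its own headers. But there is no need to do this explicitly; Python's email library already does it for you.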
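
To make the encoding concrete, here is a minimal sketch that decodes one fragment by hand with the standard-library quopri module. The fragment is adapted from the HTML part in the question, and the variable names are only illustrative.

```python
import quopri

# Fragment adapted from the HTML part in the question.
# In quoted-printable, "=" followed by CRLF is a soft line break
# (the line continues), and "=XX" is a single byte written in hex.
fragment = b'<span style=3D"font=\r\n-size:small">=C2=A0i=\r\nngevuld door:</span>'

decoded_bytes = quopri.decodestring(fragment)

# This particular part declares charset="UTF-8", so decode accordingly.
decoded_text = decoded_bytes.decode("utf-8")

print(repr(decoded_text))
```

The soft line breaks disappear, =3D becomes a plain equals sign, and =C2=A0 becomes a non-breaking space. In practice you would not call quopri yourself; parsing the whole message with the email library applies this decoding per part.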

from email import message_from_bytes
from email.policy import default
...
msg_ids = get_emails(search('SUBJECT', 'TESTTITELPYTHON', con))
for msg in msg_ids[::-1]:
    for sent in msg:
        if type(sent) is tuple:
            msg = message_from_bytes(sent[1], policy=default)
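Unfortunately, without a sample of the MIME structure of these messages, I cannot tell you exactly how to process the resulting message. You will probably have something like a "primary" MIME body part; msg.get_body(preferencelist=('html', 'plain')) will extract that part, and calling get_content() on it will return the actual decoded body text.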
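
As a rough sketch of how that might look, the following parses a small hand-written message and pulls out both alternatives. The raw bytes here are a shortened stand-in modelled on the output in the question (the boundary string BOUNDARY and the trimmed bodies are assumptions), not the real message.

```python
from email import message_from_bytes
from email.policy import default

# Shortened stand-in for what con.fetch() returns in sent[1].
raw = (
    b'MIME-Version: 1.0\r\n'
    b'Subject: TESTTITELPYTHON\r\n'
    b'Content-Type: multipart/alternative; boundary="BOUNDARY"\r\n'
    b'\r\n'
    b'--BOUNDARY\r\n'
    b'Content-Type: text/plain; charset="UTF-8"\r\n'
    b'\r\n'
    b'Naam: Patrick Merkx\r\n'
    b'Tel nr.: 0611381219\r\n'
    b'\r\n'
    b'--BOUNDARY\r\n'
    b'Content-Type: text/html; charset="UTF-8"\r\n'
    b'Content-Transfer-Encoding: quoted-printable\r\n'
    b'\r\n'
    b'<div dir=3D"ltr"><span style=3D"font=\r\n'
    b'-size:small">Naam: Patrick Merkx</span></div>\r\n'
    b'\r\n'
    b'--BOUNDARY--\r\n'
)

msg = message_from_bytes(raw, policy=default)

# Prefer the HTML alternative, fall back to plain text.
html = msg.get_body(preferencelist=('html', 'plain')).get_content()

# Ask for the plain-text alternative only.
plain = msg.get_body(preferencelist=('plain',)).get_content()

print(html)
print(plain)
```

For pulling out fields such as Naam or Tel nr., the plain-text alternative is likely the easier one to split, since it has no markup to work around.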

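The policy=default keyword argument selects the modern email.message.EmailMessage class, part of the API that became standard in Python 3.6, instead of the legacy email.message.Message class that you get without it.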
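
The difference is easy to check for yourself. This small sketch parses the same bytes with and without the policy argument; the message content is made up for the illustration.

```python
from email import message_from_bytes
from email.message import EmailMessage, Message
from email.policy import default

raw = b"Subject: TESTTITELPYTHON\r\n\r\nBericht\r\n"

# Without a policy argument the parser uses the legacy compat32 policy.
legacy = message_from_bytes(raw)

# With policy=default the parser builds an EmailMessage instead.
modern = message_from_bytes(raw, policy=default)

# get_body() and get_content() only exist on the modern class.
print(type(legacy).__name__, hasattr(legacy, "get_body"))
print(type(modern).__name__, hasattr(modern, "get_body"))
```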

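In more detail, trying to decode the raw email body as UTF-8 is quite wrong. A typical MIME message has several parts, each of which may have a different encoding, and many of them certainly do not use UTF-8 (though it is becoming more common). Even when a part is UTF-8, the actual UTF-8 bytes usually sit underneath a content-transfer-encoding that protects them from corruption in transit over routes that may not be 8-bit clean.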
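
The point about per-part encodings can be sketched with a synthetic message. The two bodies, their charsets, and their transfer encodings below are chosen purely for the demonstration and have nothing to do with the message in the question.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default

# Build a message whose two alternatives use different charsets
# and different content-transfer-encodings.
built = EmailMessage()
built.set_content("Café\n", charset="iso-8859-1", cte="quoted-printable")
built.add_alternative("<p>Café €</p>\n", subtype="html",
                      charset="utf-8", cte="base64")
raw = built.as_bytes()

# Decoding the raw bytes as UTF-8 "works" (the wire format is ASCII),
# but the text is still transfer-encoded, so the word is not there.
naive = raw.decode("utf-8")
print("Café" in naive)

# Walking the parsed message decodes each part by its own headers.
parsed = message_from_bytes(raw, policy=default)
parts = []
for part in parsed.walk():
    if part.get_content_maintype() == "text":
        parts.append((
            part.get_content_type(),
            part.get_content_charset(),
            str(part["Content-Transfer-Encoding"]),
            part.get_content(),
        ))

for content_type, charset, cte, text in parts:
    print(content_type, charset, cte, repr(text))
```

Each part comes back as ordinary text even though neither of them was readable in the raw bytes, which is why the decoding has to happen per part rather than on the message as a whole.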