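I'm trying to build an email scraper that searches certain emails for values, which I then want to store in a CSV file. I've tried many ways to solve this, but so far without success.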
import imaplib

# Function to get the email content, i.e. its body part
def get_body(msg):
    if msg.is_multipart():
        # Recurse into the first sub-part of a multipart message
        return get_body(msg.get_payload(0))
    else:
        return msg.get_payload(decode=True).decode()

# Function to search for a key/value pair
def search(key, value, con):
    result, data = con.search(None, key, '"{}"'.format(value))
    return data

# Function to get the list of emails under this label
def get_emails(result_bytes):
    print("get email")
    msgs = []  # all the email data is pushed into this list
    for num in result_bytes[0].split():
        typ, data = con.fetch(num, '(RFC822)')
        msgs.append(data)
    return msgs

# this is done to make an SSL connection with Gmail
con = imaplib.IMAP4_SSL(imap_url)
con.login(user, password)
con.select('Inbox')

msg_ids = get_emails(search('SUBJECT', 'TESTTITELPYTHON', con))
for msg in msg_ids[::-1]:
    for sent in msg:
        if type(sent) is tuple:
            print(msg)
            # encoding set as utf-8
            content = sent[1], 'utf-8'
            data = str(content)
            # Handling errors related to UnicodeEncodeError
            try:
                indexstart = data.find("span")
                data2 = data[indexstart + 5: len(data)]
                indexend = data2.find("</div>")
                # printing the required content which we need
                # to extract from our email, i.e. our body
                waarde = data2[0: indexend]
                test_naam_1 = waarde.split("Naam: ", 1)[1]
                echte_naam = test_naam_1.split("Email: ", -1)[0]
                email_test = waarde.split("Email: ", 1)[1]
                echte_email = email_test.split("Tel nr.: ", -1)[0]
                tel_test = waarde.split("Tel nr.: ", 1)[1]
                echte_tel = tel_test.split("Onderwerp: ", -1)[0]
                subj_test = waarde.split("Onderwerp: ", 1)[1]
                echte_subj = subj_test.split("Bericht: ", -1)[0]
                print("---ADRESGEGEVENS---")
                print("---Naam: " + echte_naam + "---")
                print("---Email: " + echte_email + "---")
                print("---Tel nr.: " + echte_tel + "---")
                print("---Onderwerp: " + echte_subj + "---")
            except UnicodeEncodeError:
                pass
Now, in my result, I still get these ugly line breaks, which look like this in my markup:
[(b'12638 (RFC822 {1973}', b'MIME-Version: 1.0\r\nDate: Mon, 25 Oct 2021 16:41:46 +0200\r\nMessage-ID: <CAJDn=xsVynQqp7BwYoGZB=v21-AAR5=xcMkQ8D2kXE7ZpYFNNQ@mail.example.com>\r\nSubject: TESTTITELPYTHON\r\nFrom: Patrick Merkx <patrick@example.nl>\r\nTo: Patrick Merkx <patrick@example.nl>\r\nContent-Type: multipart/alternative; boundary="00000000000042e6ae05cf2e5c7e"\r\n\r\n--00000000000042e6ae05cf2e5c7e\r\nContent-Type: text/plain; charset="UTF-8"\r\n\r\nContactformulier ingevuld door:\r\nNaam: Patrick Merkx\r\nEmail: merkx.patrick@example.com\r\nTel nr.: 0611381219\r\n\r\nOnderwerp: Nog een test\r\n\r\nBericht:\r\nBericht\r\n\r\n--00000000000042e6ae05cf2e5c7e\r\nContent-Type: text/html; charset="UTF-8"\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\n<div dir=3D"ltr"><div><div dir=3D"ltr" class=3D"gmail_signature" data-smart=\r\nmail=3D"gmail_signature"><div dir=3D"ltr"><div><div dir=3D"ltr"><div><div d=\r\nir=3D"ltr"><div style=3D"font-stretch:normal;font-size:13.33px;line-height:=\r\n19.99px;background:none;border:0px rgb(34,34,34);width:600px;overflow:visib=\r\nle;min-height:0px;outline-width:0px"><span class=3D"gmail-il" style=3D"font=\r\n-size:small">Contactformulier</span><span style=3D"font-size:small">=C2=A0i=\r\nngevuld door:</span><br style=3D"font-size:small"><span style=3D"font-size:=\r\nsmall">Naam: Patrick Merkx</span><br style=3D"font-size:small"><span style=\r\n=3D"font-size:small">Email:=C2=A0</span><a href=3D"mailto:merkx.patrick@gma=\r\nil.com" target=3D"_blank" style=3D"font-size:small">merkx.patrick@example.com=\r\n</a><br style=3D"font-size:small"><span style=3D"font-size:small">Tel nr.: =\r\n0611381219</span><br style=3D"font-size:small"><br style=3D"font-size:small=\r\n"><span style=3D"font-size:small">Onderwerp: Nog een test</span><br style=\r\n=3D"font-size:small"><br style=3D"font-size:small"><span style=3D"font-size=\r\n:small">Bericht:</span><br style=3D"font-size:small"><span style=3D"font-si=\r\nze:small">Bericht</span><br></div></div></div></div></div></div></div></div=\r\n></div>\r\n\r\n--00000000000042e6ae05cf2e5c7e--'), b')']
class=3D"gmail-il" style=3D"font=\r\n-size:small">Contactformulier</span><span style=3D"font-size:small">=C2=A0i=\r\nngevuld door:</span><br style=3D"font-size:small"><span style=3D"font-size:=\r\nsmall">Naam: Patrick Merkx</span><br style=3D"font-size:small"><span style=\r\n=3D"font-size:small">Email:=C2=A0</span><a href=3D"mailto:merkx.patrick@gma=\r\nil.com" target=3D"_blank" style=3D"font-size:small">merkx.patrick@gmail.com=\r\n</a><br style=3D"font-size:small"><span style=3D"font-size:small">Tel nr.: =\r\n0611381219</span><br style=3D"font-size:small"><br style=3D"font-size:small=\r\n"><span style=3D"font-size:small">Onderwerp: Nog een test</span><br style=\r\n=3D"font-size:small"><br style=3D"font-size:small"><span style=3D"font-size=\r\n:small">Bericht:</span><br style=3D"font-size:small"><span style=3D"font-si=\r\nze:small">Bericht</span><br>
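I've also tried stripping the body tags, decoding the content, and a number of other solutions, but unfortunately nothing has worked so far. I can't seem to remove these line breaks with any method I know of.
What am I doing wrong?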
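You are looking at a MIME part with Content-Transfer-Encoding: quoted-printable. The correct way to decode it is to traverse the MIME structure and interpret each part according to its headers. There is no need to do this explicitly, though; Python's email library already does it for you.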
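To make the quoted-printable point concrete, here is a minimal sketch that decodes a fragment shaped like the HTML part in the question, using the standard-library quopri module. The byte string is a shortened, hand-written sample rather than the real message, and decoding a single part by hand like this is only for illustration; the email library does the same work per part.

```python
import quopri

# Hand-written sample resembling the quoted-printable HTML part in the question.
# "=\r\n" is a soft line break, "=3D" encodes "=", "=C2=A0" is a UTF-8 no-break space.
raw = (
    b'<span style=3D"font-size:small">Contactformulier</span>'
    b'<span style=3D"font-si=\r\nze:small">=C2=A0ingevuld door:</span>'
)

# Undo the transfer encoding first, then decode the bytes using the part's charset.
decoded = quopri.decodestring(raw).decode("utf-8")
print(decoded)
```

After decoding, the soft line breaks are gone and the attribute values read normally, which is why slicing the raw bytes never gave clean text.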
from email import message_from_bytes
from email.policy import default
...
msg_ids = get_emails(search('SUBJECT', 'TESTTITELPYTHON', con))
for msg in msg_ids[::-1]:
    for sent in msg:
        if type(sent) is tuple:
            msg = message_from_bytes(sent[1], policy=default)
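Unfortunately, without a sample of the MIME structure of these messages, I can't tell you exactly how to process the resulting message. You will probably have something like a "primary" MIME body part; msg.get_body(preferencelist=('html', 'plain')) will pull out that part, and get_content() will extract the actual body text.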
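As a rough illustration of get_body() and get_content(), the sketch below parses a small multipart/alternative message. The raw bytes are a trimmed, hand-written stand-in for the message in the question (the boundary string "BOUNDARY" and the shortened bodies are assumptions), so treat it as a demonstration of the calls rather than a drop-in solution.

```python
from email import message_from_bytes
from email.policy import default

# Hand-written stand-in for the fetched RFC822 bytes (sent[1] in the question).
raw = (
    b"MIME-Version: 1.0\r\n"
    b"Subject: TESTTITELPYTHON\r\n"
    b'Content-Type: multipart/alternative; boundary="BOUNDARY"\r\n'
    b"\r\n"
    b"--BOUNDARY\r\n"
    b'Content-Type: text/plain; charset="UTF-8"\r\n'
    b"\r\n"
    b"Naam: Patrick Merkx\r\n"
    b"Tel nr.: 0611381219\r\n"
    b"--BOUNDARY\r\n"
    b'Content-Type: text/html; charset="UTF-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b'<div dir=3D"ltr"><span style=3D"font-si=\r\n'
    b'ze:small">Naam: Patrick Merkx</span></div>\r\n'
    b"--BOUNDARY--\r\n"
)

msg = message_from_bytes(raw, policy=default)

# Prefer the HTML alternative, fall back to plain text if there is none.
html_part = msg.get_body(preferencelist=("html", "plain"))
html = html_part.get_content()  # transfer encoding and charset already handled

# Ask for the plain-text alternative only; it is usually easier to split on labels.
plain_part = msg.get_body(preferencelist=("plain",))
plain = plain_part.get_content()

print(html_part.get_content_type())
print(html)
print(plain)
```

For a contact form like the one in the question, the text/plain alternative is likely the simpler input for pulling out the "Naam:", "Email:" and "Tel nr.:" values, since it contains no markup.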
The policy=default keyword argument selects the modern email.message.EmailMessage object class, introduced in Python 3.6, instead of the legacy email.message.Message object that older versions used.
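The effect of the policy argument can be checked directly. This small sketch parses the same hand-written bytes twice, once with policy=default and once without, and inspects the resulting classes; the sample message is an assumption made up for the demonstration.

```python
from email import message_from_bytes
from email.policy import default

# Minimal hand-written message, just enough to parse.
raw = b"Subject: TESTTITELPYTHON\r\n\r\nBericht\r\n"

modern = message_from_bytes(raw, policy=default)  # modern API
legacy = message_from_bytes(raw)                  # falls back to the compat32 policy

print(type(modern).__name__)
print(type(legacy).__name__)

# get_body() and get_content() only exist on the modern class.
print(hasattr(modern, "get_body"), hasattr(legacy, "get_body"))
```

Forgetting policy=default is a common reason for an AttributeError on get_body(), because the legacy Message class does not provide that method.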