Converting EML to Markdown with Python



I have downloaded an EML file from Gmail and want to convert the message body to Markdown. This is the code I am using:

import glob
from email import policy
from email.parser import BytesParser

from markdownify import markdownify as md

files = glob.glob("file_gmail.eml")  # returns a list of matching files
with open(files[0], "rb") as fp:
    msg = BytesParser(policy=policy.default).parse(fp)

asunto = msg["subject"]
# ("html",) is a one-element tuple; ("html") would be a plain string
msgmkd = md(msg.get_body(preferencelist=("html",)).get_content())
with open(asunto + ".md", "w", encoding="utf-8") as file:
    file.write(msgmkd)

The drawback is that it leaves CSS and some other debris in the output, and I cannot get rid of it:

html [if !mso]><meta http-equiv="X-UA-Compatible" content="IE=edge"><!--<![endif] .ExternalClass { width: 100%; background: inherit; background-color: inherit; } .ExternalClass p, .ExternalClass ul, .ExternalClass ol { Margin: 0; } .undoreset div p, .undoreset p { margin-bottom: 20px; } div[class^="aolmail_divbody"] { overflow: auto; } [owa] #ac-footer { padding: 20px 0px !important; background: inherit; background-color: inherit; }  @media only screen and (max-width: 600px) { /*----------------
...

Is there a better way?

Use BeautifulSoup to remove the style elements (and possibly other tags) before converting:

from bs4 import BeautifulSoup as BS

asunto = msg['subject']
html = msg.get_body(preferencelist=("html",)).get_content()
dom = BS(html, features="html.parser")
# remove <style> and <script> elements together with their contents
for element in dom.select('style,script'):
    element.extract()
msgmkd = md(str(dom))
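
The leftover `[if !mso]` text in the question looks like it comes from Outlook conditional comments, which removing `style` and `script` alone would not touch. The following is a minimal sketch of the same cleanup extended to drop HTML comments as well, assuming that is acceptable for your messages. The helper names `html_from_eml` and `clean_email_html` are made up for illustration and are not part of any library.

```python
from email import policy
from email.parser import BytesParser

from bs4 import BeautifulSoup, Comment


def html_from_eml(raw: bytes) -> str:
    """Return the HTML body of a raw EML message (hypothetical helper)."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    body = msg.get_body(preferencelist=("html",))
    if body is None:
        raise ValueError("message has no HTML part")
    return body.get_content()


def clean_email_html(html: str) -> str:
    """Strip style/script elements and comments (hypothetical helper)."""
    dom = BeautifulSoup(html, "html.parser")
    # Remove elements whose contents are never meant to be read as text.
    for element in dom.select("style, script"):
        element.decompose()
    # Remove comment nodes, which includes Outlook conditional comments.
    for comment in dom.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    return str(dom)
```

The cleaned string can then be passed to `md(...)` exactly as in the snippet above, for example `md(clean_email_html(html_from_eml(raw_bytes)))`.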
