Using Python 3.5 or 3.6, after loading an email that has an invalid hour in its Date header with the email package, trying to access the date header raises a ValueError exception:
>>> import email
>>> from email import policy
>>> m = email.message_from_binary_file(open('bad_date.txt', 'rb'), policy=policy.default)
>>> m['date']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/email/message.py", line 391, in __getitem__
    return self.get(name)
  File "/usr/lib/python3.6/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/lib/python3.6/email/policy.py", line 162, in header_fetch_parse
    return self.header_factory(name, value)
  File "/usr/lib/python3.6/email/headerregistry.py", line 586, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.6/email/headerregistry.py", line 197, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.6/email/headerregistry.py", line 303, in parse
    value = utils.parsedate_to_datetime(value)
  File "/usr/lib/python3.6/email/utils.py", line 214, in parsedate_to_datetime
    tzinfo=datetime.timezone(datetime.timedelta(seconds=tz)))
ValueError: hour must be in 0..23
This is the header in the email:
Date: Tue, 06 Jun 2017 27:39:33 +0600
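(I am analysing spam, and someone's spam-sending program apparently doesn't understand how time zone conversion works. I've seen negative numbers too...)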
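To make the failure point in the traceback concrete, here is a minimal sketch that reproduces it without a message file. It assumes, as the traceback suggests, that email.utils.parsedate_tz only splits the string into numeric fields and that the range check happens later, when a datetime is built from them.

```python
import datetime
from email.utils import parsedate_tz

raw = "Tue, 06 Jun 2017 27:39:33 +0600"

# parsedate_tz splits the string into numeric fields; the assumption here
# is that it does not range-check them, so the hour comes back as 27.
fields = parsedate_tz(raw)
hour = fields[3]

# Building a datetime from those fields is where the range check happens.
try:
    datetime.datetime(*fields[:6])
    error = None
except ValueError as exc:
    error = exc

print(hour, repr(error))
```

The exact wording of the ValueError message may differ between Python versions.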
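The email package is designed to handle problems encountered while parsing an email by registering them as defects, so raising an exception seems like the wrong result in this case.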
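For comparison, this is roughly what the defect mechanism looks like when it does apply. The sketch uses a made-up message with a different kind of problem (no blank line between headers and body), because that case should be recorded as a defect rather than raised.

```python
from email import message_from_string, policy

# Hypothetical malformed message: the second line is neither a header nor
# the blank line that should separate the headers from the body.
raw = "Subject: hello\nthis line is not a header\n\nbody text\n"

msg = message_from_string(raw, policy=policy.default)

# Parsing succeeds; the problem is recorded on the message, not raised.
defect_names = [type(d).__name__ for d in msg.defects]
print(defect_names)
print(msg["subject"])
```

That is the behaviour one would expect for the bad Date header as well: a usable message object, with the problem listed in defects.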
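I could try to update the default header_factory that is part of the default policy to handle this case, but it seems more like a bug in Python that parsedate_to_datetime behaves this way. (Apparently this behaviour is intentional.)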
Update: I have raised this as a Python bug.
This is the workaround I am currently using:
from email import policy
from email import errors
# Note: _header_value_parser is a private module of the email package.
from email import _header_value_parser as parser
from email.headerregistry import HeaderRegistry, DateHeader


class DateHeaderRobust(DateHeader):
    """
    Extends DateHeader from email/headerregistry.py to handle the
    ValueError raised by parsedate_to_datetime when a date header
    has an invalid hour value (outside 0..23).
    """
    @classmethod
    def parse(cls, value, kwds):
        try:
            super().parse(value, kwds)
        except ValueError:
            # Record the problem as a defect instead of raising, and
            # keep the raw header text as the decoded value.
            kwds['defects'].append(
                errors.InvalidHeaderDefect('Invalid value in date'))
            kwds['datetime'] = None
            kwds['decoded'] = value
            kwds['parse_tree'] = parser.TokenList()


class UniqueDateHeader(DateHeaderRobust):
    # Only one Date header is allowed per message.
    max_count = 1


# Register the robust class for 'date' and build a policy that uses it.
header_factory = HeaderRegistry()
header_factory.map_to_type('date', UniqueDateHeader)
email_policy = policy.default.clone(header_factory=header_factory)
Then, when reading a message (for example with email.message_from_binary_file), pass policy=email_policy as a keyword argument.
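A self-contained check of the workaround might look like the sketch below. It repeats the definitions so that it runs on its own; the sample message bytes are made up for illustration, and message_from_bytes stands in for reading a file.

```python
from email import errors, message_from_bytes, policy
from email import _header_value_parser as parser  # private module
from email.headerregistry import DateHeader, HeaderRegistry


class DateHeaderRobust(DateHeader):
    @classmethod
    def parse(cls, value, kwds):
        try:
            super().parse(value, kwds)
        except ValueError:
            # Turn the exception into a defect on the header.
            kwds['defects'].append(
                errors.InvalidHeaderDefect('Invalid value in date'))
            kwds['datetime'] = None
            kwds['decoded'] = value
            kwds['parse_tree'] = parser.TokenList()


class UniqueDateHeader(DateHeaderRobust):
    max_count = 1


header_factory = HeaderRegistry()
header_factory.map_to_type('date', UniqueDateHeader)
email_policy = policy.default.clone(header_factory=header_factory)

# Hypothetical sample message carrying the bad Date header from above.
raw = (b"Date: Tue, 06 Jun 2017 27:39:33 +0600\r\n"
       b"Subject: test\r\n"
       b"\r\n"
       b"body\r\n")

msg = message_from_bytes(raw, policy=email_policy)
date = msg['date']            # no longer raises

print(date.datetime)          # no datetime could be built
print(str(date))              # the raw header text is still available
print([type(d).__name__ for d in date.defects])
```

Two caveats: the registry change only covers the date header, so other date-valued headers such as resent-date still go through the stock classes, and the class name of the recorded defect may differ on Python versions where DateHeader already handles this case itself.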