Efficiently localizing an array of datetimes with pytz



What is the most efficient way to convert an array of naive datetime.datetime objects into an array of timezone-aware datetime objects?

Currently I have them in a numpy array. The answer doesn't have to end up as a numpy array, but it should assume a numpy array as the starting point.

For example, if I have this:

import numpy as np
import pytz
from datetime import datetime
# Time zone information
timezone = pytz.FixedOffset(-480)
# Numpy array of datetime objects
datetimes = np.array([datetime(2022, 1, 1, 12, 0, 0), datetime(2022, 1, 2, 12, 0, 0)])

How do I make datetimes timezone-aware?

A list comprehension obviously works, but for large arrays it doesn't seem as efficient as it could be. I would like a vectorized operation.

ChatGPT told me this would work (spoiler alert: it doesn't):

# Add time zone information to each datetime object
datetimes_with_timezone = timezone.localize(datetimes, is_dst=None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Nick\Anaconda3\envs\pftools\lib\site-packages\pytz\tzinfo.py", line 317, in localize
    if dt.tzinfo is not None:
AttributeError: 'numpy.ndarray' object has no attribute 'tzinfo'

If you are willing to use pandas, it may offer some benefit. Here are a few options, with %timeit results for relative comparison:

import numpy as np
import pandas as pd
from datetime import datetime, timezone, timedelta
# Time zone information
tz = timezone(timedelta(minutes=-480))
# Numpy array of datetime objects; let's make it 1 day at second resolution
dt_array = np.array([datetime(2022, 1, 1) + timedelta(seconds=i) for i in range(86400)])
# convert array to Series, then set tz:
dt_series = pd.Series(dt_array).dt.tz_localize(tz)
# %timeit pd.Series(dt_array).dt.tz_localize(tz)
# 12.1 ms ± 22.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# you can also get a numpy array back:
dt_series = pd.Series(dt_array).dt.tz_localize(tz).to_numpy()
# %timeit pd.Series(dt_array).dt.tz_localize(tz).to_numpy()
# 188 ms ± 717 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# good old list comp:
dt_list = [d.replace(tzinfo=tz) for d in dt_array]
# %timeit [d.replace(tzinfo=tz) for d in dt_array]
# 93.3 ms ± 2.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# might also be put into a numpy array:
dt_array_tz = np.array([d.replace(tzinfo=tz) for d in dt_array])
# %timeit np.array([d.replace(tzinfo=tz) for d in dt_array])
# 212 ms ± 2.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
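
The timings above suggest that most of the cost comes from converting back to Python objects, not from attaching the time zone. As a minimal sketch of that idea (the sample data and the variable names `naive`, `aware_index`, and `aware_objects` are assumptions for illustration, not part of the benchmark), one could keep the data in a pandas DatetimeIndex and convert back only when object datetimes are really needed:

```python
import numpy as np
import pandas as pd
from datetime import datetime, timedelta, timezone

tz = timezone(timedelta(minutes=-480))  # fixed UTC-08:00 offset, as in the answer

# Small object array of naive datetimes (hypothetical sample data).
naive = np.array([datetime(2022, 1, 1, 12, 0, 0), datetime(2022, 1, 2, 12, 0, 0)])

# DatetimeIndex converts the objects to datetime64 once; tz_localize then
# attaches the zone without shifting the wall-clock time.
aware_index = pd.DatetimeIndex(naive).tz_localize(tz)

# Optional: back to datetime.datetime objects. This creates one Python
# object per element, so skip it when the index itself is sufficient.
aware_objects = aware_index.to_pydatetime()
```

Whether this helps in practice depends on what the downstream code accepts; if it needs an object array of datetime.datetime, the conversion cost is likely to come back.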

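As a side note, pytz's localize is slightly slower (incidentally, pytz is now considered legacy; since Python 3.9 the standard library's zoneinfo covers the same use case):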

import pytz
tz = pytz.FixedOffset(-480)
dt_list = [tz.localize(d) for d in dt_array]
# %timeit [tz.localize(d) for d in dt_array]
# 105 ms ± 436 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
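
For a named zone instead of a fixed offset, a standard-library alternative to pytz might look like the sketch below. It assumes Python 3.9+ and an available tz database (on Windows that usually means installing the tzdata package); the zone name "America/Los_Angeles" and the sample dates are assumptions chosen for illustration, not something stated in the question.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# Hypothetical named zone for illustration; it is at UTC-08:00 in winter.
la = ZoneInfo("America/Los_Angeles")

naive = [datetime(2022, 1, 1, 12, 0, 0), datetime(2022, 7, 1, 12, 0, 0)]

# With zoneinfo, replace(tzinfo=...) is enough; no localize() call is needed.
aware = [d.replace(tzinfo=la) for d in naive]

# The offset follows the zone's DST rules for each individual date.
offsets = [d.utcoffset() for d in aware]
```

Unlike a fixed offset, the named zone yields different offsets for the winter and summer dates, which is the main reason to prefer it when the data spans a DST change.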
