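I have a DataFrame with the columns Date and Time, which hold local clock readings (as strings), and a column dst, which indicates whether daylight saving time is active, using W for winter and S for summer.
I know the time zone is Europe/Berlin, so the offset from UTC is 1 hour in winter and 2 hours in summer.
I am quite unhappy with this representation and would like to convert it to timezone-aware datetime objects in UTC, producing human-readable local time only when needed.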
Date Time dst
27.03.2022 01:15:00 W
27.03.2022 01:30:00 W
27.03.2022 01:45:00 W
27.03.2022 03:00:00 S
27.03.2022 03:15:00 S
27.03.2022 03:30:00 S
27.03.2022 03:45:00 S
27.03.2022 04:00:00 S
27.03.2022 04:15:00 S
27.03.2022 04:30:00 S
27.03.2022 04:45:00 S
27.03.2022 05:00:00 S
27.03.2022 05:15:00 S
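My first approach was to build datetime objects with pandas, localize them, and then use numpy to subtract two hours or one hour depending on the given dst.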
from datetime import datetime, timedelta, timezone
from dateutil import tz
import numpy as np
import pandas as pd
df['datetime'] = pd.to_datetime(df['Date'] + df['Time'], format='%d.%m.%Y%H:%M:%S')
df['datetime_aware'] = df['datetime'].dt.tz_localize(tz='Europe/Berlin')
df['datetime_aware_subtracted'] = np.where(df['dst']=='S', df['datetime_aware']-timedelta(hours=2),
df['datetime_aware']-timedelta(hours=1))
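This gives almost correct results, but for rows between 03:00 and 05:00 (in datetime) the datetime_aware_subtracted column is wrong: one hour too many is subtracted, and the UTC offset is one hour too small. I suspect that subtracting time across a DST boundary is not a good idea.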
datetime datetime_aware datetime_aware_subtracted
27.03.2022 01:15 2022-03-27 01:15:00+01:00 2022-03-27 00:15:00+01:00
27.03.2022 01:30 2022-03-27 01:30:00+01:00 2022-03-27 00:30:00+01:00
27.03.2022 01:45 2022-03-27 01:45:00+01:00 2022-03-27 00:45:00+01:00
27.03.2022 03:00 2022-03-27 03:00:00+02:00 2022-03-27 00:00:00+01:00
27.03.2022 03:15 2022-03-27 03:15:00+02:00 2022-03-27 00:15:00+01:00
27.03.2022 03:30 2022-03-27 03:30:00+02:00 2022-03-27 00:30:00+01:00
27.03.2022 03:45 2022-03-27 03:45:00+02:00 2022-03-27 00:45:00+01:00
27.03.2022 04:00 2022-03-27 04:00:00+02:00 2022-03-27 01:00:00+01:00
27.03.2022 04:15 2022-03-27 04:15:00+02:00 2022-03-27 01:15:00+01:00
27.03.2022 04:30 2022-03-27 04:30:00+02:00 2022-03-27 01:30:00+01:00
27.03.2022 04:45 2022-03-27 04:45:00+02:00 2022-03-27 01:45:00+01:00
27.03.2022 05:00 2022-03-27 05:00:00+02:00 2022-03-27 03:00:00+02:00
27.03.2022 05:15 2022-03-27 05:15:00+02:00 2022-03-27 03:15:00+02:00
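The shifted values in the table above follow from how arithmetic on aware timestamps works: subtracting a timedelta moves the instant itself, whereas tz_convert keeps the instant and only changes its representation. A minimal sketch on a single reading from the sample data (the variable names are illustrative, not from the original post):

```python
import pandas as pd

# One reading from the sample data, localized to Europe/Berlin (summer time, +02:00).
ts = pd.Timestamp("2022-03-27 03:00:00").tz_localize("Europe/Berlin")

# Subtracting a timedelta moves the instant itself, here back across the DST switch.
shifted = ts - pd.Timedelta(hours=2)

# tz_convert keeps the instant and only changes how it is displayed.
as_utc = ts.tz_convert("UTC")

print(shifted)  # 2022-03-27 00:00:00+01:00
print(as_utc)   # 2022-03-27 01:00:00+00:00
```

The first printed value matches the 03:00 row of datetime_aware_subtracted, which suggests the subtraction is the step that goes wrong, not the localization.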
My second approach was the reverse: subtract first, then localize.
df['datetime'] = pd.to_datetime(df['Date'] + df['Time'], format='%d.%m.%Y%H:%M:%S')
df['datetime_subtracted'] = np.where(df['dst']=='S', df['datetime']-timedelta(hours=2),
df['datetime']-timedelta(hours=1))
df['datetime_subtracted_aware'] = df['datetime_subtracted'].dt.tz_localize(tz='Europe/Berlin')
This gives correct naive results, but localizing after the subtraction raises a NonExistentTimeError (rightfully so).
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\Miniconda3\envs\env\lib\site-packages\pandas\core\accessor.py", line 94, in f
    return self._delegate_method(name, *args, **kwargs)
  File "C:\ProgramData\Miniconda3\envs\env\lib\site-packages\pandas\core\indexes\accessors.py", line 123, in _delegate_method
    result = method(*args, **kwargs)
  File "C:\ProgramData\Miniconda3\envs\env\lib\site-packages\pandas\core\indexes\datetimes.py", line 273, in tz_localize
    arr = self._data.tz_localize(tz, ambiguous, nonexistent)
  File "C:\ProgramData\Miniconda3\envs\env\lib\site-packages\pandas\core\arrays\_mixins.py", line 84, in method
    return meth(self, *args, **kwargs)
  File "C:\ProgramData\Miniconda3\envs\env\lib\site-packages\pandas\core\arrays\datetimes.py", line 1043, in tz_localize
    new_dates = tzconversion.tz_localize_to_utc(
  File "pandas\_libs\tslibs\tzconversion.pyx", line 328, in pandas._libs.tslibs.tzconversion.tz_localize_to_utc
pytz.exceptions.NonExistentTimeError: 2022-03-27 02:00:00
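What is the best way to get UTC datetime objects for data in the Europe/Berlin time zone? In other words, I want to create a new column datetime_aware_localized that shows the same Date and Time as the naive datetime column, but as an aware datetime object in the correct time zone.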
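You can tell tz_localize how to handle nonexistent times:
nonexistent : str, default 'raise'
A nonexistent time does not exist in a particular time zone where clocks moved forward due to DST. Valid values are:
'shift_forward' will shift the nonexistent time forward to the closest existing time
'shift_backward' will shift the nonexistent time backward to the closest existing time
'NaT' will return NaT where there are nonexistent times
timedelta objects will shift nonexistent times by the timedelta
'raise' will raise a NonExistentTimeError if there are nonexistent times.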
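The options listed above can be compared on a single nonexistent reading. This is a small sketch, assuming 02:30 on 27.03.2022 as the test value (it is not part of the sample data) and illustrative variable names:

```python
import pandas as pd

# 02:30 on 27.03.2022 does not exist in Europe/Berlin: clocks jump from 02:00 to 03:00.
naive = pd.Series(pd.to_datetime(["2022-03-27 02:30:00"]))

forward = naive.dt.tz_localize("Europe/Berlin", nonexistent="shift_forward")
backward = naive.dt.tz_localize("Europe/Berlin", nonexistent="shift_backward")
by_delta = naive.dt.tz_localize("Europe/Berlin", nonexistent=pd.Timedelta(hours=1))
as_nat = naive.dt.tz_localize("Europe/Berlin", nonexistent="NaT")

print(forward[0])   # 2022-03-27 03:00:00+02:00
print(backward[0])  # the last representable moment before 02:00, still at +01:00
print(by_delta[0])  # 2022-03-27 03:30:00+02:00
print(as_nat[0])    # NaT
```

Note that every option except the timedelta one either discards the reading or collapses distinct readings onto the same boundary value, so it is worth checking whether the nonexistent times are a sign of an earlier mistake before silencing the error.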
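Replacing with NaT:
dt = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%d.%m.%Y %H:%M:%S')
# UTC offset in hours taken from the dst flag: 1 in winter (W), 2 in summer (S)
corr = pd.to_timedelta(df['dst'].map({'W': 1, 'S': 2}), unit='h')
# nonexistent local times become NaT
dt.sub(corr).dt.tz_localize(tz='Europe/Berlin', nonexistent='NaT')
Output: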
0 2022-03-27 00:15:00+01:00
1 2022-03-27 00:30:00+01:00
2 2022-03-27 00:45:00+01:00
3 2022-03-27 01:00:00+01:00
4 2022-03-27 01:15:00+01:00
5 2022-03-27 01:30:00+01:00
6 2022-03-27 01:45:00+01:00
7 NaT
8 NaT
9 NaT
10 NaT
11 2022-03-27 03:00:00+02:00
12 2022-03-27 03:15:00+02:00
dtype: datetime64[ns, Europe/Berlin]
Shifting forward:
dt.sub(corr).dt.tz_localize(tz='Europe/Berlin', nonexistent='shift_forward')
0 2022-03-27 00:15:00+01:00
1 2022-03-27 00:30:00+01:00
2 2022-03-27 00:45:00+01:00
3 2022-03-27 01:00:00+01:00
4 2022-03-27 01:15:00+01:00
5 2022-03-27 01:30:00+01:00
6 2022-03-27 01:45:00+01:00
7 2022-03-27 03:00:00+02:00
8 2022-03-27 03:00:00+02:00
9 2022-03-27 03:00:00+02:00
10 2022-03-27 03:00:00+02:00
11 2022-03-27 03:00:00+02:00
12 2022-03-27 03:15:00+02:00
dtype: datetime64[ns, Europe/Berlin]
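Since the dst column already states the UTC offset of every row, another option is to subtract that offset and label the result as UTC instead of Europe/Berlin; no nonexistent local times can arise that way. The following is only a sketch under that assumption, using a few rows of the sample data; the column names datetime_utc and datetime_local are made up for illustration:

```python
import pandas as pd

# A few rows of the sample data around the DST switch.
df = pd.DataFrame({
    "Date": ["27.03.2022"] * 4,
    "Time": ["01:45:00", "03:00:00", "04:00:00", "05:00:00"],
    "dst": ["W", "S", "S", "S"],
})

naive_local = pd.to_datetime(df["Date"] + " " + df["Time"], format="%d.%m.%Y %H:%M:%S")

# UTC offset taken from the dst flag: 1 hour in winter (W), 2 hours in summer (S).
offset = pd.to_timedelta(df["dst"].map({"W": 1, "S": 2}), unit="h")

# Local wall time minus its offset is UTC wall time, so label it as UTC.
df["datetime_utc"] = (naive_local - offset).dt.tz_localize("UTC")

# Human-readable local time only when needed.
df["datetime_local"] = df["datetime_utc"].dt.tz_convert("Europe/Berlin")

print(df[["datetime_utc", "datetime_local"]])
```

Here datetime_local reproduces the original clock readings with the correct offsets (+01:00 for the W row, +02:00 for the S rows), while datetime_utc holds the values the question asks for.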
Just to clarify, localizing works fine like this:
pd.to_datetime(df['Date'] + df['Time'], format='%d.%m.%Y%H:%M:%S').dt.tz_localize("Europe/Berlin")
0 2022-03-27 01:15:00+01:00
1 2022-03-27 01:30:00+01:00
2 2022-03-27 01:45:00+01:00
3 2022-03-27 03:00:00+02:00
4 2022-03-27 03:15:00+02:00
5 2022-03-27 03:30:00+02:00
6 2022-03-27 03:45:00+02:00
7 2022-03-27 04:00:00+02:00
8 2022-03-27 04:15:00+02:00
9 2022-03-27 04:30:00+02:00
10 2022-03-27 04:45:00+02:00
11 2022-03-27 05:00:00+02:00
12 2022-03-27 05:15:00+02:00
dtype: datetime64[ns, Europe/Berlin]
If you need UTC, use
pd.to_datetime(df['Date'] + df['Time'], format='%d.%m.%Y%H:%M:%S').dt.tz_localize("Europe/Berlin").dt.tz_convert("UTC")
0 2022-03-27 00:15:00+00:00
1 2022-03-27 00:30:00+00:00
2 2022-03-27 00:45:00+00:00
3 2022-03-27 01:00:00+00:00
4 2022-03-27 01:15:00+00:00
5 2022-03-27 01:30:00+00:00
6 2022-03-27 01:45:00+00:00
7 2022-03-27 02:00:00+00:00
8 2022-03-27 02:15:00+00:00
9 2022-03-27 02:30:00+00:00
10 2022-03-27 02:45:00+00:00
11 2022-03-27 03:00:00+00:00
12 2022-03-27 03:15:00+00:00
dtype: datetime64[ns, UTC]
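Localizing and then converting works for the spring data because every reading maps to exactly one instant. In autumn the same wall-clock time occurs twice, and this is where the dst column may become useful: the ambiguous parameter of tz_localize accepts a boolean array in which True marks DST. A sketch with hypothetical rows for 30.10.2022 (not part of the original data):

```python
import pandas as pd

# Hypothetical rows for the autumn switch on 30.10.2022, when 02:30 occurs twice.
df = pd.DataFrame({
    "Date": ["30.10.2022", "30.10.2022"],
    "Time": ["02:30:00", "02:30:00"],
    "dst": ["S", "W"],
})

naive_local = pd.to_datetime(df["Date"] + " " + df["Time"], format="%d.%m.%Y %H:%M:%S")

# ambiguous accepts a boolean array where True marks DST (summer) time.
is_dst = df["dst"].eq("S").to_numpy()
aware = naive_local.dt.tz_localize("Europe/Berlin", ambiguous=is_dst)

utc = aware.dt.tz_convert("UTC")
print(utc)
```

The two identical readings end up one hour apart in UTC (00:30 and 01:30), which is the information the W/S flag carries.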
Regarding the NonExistentTimeError: you can find an example here where this actually is a problem and the nonexistent keyword is necessary.