DataFrame column with timestamps needs to be localized to several different time zones (AttributeError: Can only use .dt accessor with datetimelike values)



I have a DataFrame (6M rows) with two columns: one holds local times (timezone-naive) and the other holds the time zone. Something like this:

|    | SCHEDULED_DEPARTURE   | ORIGIN_TZ           |
|---:|:----------------------|:--------------------|
|  0 | 2020-11-30 11:40:00   | America/New_York    |
|  1 | 2020-11-30 16:51:00   | America/New_York    |
|  2 | 2020-11-30 09:05:00   | America/Chicago     |
|  3 | 2020-11-30 19:18:00   | America/Chicago     |
|  4 | 2020-11-30 10:36:00   | America/New_York    |
|  5 | 2020-11-30 12:10:00   | America/Los_Angeles |
|  6 | 2020-11-30 16:05:00   | America/New_York    |
|  7 | 2020-11-30 12:14:00   | America/New_York    |
|  8 | 2020-11-30 16:05:00   | America/New_York    |
|  9 | 2020-11-30 12:40:00   | America/Chicago     |

I tried to localize each row of SCHEDULED_DEPARTURE with a for loop that subsets the df by time zone, adds the time zone, and moves on to the next one:

for tz in df['ORIGIN_TZ'].unique():
    mask_tz = (df['ORIGIN_TZ'] == tz)
    df.loc[mask_tz,'SCHEDULED_DEPARTURE'] = df.loc[mask_tz,'SCHEDULED_DEPARTURE'].dt.tz_localize(tz)

Strangely, sometimes it works and sometimes it returns the following error:

AttributeError: Can only use .dt accessor with datetimelike values


When I pull out the SCHEDULED_DEPARTURE column, its dtype is clearly datetime, as in:

Name: SCHEDULED_DEPARTURE, Length: 5714008, dtype: datetime64[ns]

Do you know how to solve this? Can a single column hold more than one time zone?


Here is the code to reproduce a sample df:

import pandas as pd

df = pd.DataFrame({'SCHEDULED_DEPARTURE': {0: pd.Timestamp('2020-11-30 10:15:00'), 1: pd.Timestamp('2020-11-30 07:55:00'), 2: pd.Timestamp('2020-11-30 06:00:00'), 3: pd.Timestamp('2020-11-30 16:23:00'), 4: pd.Timestamp('2020-11-30 07:35:00'), 5: pd.Timestamp('2020-11-30 08:00:00'), 6: pd.Timestamp('2020-11-30 08:50:00'), 7: pd.Timestamp('2020-11-30 13:45:00'), 8: pd.Timestamp('2020-11-30 10:15:00'), 9: pd.Timestamp('2020-11-30 20:00:00')}, 'ORIGIN_TZ': {0: 'America/New_York', 1: 'America/New_York', 2: 'America/Denver', 3: 'America/New_York', 4: 'America/Chicago', 5: 'America/Chicago', 6: 'America/Los_Angeles', 7: 'America/Chicago', 8: 'America/New_York', 9: 'America/Los_Angeles'}})

Once you do:

df.loc[mask_tz,'SCHEDULED_DEPARTURE'] = df.loc[mask_tz,'SCHEDULED_DEPARTURE'].dt.tz_localize(tz)

your column becomes object dtype, and the next .dt access on it fails. Try localizing from a copy of the original column instead:

s = df['SCHEDULED_DEPARTURE'].copy()
for tz in df['ORIGIN_TZ'].unique():
    mask_tz = (df['ORIGIN_TZ'] == tz)
    df.loc[mask_tz,'SCHEDULED_DEPARTURE'] = s.loc[mask_tz].dt.tz_localize(tz)

df.loc[0,'SCHEDULED_DEPARTURE'] will then give:

Timestamp('2020-11-30 10:15:00-0500', tz='America/New_York')

Note, however, that your SCHEDULED_DEPARTURE column is still of object dtype.
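To see why a mixed-timezone column ends up as object dtype, and why `.dt` then refuses to work, here is a small sketch. It builds a Series directly from two timestamps in different zones (values borrowed from the sample df); the variable names `mixed`, `dt_failed`, and `hours` are only illustrative.

```python
import pandas as pd

# Two timestamps localized to different zones (values from the sample df).
mixed = pd.Series([
    pd.Timestamp("2020-11-30 10:15:00").tz_localize("America/New_York"),
    pd.Timestamp("2020-11-30 07:35:00").tz_localize("America/Chicago"),
])

# A datetime64 column can carry only one timezone, so pandas falls back to object.
print(mixed.dtype)  # object

# The .dt accessor is not available on an object-dtype Series.
try:
    mixed.dt.hour
    dt_failed = False
except AttributeError as exc:
    dt_failed = True
    print(exc)

# Element-wise access still works, because each value is a Timestamp.
hours = mixed.map(lambda ts: ts.hour)
print(hours.tolist())  # [10, 7]
```

Element-wise access through `map` runs in Python rather than in vectorized code, so on 6M rows it will likely be much slower than a `.dt` operation.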

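If you need a real datetime dtype so that `.dt` keeps working, one possible alternative (not part of the answer above) is to localize each group and immediately convert it to a single common zone such as UTC. The sketch below assumes a new, hypothetical column name `DEPARTURE_UTC` and uses a three-row subset of the sample data; the original local time and ORIGIN_TZ columns stay untouched.

```python
import pandas as pd

df = pd.DataFrame({
    "SCHEDULED_DEPARTURE": pd.to_datetime([
        "2020-11-30 10:15:00", "2020-11-30 06:00:00", "2020-11-30 07:35:00",
    ]),
    "ORIGIN_TZ": ["America/New_York", "America/Denver", "America/Chicago"],
})

utc_parts = []
# Iterate over the naive local times, one group per time zone.
for tz, local in df.groupby("ORIGIN_TZ")["SCHEDULED_DEPARTURE"]:
    # Attach the group's zone, then express the same instant in UTC.
    utc_parts.append(local.dt.tz_localize(tz).dt.tz_convert("UTC"))

# All parts share one timezone, so the result keeps a tz-aware datetime dtype.
# Assignment aligns on the index, so row order is preserved.
df["DEPARTURE_UTC"] = pd.concat(utc_parts)

print(df["DEPARTURE_UTC"].dtype)
print(df["DEPARTURE_UTC"].dt.hour.tolist())
```

On real data, `tz_localize` can raise for local times that are ambiguous or nonexistent around daylight-saving transitions; its `ambiguous` and `nonexistent` parameters control how those cases are handled.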