我在vaex中有一个数据帧,我在时间戳格式方面遇到了麻烦。
我似乎不能正确格式化时间戳列。在研究了这个问题之后,我得出的结论是,我需要删除UTC偏移量(00:00)中的冒号。但是我不知道该怎么做。
- 注意:我不知道如何制作一个示例vaex数据框架,所以我制作了一个pandas数据框架并将其转换为
from datetime import datetime
# Sample pandas dataframe
df = pd.DataFrame({'quantity' : [0,6,4,3,7,8,3,2],
'timestamp' : ['2018-05-01 03:05:00+00:00', '2018-05-01 04:15:00+00:00',
'2018-06-02 03:15:00+00:00', '2018-06-02 04:25:00+00:00',
'2018-07-03 03:25:00+00:00', '2018-07-03 04:35:00+00:00',
'2018-08-04 03:35:00+00:00', '2018-08-04 04:45:00+00:00'],
'id' : [1,2,3,4,5,6,7,8]
})
# Pandas to vaex
df = vx.from_pandas(df=df, copy_index=True)
# converting string timestamp to datetime
date_format = "%Y-%m-%d %H:%M:%S %z"
def column_to_datetime(datetime_str):
return np.datetime64(datetime.strptime(datetime_str, date_format))
df['timestamp_parsed'] = df['timestamp'].apply(column_to_datetime)
但之后,我得到了错误ValueError: time data '2018-05-01 03:05:00+00:00' does not match format '%Y-%m-%d %H:%M:%S %z'
可以使用:
from datetime import datetime
# Sample pandas dataframe
df = pd.DataFrame({'quantity' : [0,6,4,3,7,8,3,2],
'timestamp' : ['2018-05-01 03:05:00+00:00', '2018-05-01 04:15:00+00:00',
'2018-06-02 03:15:00+00:00', '2018-06-02 04:25:00+00:00',
'2018-07-03 03:25:00+00:00', '2018-07-03 04:35:00+00:00',
'2018-08-04 03:35:00+00:00', '2018-08-04 04:45:00+00:00'],
'id' : [1,2,3,4,5,6,7,8]
})
df['timestamp']=pd.to_datetime(df['timestamp']).dt.tz_localize(None)
# Pandas to vaex
df = vx.from_pandas(df=df, copy_index=True)
:
timestamp
0 2018-05-01 03:05:00
1 2018-05-01 04:15:00
2 2018-06-02 03:15:00
3 2018-06-02 04:25:00
4 2018-07-03 03:25:00
5 2018-07-03 04:35:00
6 2018-08-04 03:35:00
7 2018-08-04 04:45:00
在将字符串转换为datetime之前,请使用代码尝试以下操作:
# Remove the colon from the UTC offset
df['timestamp'] = df['timestamp'].str.replace(':', '')