(I have already read this answer, but it no longer seems to work.)
Consider, for example, example_1.csv:
timestamp,temp
21-Jun-2017 00:36:49.539000,22
21-Jun-2017 00:36:49.633000,22
21-Jun-2017 00:36:49.633000,22
21-Jun-2017 00:37:03.055000,42
21-Jun-2017 00:37:03.164000,22
21-Jun-2017 00:37:03.164000,22
21-Jun-2017 00:37:12.680000,42
21-Jun-2017 00:37:22.664000,42
21-Jun-2017 00:37:22.664000,42
21-Jun-2017 00:37:22.758000,42
21-Jun-2017 00:37:22.758000,42
OK. In Python 3.5:
>>> import pandas
>>> pandas.__version__
'0.21.0'
>>> example_df = pandas.read_csv('example_1.csv', index_col=0)
>>> example_df.index = pandas.to_datetime(example_df.index, format='%d-%b-%Y %H:%M:%S.%f')
So far, so good:
>>> example_df.index
DatetimeIndex(['2017-06-21 00:36:49.539000', '2017-06-21 00:36:49.633000',
'2017-06-21 00:36:49.633000', '2017-06-21 00:37:03.055000',
'2017-06-21 00:37:03.164000', '2017-06-21 00:37:03.164000',
'2017-06-21 00:37:12.680000', '2017-06-21 00:37:22.664000',
'2017-06-21 00:37:22.664000', '2017-06-21 00:37:22.758000',
'2017-06-21 00:37:22.758000'],
dtype='datetime64[ns]', name='timestamp', freq=None)
However, I have to save everything:
example_df.to_csv('example_2.csv', date_format = '%d-%b-%Y %H:%M:%S.%f')
example_df_2 = pandas.read_csv('example_2.csv', index_col = 0)
But when I read it back, the index of example_df_2 is not treated as datetime64[ns]:
>>> example_df_2.index
Index(['21-Jun-2017 00:36:49.539000', '21-Jun-2017 00:36:49.633000',
'21-Jun-2017 00:36:49.633000', '21-Jun-2017 00:37:03.055000',
'21-Jun-2017 00:37:03.164000', '21-Jun-2017 00:37:03.164000',
'21-Jun-2017 00:37:12.680000', '21-Jun-2017 00:37:22.664000',
'21-Jun-2017 00:37:22.664000', '21-Jun-2017 00:37:22.758000',
'21-Jun-2017 00:37:22.758000'],
dtype='object', name='timestamp')
This does not help either:
>>> example_df_2.index.astype('datetime64[ns]')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexes/base.py", line 1059, in astype
return Index(self.values.astype(dtype, copy=copy), name=self.name,
ValueError: Error parsing datetime string "21-Jun-2017 00:37:22.758000" at position 3
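Now, this file is huge, and I have to save it in a format that can be read back quickly later. If I can avoid it, I would rather not parse the datetime stamps twice.
So how do I fix this? (Or what am I doing wrong?)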
Use the parse_dates parameter:
import pandas as pd
example_df_2 = pd.read_csv('example_2.csv', index_col=0, parse_dates=True)
example_df_2.index
Output:
DatetimeIndex(['2017-06-21 00:36:49.539000', '2017-06-21 00:36:49.633000',
'2017-06-21 00:36:49.633000', '2017-06-21 00:37:03.055000',
'2017-06-21 00:37:03.164000', '2017-06-21 00:37:03.164000',
'2017-06-21 00:37:12.680000', '2017-06-21 00:37:22.664000',
'2017-06-21 00:37:22.664000', '2017-06-21 00:37:22.758000',
'2017-06-21 00:37:22.758000'],
dtype='datetime64[ns]', name='timestamp', freq=None)
Note that you can also do this for the first import:
example_df = pd.read_csv('example_1.csv', index_col=0, parse_dates=True)
example_df.index
Output:
DatetimeIndex(['2017-06-21 00:36:49.539000', '2017-06-21 00:36:49.633000',
'2017-06-21 00:36:49.633000', '2017-06-21 00:37:03.055000',
'2017-06-21 00:37:03.164000', '2017-06-21 00:37:03.164000',
'2017-06-21 00:37:12.680000', '2017-06-21 00:37:22.664000',
'2017-06-21 00:37:22.664000', '2017-06-21 00:37:22.758000',
'2017-06-21 00:37:22.758000'],
dtype='datetime64[ns]', name='timestamp', freq=None)
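If the timestamp format is known in advance, as it is in the question, another option is to read the index as plain strings and convert it once with an explicit format instead of relying on inference. The following is only a minimal sketch of that round trip. It uses in-memory buffers and a three-row sample in place of example_1.csv and example_2.csv, and the helper name read_with_format is hypothetical, not part of pandas. Whether this is faster than parse_dates=True depends on the pandas version and the data, so it is worth timing on the real file.

```python
import io

import pandas as pd

# A short sample in the same layout as example_1.csv (assumption: the real
# file has the same two columns and the same timestamp format).
CSV_TEXT = """timestamp,temp
21-Jun-2017 00:36:49.539000,22
21-Jun-2017 00:37:03.055000,42
21-Jun-2017 00:37:22.758000,42
"""

# The format string used in the question for both parsing and saving.
FMT = "%d-%b-%Y %H:%M:%S.%f"


def read_with_format(source, fmt=FMT):
    """Hypothetical helper: read a CSV and parse its index with a fixed format."""
    df = pd.read_csv(source, index_col=0)
    # Convert the string index in one pass using the known format.
    df.index = pd.to_datetime(df.index, format=fmt)
    return df


# First import.
df = read_with_format(io.StringIO(CSV_TEXT))

# Save with the same date format, then read it back the same way.
buffer = io.StringIO()
df.to_csv(buffer, date_format=FMT)
buffer.seek(0)
df2 = read_with_format(buffer)

print(type(df2.index).__name__)
print(df2.index.equals(df.index))
```

With real files, the io.StringIO objects would be replaced by the file paths passed to read_csv and to_csv.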