Saving a pandas DataFrame so that the timestamp index type is preserved (existing answer doesn't work?)



(I have already read this answer, but it no longer seems to work.)

Consider, for example, example_1.csv:

timestamp,temp
21-Jun-2017 00:36:49.539000,22
21-Jun-2017 00:36:49.633000,22
21-Jun-2017 00:36:49.633000,22
21-Jun-2017 00:37:03.055000,42
21-Jun-2017 00:37:03.164000,22
21-Jun-2017 00:37:03.164000,22
21-Jun-2017 00:37:12.680000,42
21-Jun-2017 00:37:22.664000,42
21-Jun-2017 00:37:22.664000,42
21-Jun-2017 00:37:22.758000,42
21-Jun-2017 00:37:22.758000,42

OK. In Python 3.5:

>>> import pandas
>>> pandas.__version__
'0.21.0'

example_df  = pandas.read_csv('example_1.csv', index_col = 0)
example_df.index = pandas.to_datetime(example_df.index, format = '%d-%b-%Y %H:%M:%S.%f')

So far, so good:

>>> example_df.index
DatetimeIndex(['2017-06-21 00:36:49.539000', '2017-06-21 00:36:49.633000',
               '2017-06-21 00:36:49.633000', '2017-06-21 00:37:03.055000',
               '2017-06-21 00:37:03.164000', '2017-06-21 00:37:03.164000',
               '2017-06-21 00:37:12.680000', '2017-06-21 00:37:22.664000',
               '2017-06-21 00:37:22.664000', '2017-06-21 00:37:22.758000',
               '2017-06-21 00:37:22.758000'],
              dtype='datetime64[ns]', name='timestamp', freq=None)

However, I need to save everything:

example_df.to_csv('example_2.csv', date_format = '%d-%b-%Y %H:%M:%S.%f')
example_df_2  = pandas.read_csv('example_2.csv', index_col = 0)

But when I read the file back in, the index of example_df_2 is not treated as datetime64[ns]:

>>> example_df_2.index
Index(['21-Jun-2017 00:36:49.539000', '21-Jun-2017 00:36:49.633000',
       '21-Jun-2017 00:36:49.633000', '21-Jun-2017 00:37:03.055000',
       '21-Jun-2017 00:37:03.164000', '21-Jun-2017 00:37:03.164000',
       '21-Jun-2017 00:37:12.680000', '21-Jun-2017 00:37:22.664000',
       '21-Jun-2017 00:37:22.664000', '21-Jun-2017 00:37:22.758000',
       '21-Jun-2017 00:37:22.758000'],
      dtype='object', name='timestamp')

This does not help either:

>>> example_df_2.index.astype('datetime64[ns]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexes/base.py", line 1059, in astype
    return Index(self.values.astype(dtype, copy=copy), name=self.name,
ValueError: Error parsing datetime string "21-Jun-2017 00:37:22.758000" at position 3

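Now, this file is huge, and I have to save it in a format that can be read back quickly later. If I can avoid it, I would rather not parse the timestamps twice.

So how do I fix this? (Or what am I doing wrong?)

Use the parse_dates parameter of read_csv, which tells pandas to parse the index column as dates instead of leaving it as strings: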

import pandas as pd

example_df_2 = pd.read_csv('example_2.csv', index_col=0, parse_dates=True)
example_df_2.index

Output:

DatetimeIndex(['2017-06-21 00:36:49.539000', '2017-06-21 00:36:49.633000',
               '2017-06-21 00:36:49.633000', '2017-06-21 00:37:03.055000',
               '2017-06-21 00:37:03.164000', '2017-06-21 00:37:03.164000',
               '2017-06-21 00:37:12.680000', '2017-06-21 00:37:22.664000',
               '2017-06-21 00:37:22.664000', '2017-06-21 00:37:22.758000',
               '2017-06-21 00:37:22.758000'],
              dtype='datetime64[ns]', name='timestamp', freq=None)
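
To check the round trip without touching any files, here is a minimal sketch using in-memory buffers. The two-row sample and the names csv_text, buffer, df and df_2 are illustrative assumptions, not part of the original post. With parse_dates=True, pandas infers the date format on its own.

```python
import io

import pandas as pd

# Two rows in the same layout as example_1.csv (illustrative sample only).
csv_text = """timestamp,temp
21-Jun-2017 00:36:49.539000,22
21-Jun-2017 00:37:03.055000,42
"""

# First read: parse the index column as dates.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

# Write back out with the original date format, into an in-memory buffer.
buffer = io.StringIO()
df.to_csv(buffer, date_format='%d-%b-%Y %H:%M:%S.%f')
buffer.seek(0)

# Second read: parse_dates=True again, so the index comes back as datetimes.
df_2 = pd.read_csv(buffer, index_col=0, parse_dates=True)

print(type(df_2.index).__name__)
print(df_2.index.equals(df.index))
```

A CSV file stores only text, so the index type cannot survive in the file itself; the parsing has to happen on every read.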

Note that you can also do this for the first import:

example_df = pd.read_csv('example_1.csv', index_col=0, parse_dates=True)
example_df.index

Output:

DatetimeIndex(['2017-06-21 00:36:49.539000', '2017-06-21 00:36:49.633000',
               '2017-06-21 00:36:49.633000', '2017-06-21 00:37:03.055000',
               '2017-06-21 00:37:03.164000', '2017-06-21 00:37:03.164000',
               '2017-06-21 00:37:12.680000', '2017-06-21 00:37:22.664000',
               '2017-06-21 00:37:22.664000', '2017-06-21 00:37:22.758000',
               '2017-06-21 00:37:22.758000'],
              dtype='datetime64[ns]', name='timestamp', freq=None)
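
If you would rather not rely on format inference, the explicit conversion from the question can be applied to the re-read file as well. The sketch below assumes the saved file uses the format string '%d-%b-%Y %H:%M:%S.%f'; the sample data and the names FMT, csv_text and df are illustrative. It also shows why astype('datetime64[ns]') failed: astype is not told the format, whereas pandas.to_datetime accepts it through the format argument.

```python
import io

import pandas as pd

# Format used when the file was written (assumed to match date_format in to_csv).
FMT = '%d-%b-%Y %H:%M:%S.%f'

# Illustrative stand-in for the contents of example_2.csv.
csv_text = """timestamp,temp
21-Jun-2017 00:36:49.539000,22
21-Jun-2017 00:37:03.055000,42
"""

# Read without date parsing: the index holds plain strings at this point.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Convert the string index with an explicit format instead of astype().
df.index = pd.to_datetime(df.index, format=FMT)

print(type(df.index).__name__)
```

Whether this is faster than parse_dates=True depends on the pandas version and the data, so it is worth timing on the real file.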
