Saving a pandas DataFrame so that the timestamp index type is preserved (existing answer no longer works?)

(I have already read this answer, but it no longer seems to work.)

Consider, for example, example_1.csv:

timestamp,temp
21-Jun-2017 00:36:49.539000,22
21-Jun-2017 00:36:49.633000,22
21-Jun-2017 00:36:49.633000,22
21-Jun-2017 00:37:03.055000,42
21-Jun-2017 00:37:03.164000,22
21-Jun-2017 00:37:03.164000,22
21-Jun-2017 00:37:12.680000,42
21-Jun-2017 00:37:22.664000,42
21-Jun-2017 00:37:22.664000,42
21-Jun-2017 00:37:22.758000,42
21-Jun-2017 00:37:22.758000,42

OK. In Python 3.5:

import pandas
>>> pandas.__version__
'0.21.0'

example_df  = pandas.read_csv('example_1.csv', index_col = 0)
example_df.index = pandas.to_datetime(example_df.index, format = '%d-%b-%Y %H:%M:%S.%f')

So far, so good:

>>> example_df.index
DatetimeIndex(['2017-06-21 00:36:49.539000', '2017-06-21 00:36:49.633000',
               '2017-06-21 00:36:49.633000', '2017-06-21 00:37:03.055000',
               '2017-06-21 00:37:03.164000', '2017-06-21 00:37:03.164000',
               '2017-06-21 00:37:12.680000', '2017-06-21 00:37:22.664000',
               '2017-06-21 00:37:22.664000', '2017-06-21 00:37:22.758000',
               '2017-06-21 00:37:22.758000'],
              dtype='datetime64[ns]', name='timestamp', freq=None)

But I need to save all of this to disk:

example_df.to_csv('example_2.csv', date_format = '%d-%b-%Y %H:%M:%S.%f')
example_df_2  = pandas.read_csv('example_2.csv', index_col = 0)

However, when I read the file back in, the index of example_df_2 is not treated as datetime64[ns]:

>>> example_df_2.index
Index(['21-Jun-2017 00:36:49.539000', '21-Jun-2017 00:36:49.633000',
       '21-Jun-2017 00:36:49.633000', '21-Jun-2017 00:37:03.055000',
       '21-Jun-2017 00:37:03.164000', '21-Jun-2017 00:37:03.164000',
       '21-Jun-2017 00:37:12.680000', '21-Jun-2017 00:37:22.664000',
       '21-Jun-2017 00:37:22.664000', '21-Jun-2017 00:37:22.758000',
       '21-Jun-2017 00:37:22.758000'],
      dtype='object', name='timestamp')

This doesn't help either:

>>> example_df_2.index.astype('datetime64[ns]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexes/base.py", line 1059, in astype
    return Index(self.values.astype(dtype, copy=copy), name=self.name,
ValueError: Error parsing datetime string "21-Jun-2017 00:37:22.758000" at position 3
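The traceback shows that Index.astype hands the strings to NumPy's astype, and NumPy's datetime64 string parser essentially only understands ISO 8601 style strings (such as 2017-06-21T00:37:22.758), so "21-Jun-2017 ..." is rejected. A minimal sketch of the explicit alternative, assuming a recent pandas (not checked against 0.21): parse the string index with pandas.to_datetime and the same format string that was used when writing. The inline CSV is a two-row stand-in for example_2.csv. Note that this is exactly the second parse the question would like to avoid.

```python
import io

import pandas as pd

# Two-row stand-in for example_2.csv, written with the question's date_format
csv_text = """timestamp,temp
21-Jun-2017 00:36:49.539000,22
21-Jun-2017 00:37:22.758000,42
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=0)
# Without parse_dates, the index still holds plain strings, not datetime64 values
print(df.index.dtype)

# Parse the strings explicitly, using the same format they were written with
df.index = pd.to_datetime(df.index, format="%d-%b-%Y %H:%M:%S.%f")
print(df.index.dtype)  # a datetime64 dtype now
print(df.index[0])
```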

Now, this file is huge, and I have to save it in a format that can be read back quickly later. If I can avoid it, I'd rather not parse the timestamps twice.

So how do I fix this? (Or am I doing something wrong?)

Use the parse_dates argument of read_csv:

import pandas as pd
example_df_2 = pd.read_csv('example_2.csv', index_col=0, parse_dates=True)
example_df_2.index

Output:

DatetimeIndex(['2017-06-21 00:36:49.539000', '2017-06-21 00:36:49.633000',
               '2017-06-21 00:36:49.633000', '2017-06-21 00:37:03.055000',
               '2017-06-21 00:37:03.164000', '2017-06-21 00:37:03.164000',
               '2017-06-21 00:37:12.680000', '2017-06-21 00:37:22.664000',
               '2017-06-21 00:37:22.664000', '2017-06-21 00:37:22.758000',
               '2017-06-21 00:37:22.758000'],
              dtype='datetime64[ns]', name='timestamp', freq=None)
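The full round trip can be sketched in memory, assuming a recent pandas and using io.StringIO in place of example_2.csv: write with the question's date_format, then read back with parse_dates=True, which asks read_csv to parse the index column (index_col=0) as dates. The two-row frame is a stand-in for example_df.

```python
import io

import pandas as pd

# Small stand-in for example_df: a DatetimeIndex named "timestamp" plus one column
idx = pd.to_datetime(
    ["21-Jun-2017 00:36:49.539000", "21-Jun-2017 00:37:22.758000"],
    format="%d-%b-%Y %H:%M:%S.%f",
).rename("timestamp")
df = pd.DataFrame({"temp": [22, 42]}, index=idx)

# Write with the same date_format as in the question, into an in-memory buffer
buf = io.StringIO()
df.to_csv(buf, date_format="%d-%b-%Y %H:%M:%S.%f")
buf.seek(0)

# parse_dates=True makes read_csv try to parse the index as datetimes
df2 = pd.read_csv(buf, index_col=0, parse_dates=True)
print(df2.index.dtype)
print((df2.index == df.index).all())
```

Leaving out date_format makes to_csv write ISO 8601 style timestamps, which is arguably the safer choice for round trips than a custom day-month-year layout.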

Note that you can also do this for the first import:

example_df = pd.read_csv('example_1.csv', index_col=0, parse_dates=True)
example_df.index

Output:

DatetimeIndex(['2017-06-21 00:36:49.539000', '2017-06-21 00:36:49.633000',
               '2017-06-21 00:36:49.633000', '2017-06-21 00:37:03.055000',
               '2017-06-21 00:37:03.164000', '2017-06-21 00:37:03.164000',
               '2017-06-21 00:37:12.680000', '2017-06-21 00:37:22.664000',
               '2017-06-21 00:37:22.664000', '2017-06-21 00:37:22.758000',
               '2017-06-21 00:37:22.758000'],
              dtype='datetime64[ns]', name='timestamp', freq=None)
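Since the question also asks for a format that loads a large file quickly without re-parsing timestamps, a binary format that stores dtypes sidesteps the parsing step entirely. A minimal sketch using pandas' own pickle support (DataFrame.to_pickle and pandas.read_pickle); this is a swapped-in alternative rather than the parse_dates approach above, pickles should only be loaded from trusted sources, and they are not guaranteed to be readable across pandas versions. The file name example_2.pkl is hypothetical.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the large example_df: DatetimeIndex named "timestamp" plus one column
idx = pd.to_datetime(
    ["21-Jun-2017 00:36:49.539000", "21-Jun-2017 00:37:22.758000"],
    format="%d-%b-%Y %H:%M:%S.%f",
).rename("timestamp")
df = pd.DataFrame({"temp": [22, 42]}, index=idx)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_2.pkl")  # hypothetical file name
    df.to_pickle(path)  # the pickle stores dtypes, including the DatetimeIndex
    restored = pd.read_pickle(path)  # no timestamp parsing on load

print(restored.index.dtype)
print(restored.equals(df))
```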
