I have a file with 3,502,379 rows and 3 columns. The following script should run, but it raises an error on the date-handling line:
import matplotlib.pyplot as plt
import numpy as np
import csv
import pandas
path = 'data_prices.csv'
data = pandas.read_csv(path, sep=';')
data['DATE'] = pandas.to_datetime(data['DATE'], format='%Y%m%d')
This is the error that occurs:
Traceback (most recent call last):
  File "C:\Program Files\Python35\lib\site-packages\pandas\indexes\base.py", line 1945, in get_loc
    return self._engine.get_loc(key)
  File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066)
  File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930)
  File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
  File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 'DATE'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\data\script.py", line 15, in <module>
    data['DATE'] = pandas.to_datetime(data['DATE'], format='%Y%m%d')
  File "C:\Program Files\Python35\lib\site-packages\pandas\core\frame.py", line 1997, in __getitem__
    return self._getitem_column(key)
  File "C:\Program Files\Python35\lib\site-packages\pandas\core\frame.py", line 2004, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Program Files\Python35\lib\site-packages\pandas\core\generic.py", line 1350, in _get_item_cache
    values = self._data.get(item)
  File "C:\Program Files\Python35\lib\site-packages\pandas\core\internals.py", line 3290, in get
    loc = self.items.get_loc(item)
  File "C:\Program Files\Python35\lib\site-packages\pandas\indexes\base.py", line 1947, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066)
  File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930)
  File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
  File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 'DATE'
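The '\ufeff' at the start of your first column name, '\ufeffDATE', means that your CSV file begins with a byte order mark (BOM) that was decoded as ordinary text. That is what happens when a UTF-8 file with a BOM signature is read as plain UTF-8, so the column is really named '\ufeffDATE' and the lookup for 'DATE' fails. The file must be read with an encoding that consumes the BOM.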
So try this when reading the CSV file (with pandas imported as pd):
df = pd.read_csv(path, sep=';', encoding='utf-8-sig')
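Or, as @EdChum suggested, if the file is actually UTF-16 encoded:
df = pd.read_csv(path, sep=';', encoding='utf-16')
Which of the two variants works depends on how the file is actually encoded: 'utf-8-sig' for UTF-8 with a BOM, 'utf-16' for a UTF-16 file.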
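To see why the encoding matters, here is a minimal sketch using only the standard library. The sample bytes and the PRICE column are made up for illustration and are not taken from your file. It decodes the same BOM-prefixed header once as 'utf-8' and once as 'utf-8-sig':

```python
import codecs

# Hypothetical sample: a UTF-8 BOM followed by a small semicolon-separated CSV.
raw = codecs.BOM_UTF8 + "DATE;PRICE\n20160104;101.5\n".encode("utf-8")

# Plain 'utf-8' keeps the BOM as the character U+FEFF in front of the first field.
header_plain = raw.decode("utf-8").splitlines()[0].split(";")

# 'utf-8-sig' strips a leading BOM if one is present.
header_sig = raw.decode("utf-8-sig").splitlines()[0].split(";")

print(header_plain)  # ['\ufeffDATE', 'PRICE']
print(header_sig)    # ['DATE', 'PRICE']
```

The first result shows the same invisible prefix that makes data['DATE'] raise a KeyError, while the second gives the clean column name.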
P.S. This answer shows how to deal with a BOM.
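If you are not sure which encoding to pass, one option is to look at the first bytes of the file. The helper below is only a sketch: sniff_bom_encoding is a hypothetical name, not a pandas or standard-library function, and it only recognises files that actually start with a BOM.

```python
import codecs
import os
import tempfile

# Longer signatures first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom_encoding(path):
    """Return a codec name matching the file's BOM, or None if it has no BOM."""
    with open(path, "rb") as fh:
        head = fh.read(4)  # the longest BOM checked here is 4 bytes
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            return encoding
    return None


# Demo on a throwaway file that starts with a UTF-8 BOM.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(codecs.BOM_UTF8 + b"DATE;PRICE\n20160104;101.5\n")
try:
    print(sniff_bom_encoding(tmp.name))  # utf-8-sig
finally:
    os.remove(tmp.name)
```

The returned name could then be passed as the encoding argument of read_csv. A result of None only means that no BOM was found, not that the file is plain UTF-8.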