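Pandas error loading a text file: CParserError: Error tokenizing data.
I'm new to pandas and I'm trying to open a text file with it. I wrote the code in Python, changed to the correct directory and ran the Python file, but it failed.
Here is the raw data. There are no field names, and the fields in each row are separated by spaces: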
2017-07-02 23:59:51127.0.0.1 GET /ecvv_product/EcvvSearchProduct.aspx cid=202104&p=&pageindex=&kw=electric-skateboard 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 986 31.7.188.55
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvHotSearchProduct.aspx kw=hydrogen-motor 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 2539 31.7.188.55
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvSearchProduct.aspx cid=100005713&p=&pageindex=&kw=electric-skateboard 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 1172 31.7.188.55
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvHotSearchProduct.aspx kw=stainless-stand 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 3152 31.7.188.55
Here is my simple Python code:
import pandas as pd
DATA_FILE='data.log'
df = pd.read_table(DATA_FILE, sep=" ")
print(df)
But I get the following error:
Traceback (most recent call last):
  File "open.py", line 7, in <module>
    df = pd.read_table(DATA_FILE, sep=" ")
  File "C:\Users\hh\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\hh\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 401, in _read
    data = parser.read()
  File "C:\Users\hh\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 939, in read
    ret = self._engine.read(nrows)
  File "C:\Users\hh\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1508, in read
    data = self._reader.read(nrows)
  File "pandas\parser.pyx", line 848, in pandas.parser.TextReader.read (pandas\parser.c:10415)
  File "pandas\parser.pyx", line 870, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:10691)
  File "pandas\parser.pyx", line 924, in pandas.parser.TextReader._read_rows (pandas\parser.c:11437)
  File "pandas\parser.pyx", line 911, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:11308)
  File "pandas\parser.pyx", line 2024, in pandas.parser.raise_parser_error (pandas\parser.c:27037)
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 6 fields in line 4, saw 17
My Python code must be missing something. What is the correct way to write it?
You are missing a space in the first line:
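An error like this means the rows do not all split into the same number of fields, so pandas' C tokenizer reaches a row with more fields than it expected. A minimal stdlib sketch for finding such rows counts the space-separated fields on each line and reports the lines whose count differs from the most common one. The helpers `field_counts` and `odd_lines` are hypothetical names for illustration, not part of pandas.

```python
from collections import Counter

# The first three log lines from the question; the first one contains the
# glued "23:59:51127.0.0.1" value.
SAMPLE = [
    "2017-07-02 23:59:51127.0.0.1 GET /ecvv_product/EcvvSearchProduct.aspx cid=202104&p=&pageindex=&kw=electric-skateboard 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 986 31.7.188.55",
    "2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvHotSearchProduct.aspx kw=hydrogen-motor 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 2539 31.7.188.55",
    "2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvSearchProduct.aspx cid=100005713&p=&pageindex=&kw=electric-skateboard 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 1172 31.7.188.55",
]


def field_counts(lines, sep=" "):
    """Return (1-based line number, field count) for every non-blank line."""
    return [
        (lineno, len(line.rstrip("\r\n").split(sep)))
        for lineno, line in enumerate(lines, start=1)
        if line.strip()
    ]


def odd_lines(lines, sep=" "):
    """Return the (line number, field count) pairs that differ from the most common count."""
    counts = field_counts(lines, sep)
    if not counts:
        return []
    usual, _ = Counter(n for _, n in counts).most_common(1)[0]
    return [(lineno, n) for lineno, n in counts if n != usual]


if __name__ == "__main__":
    for lineno, n in odd_lines(SAMPLE):
        print(f"line {lineno}: {n} fields")
```

On these three lines it reports only line 1, which has 15 fields while the others have 16. An open file object can be passed instead of the list, e.g. `with open("data.log", encoding="utf-8") as f: print(odd_lines(f))` (the UTF-8 encoding is an assumption about the log).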
2017-07-02 23:59:51127.0.0.1
Replace it with:
2017-07-02 23:59:51 127.0.0.1
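If the log is large or the glitch appears on several lines, a scripted repair may be easier than editing by hand. This is a minimal sketch assuming the only defect is a `YYYY-MM-DD HH:MM:SS` timestamp glued directly onto the next field's leading digit, as in the first line of the data; the function names and file paths are hypothetical.

```python
import re

# Assumed defect: "YYYY-MM-DD HH:MM:SS" at the start of a line, immediately
# followed by a digit (the start of the next field) with no space in between.
GLUED_TIME = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?=\d)")


def fix_line(line: str) -> str:
    """Insert the missing space after the time; correct lines pass through unchanged."""
    return GLUED_TIME.sub(r"\1 ", line, count=1)


def fix_file(src: str, dst: str) -> int:
    """Write a repaired copy of src to dst and return how many lines were changed."""
    changed = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fixed = fix_line(line)
            if fixed != line:
                changed += 1
            fout.write(fixed)
    return changed


if __name__ == "__main__":
    print(fix_line("2017-07-02 23:59:51127.0.0.1 GET /ecvv_product/EcvvSearchProduct.aspx"))
```

For example, `fix_file("data.log", "data_fixed.log")` would write a corrected copy that can then be loaded with `pd.read_table(..., sep=" ", header=None)`. The lookahead `(?=\d)` keeps the pattern from touching lines where a space already follows the time.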
Just tested it:
In [12]: cat data.log
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvSearchProduct.aspx cid=202104&p=&pageindex=&kw=electric-skateboard 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 986 31.7.188.55
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvHotSearchProduct.aspx kw=hydrogen-motor 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 2539 31.7.188.55
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvSearchProduct.aspx cid=100005713&p=&pageindex=&kw=electric-skateboard 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 1172 31.7.188.55
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvHotSearchProduct.aspx kw=stainless-stand 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 3152 31.7.188.55
In [13]: dx = pd.read_table('data.log', sep=" ", header=None)
In [14]: dx
Out[14]:
            0         1          2    3
0  2017-07-02  23:59:51  127.0.0.1  GET
1  2017-07-02  23:59:51  127.0.0.1  GET
2  2017-07-02  23:59:51  127.0.0.1  GET
3  2017-07-02  23:59:51  127.0.0.1  GET

                                         4
0     /ecvv_product/EcvvSearchProduct.aspx
1  /ecvv_product/EcvvHotSearchProduct.aspx
2     /ecvv_product/EcvvSearchProduct.aspx
3  /ecvv_product/EcvvHotSearchProduct.aspx

                                                   5     6  7          8
0    cid=202104&p=&pageindex=&kw=electric-skateboard  8082  -  127.0.0.1
1                                  kw=hydrogen-motor  8082  -  127.0.0.1
2  cid=100005713&p=&pageindex=&kw=electric-skateb...  8082  -  127.0.0.1
3                                 kw=stainless-stand  8082  -  127.0.0.1

                                                   9  10   11  12  13    14
0  Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;...   -  200   0   0   986
1  Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;...   -  200   0   0  2539
2  Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;...   -  200   0   0  1172
3  Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;...   -  200   0   0  3152

            15
0  31.7.188.55
1  31.7.188.55
2  31.7.188.55
3  31.7.188.55
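With the space restored, `header=None` makes pandas label the 16 columns 0 to 15, as in the output above. Readable names can be supplied through `names`. The fields appear to follow the IIS W3C extended log layout, but that is an assumption, as are all the column names in this sketch (the last field in particular has no standard IIS name, so `extra_ip` is a placeholder); if your log still has its `#Fields:` directive, check the names against it. `read_csv` with `sep=" "` behaves the same as the `read_table` call used above.

```python
import io

import pandas as pd

# Hypothetical column names, guessed from the IIS W3C extended log layout;
# "extra_ip" is a placeholder for the trailing field.
COLUMNS = [
    "date", "time", "s_ip", "cs_method", "cs_uri_stem", "cs_uri_query",
    "s_port", "cs_username", "c_ip", "cs_user_agent", "cs_referer",
    "sc_status", "sc_substatus", "sc_win32_status", "time_taken", "extra_ip",
]

# Two corrected lines from the question, used here in place of data.log.
SAMPLE = (
    "2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvSearchProduct.aspx cid=202104&p=&pageindex=&kw=electric-skateboard 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 986 31.7.188.55\n"
    "2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvHotSearchProduct.aspx kw=hydrogen-motor 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 2539 31.7.188.55\n"
)


def read_log(source) -> pd.DataFrame:
    """Read a headerless, space-separated log into a DataFrame with named columns."""
    df = pd.read_csv(source, sep=" ", header=None, names=COLUMNS)
    # Merge the separate date and time fields into a single datetime column.
    df["timestamp"] = pd.to_datetime(df["date"] + " " + df["time"])
    return df.drop(columns=["date", "time"])


if __name__ == "__main__":
    # For the real file this would be read_log("data.log").
    print(read_log(io.StringIO(SAMPLE))[["timestamp", "cs_uri_stem", "sc_status"]])
```

If malformed rows should be dropped rather than repaired, pandas 1.3 and later accept `on_bad_lines="skip"` in `read_csv` (older versions used `error_bad_lines=False`); keep in mind that this silently discards those log entries.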