Pandas error loading a text file: CParserError: Error tokenizing data




I am new to pandas. I'm trying to open a text file with pandas: I wrote the code in Python, changed to the correct directory and ran the Python file, but it failed.

Here is the raw data. There are no field names, and the fields in every row are separated by spaces:

2017-07-02 23:59:51127.0.0.1 GET /ecvv_product/EcvvSearchProduct.aspx cid=202104&p=&pageindex=&kw=electric-skateboard 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 986 31.7.188.55
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvHotSearchProduct.aspx kw=hydrogen-motor 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 2539 31.7.188.55
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvSearchProduct.aspx cid=100005713&p=&pageindex=&kw=electric-skateboard 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 1172 31.7.188.55
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvHotSearchProduct.aspx kw=stainless-stand 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 3152 31.7.188.55

Here is my simple Python code:

import pandas as pd
DATA_FILE='data.log'
df = pd.read_table(DATA_FILE, sep=" ")
print(df)

But I get the following error:

Traceback (most recent call last):
  File "open.py", line 7, in <module>
    df = pd.read_table(DATA_FILE, sep=" ")
  File "C:\Users\hh\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\hh\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 401, in _read
    data = parser.read()
  File "C:\Users\hh\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 939, in read
    ret = self._engine.read(nrows)
  File "C:\Users\hh\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1508, in read
    data = self._reader.read(nrows)
  File "pandas\parser.pyx", line 848, in pandas.parser.TextReader.read (pandas\parser.c:10415)
  File "pandas\parser.pyx", line 870, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:10691)
  File "pandas\parser.pyx", line 924, in pandas.parser.TextReader._read_rows (pandas\parser.c:11437)
  File "pandas\parser.pyx", line 911, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:11308)
  File "pandas\parser.pyx", line 2024, in pandas.parser.raise_parser_error (pandas\parser.c:27037)
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 6 fields in line 4, saw 17

Something must be wrong with my Python code. What is the correct way to write it?
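The message "Expected N fields in line M, saw K" comes from the C tokenizer: roughly, it fixes the number of fields it expects from the first line it reads and stops at the first later line that has more. A minimal sketch reproducing this with a made-up three-line input (the sample values are hypothetical, not from the log); recent pandas versions raise the error as `pandas.errors.ParserError` instead of `pandas.io.common.CParserError`:

```python
import io

import pandas as pd

# Hypothetical input: the first line has 2 space-separated fields,
# the third line has 3.
sample = "a b\n1 2\n3 4 5\n"

message = None
try:
    pd.read_table(io.StringIO(sample), sep=" ")
except pd.errors.ParserError as exc:
    # The parser expected 2 fields (taken from the first line) and
    # stopped at the first line that had more.
    message = str(exc)

print(message)
```

With a single-space separator, one missing or extra space on a line is enough to change its field count and trigger this error.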

You are missing a space in the first line:

2017-07-02 23:59:51127.0.0.1 

Replace it with:

2017-07-02 23:59:51 127.0.0.1 
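In a larger log, one way to find such a line is to count the fields on every line and flag those that differ from the most common count. The sketch below uses a hypothetical helper, `irregular_lines` (it is not a pandas function). It splits on every single space, roughly mirroring `sep=" "`, and the sample lines are abbreviated, made-up versions of the log rows:

```python
from collections import Counter


def irregular_lines(lines, sep=" "):
    """Return (line_number, field_count) for every non-blank line whose
    number of sep-separated fields differs from the most common count."""
    counts = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if line:  # ignore blank lines, but keep their line numbers
            counts.append((lineno, len(line.split(sep))))
    if not counts:
        return []
    typical, _ = Counter(n for _, n in counts).most_common(1)[0]
    return [(lineno, n) for lineno, n in counts if n != typical]


# Abbreviated, hypothetical sample: line 1 lacks the space between the
# time and the IP address, so it has one field fewer than the others.
sample_lines = [
    "2017-07-02 23:59:51127.0.0.1 GET 200",
    "2017-07-02 23:59:51 127.0.0.1 GET 200",
    "2017-07-02 23:59:51 127.0.0.1 GET 200",
]
bad = irregular_lines(sample_lines)
print(bad)  # [(1, 4)]
```

Because a file object yields one line at a time, the same helper can be pointed at the real file, e.g. `with open("data.log") as f: print(irregular_lines(f))`.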

I just tested it (note that I also pass header=None, because the file has no header row):

In [12]: cat data.log
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvSearchProduct.aspx cid=202104&p=&pageindex=&kw=electric-skateboard 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 986 31.7.188.55
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvHotSearchProduct.aspx kw=hydrogen-motor 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 2539 31.7.188.55
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvSearchProduct.aspx cid=100005713&p=&pageindex=&kw=electric-skateboard 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 1172 31.7.188.55
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvHotSearchProduct.aspx kw=stainless-stand 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 3152 31.7.188.55
In [13]: dx = pd.read_table('data.log', sep=" ", header=None)
In [14]: dx
Out[14]: 
            0         1          2    3  \
0  2017-07-02  23:59:51  127.0.0.1  GET
1  2017-07-02  23:59:51  127.0.0.1  GET
2  2017-07-02  23:59:51  127.0.0.1  GET
3  2017-07-02  23:59:51  127.0.0.1  GET

                                         4  \
0     /ecvv_product/EcvvSearchProduct.aspx
1  /ecvv_product/EcvvHotSearchProduct.aspx
2     /ecvv_product/EcvvSearchProduct.aspx
3  /ecvv_product/EcvvHotSearchProduct.aspx

                                                   5     6  7          8  \
0    cid=202104&p=&pageindex=&kw=electric-skateboard  8082  -  127.0.0.1
1                                  kw=hydrogen-motor  8082  -  127.0.0.1
2  cid=100005713&p=&pageindex=&kw=electric-skateb...  8082  -  127.0.0.1
3                                 kw=stainless-stand  8082  -  127.0.0.1

                                                   9 10   11  12  13    14  \
0  Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;...  -  200   0   0   986
1  Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;...  -  200   0   0  2539
2  Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;...  -  200   0   0  1172
3  Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;...  -  200   0   0  3152

            15
0  31.7.188.55
1  31.7.188.55
2  31.7.188.55
3  31.7.188.55
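Outside IPython, the same fix as a plain script might look like the sketch below. It feeds two of the corrected log lines through `io.StringIO` so that it runs without the file; in the real script you would pass the path `'data.log'` instead. `header=None` tells pandas that the first line is data, so the columns get the integer labels 0 to 15 shown in the output above:

```python
import io

import pandas as pd

# Two of the corrected log lines from the answer; in the real script,
# replace io.StringIO(raw) with the path 'data.log'.
raw = "\n".join([
    "2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvSearchProduct.aspx "
    "cid=202104&p=&pageindex=&kw=electric-skateboard 8082 - 127.0.0.1 "
    "Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent "
    "- 200 0 0 986 31.7.188.55",
    "2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvHotSearchProduct.aspx "
    "kw=hydrogen-motor 8082 - 127.0.0.1 "
    "Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent "
    "- 200 0 0 2539 31.7.188.55",
])

# No header row in the file, so let pandas number the columns itself.
df = pd.read_table(io.StringIO(raw), sep=" ", header=None)
print(df.shape)  # (2, 16)
```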
