pandas.errors.ParserError: Error tokenizing data, on data that previously parsed without errors



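I am trying to resolve a pandas.errors.ParserError: Error tokenizing data problem.

I have two types of data.

I use the same code for both, but it fails on one type of data, as shown below. (It works fine on the other.)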

(msnoise) [sujan@node01 MSNoise_test2]$ msnoise plot dvv
Traceback (most recent call last):
File "/home/sujan/anaconda3/envs/msnoise/bin/msnoise", line 8, in <module>
sys.exit(run())
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/msnoise/scripts/msnoise.py", line 1202, in run
cli(obj={})
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/click/decorators.py", line 21, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/msnoise/scripts/msnoise.py", line 943, in dvv
main(mov_stack, dttname, comp, filterid, pair, all, show, outfile)
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/msnoise/plots/dvv.py", line 89, in main
df = pd.read_csv(day,sep=",", header=0, index_col=0, parse_dates=True)
File "/home/sujan/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 709, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/sujan/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 455, in _read
data = parser.read(nrows)
File "/home/sujan/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 1069, in read
ret = self._engine.read(nrows)
File "/home/sujan/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 1839, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 902, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 924, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 978, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 965, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2208, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 8 fields in line 114, saw 15

I added error_bad_lines=False to the read_csv call, but it did not help; the run now fails with the following error instead.

(msnoise) [sujan@node01 MSNoise_test2]$ msnoise plot dvv
Skipping line 114: expected 8 fields, saw 15
(1,                             A        EA        EM       EM0         M  
Date
2013-09-29 00:00:00 -0.076348       inf       inf  0.000501 -0.002737
2013-09-29 00:00:00  0.014844  0.021573  0.001400  0.001239  0.000257
2013-09-29 00:00:00 -0.071597  0.002802  0.000144  0.001724 -0.000043
2013-09-29 00:00:00 -0.047929       inf       inf  0.002285  0.001605
2013-09-29 00:00:00 -0.135391       inf       inf  0.002244  0.011393
M0            Pairs
Date
2013-09-29 00:00:00  0.000836  05_TP01_05_TP10
2013-09-29 00:00:00  0.000558  05_TP02_05_TP10
2013-09-29 00:00:00  0.002713  05_TP09_05_TP10
2013-09-29 00:00:00  0.008074  05_TP01_05_TP09
2013-09-29 00:00:00  0.000346  05_TP02_05_TP09  )
Traceback (most recent call last):
File "/home/sujan/anaconda3/envs/msnoise/bin/msnoise", line 8, in <module>
sys.exit(run())
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/msnoise/scripts/msnoise.py", line 1202, in run
cli(obj={})
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/click/decorators.py", line 21, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/msnoise/scripts/msnoise.py", line 943, in dvv
main(mov_stack, dttname, comp, filterid, pair, all, show, outfile)
File "/home/sujan/anaconda3/envs/msnoise/lib/python2.7/site-packages/msnoise/plots/dvv.py", line 140, in main
tmp2 = allbut[dttname].resample('D').mean()
File "/home/sujan/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 5522, in resample
base=base, key=on, level=level)
File "/home/sujan/.local/lib/python2.7/site-packages/pandas/core/resample.py", line 999, in resample
return tg._get_resampler(obj, kind=kind)
File "/home/sujan/.local/lib/python2.7/site-packages/pandas/core/resample.py", line 1096, in _get_resampler
self._set_grouper(obj)
File "/home/sujan/.local/lib/python2.7/site-packages/pandas/core/groupby.py", line 439, in _set_grouper
indexer = self.indexer = ax.argsort(kind='mergesort')
File "/home/sujan/.local/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2151, in argsort
return result.argsort(*args, **kwargs)
File "pandas/_libs/tslib.pyx", line 1165, in pandas._libs.tslib._Timestamp.__richcmp__
TypeError: Cannot compare type 'Timestamp' with type 'str'

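However, the problematic data worked fine until two weeks ago, and then the parsing error suddenly appeared.

I have not touched any of the data or results.

I believe the code that triggers the problem is the following: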

for i, mov_stack in enumerate(mov_stacks):
    current = start
    first = True
    alldf = []
    while current <= end:
        for comp in components:
            day = os.path.join('DTT', "%02i" % filterid, "%03i_DAYS" % mov_stack, comp, '%s.txt' % current)
            if os.path.isfile(day):
                df = pd.read_csv(day, header=0, index_col=0, parse_dates=True)
                alldf.append(df)
        current += datetime.timedelta(days=1)
    if len(alldf) == 0:
        print("No Data for %s m%i f%i" % (components, mov_stack, filterid))
        continue

The line day = os.path.join('DTT', "%02i" % filterid, "%03i_DAYS" % mov_stack, comp, '%s.txt' % current) builds the path to a text file, which pd.read_csv then reads. The file looks like this:

Date,A,EA,EM,EM0,M,M0,Pairs
2014-05-10,0.419549372718,inf,inf,0.000458496085412,-0.0160997929491,0.000732900920237,05_SS08_05_TP01
2014-05-10,-0.0429633365955,inf,inf,0.000525405329004,0.000306985380522,0.00237631297525,05_TP01_05_TP07
2014-05-10,0.067236405269,inf,inf,0.00256763292024,-0.000489522024887,0.000310750516333,05_SS08_05_TP10
2014-05-10,-0.0286482054004,inf,inf,0.00101017717763,-0.00188012718704,-0.00148293566406,05_SS02_05_SS05

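The data that works uses the same txt file format and parses without any problem, which is what makes this so strange.

This has brought my work to a halt. If you know what I should do, or need more information to solve this, please let me know.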
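One way to check whether every row of a file really has the same layout is to count the fields per row and compare them with the header. The following is only a minimal sketch using the standard csv module, not part of MSNoise or pandas; find_bad_rows and the sample rows are made up for illustration. The last sample row shows one way a row could end up with 15 fields, namely two 8-field rows joined without a newline, which is a guess and not something confirmed for my files.

```python
import csv


def find_bad_rows(lines, delimiter=","):
    """Yield (line_number, field_count) for rows whose field count
    differs from the header row. `lines` is any iterable of strings,
    for example an open file. Blank lines are ignored."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return  # empty input, nothing to check
    expected = len(header)
    for row in reader:
        if row and len(row) != expected:
            # reader.line_num is 1-based and counts the header line
            yield reader.line_num, len(row)


# Made-up sample: the last row is two rows fused together (hypothetical).
sample = [
    "Date,A,EA,EM,EM0,M,M0,Pairs",
    "2014-05-10,0.1,inf,inf,0.2,0.3,0.4,05_SS08_05_TP01",
    "2014-05-10,0.1,inf,inf,0.2,0.3,0.4,05_SS08_05_TP01"
    "2014-05-10,0.1,inf,inf,0.2,0.3,0.4,05_TP01_05_TP07",
]
print(list(find_bad_rows(sample)))
```

To scan a real file, pass an open file object instead of the list, for example find_bad_rows(open(day)) with the day path from the loop above.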

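I found the solution. The cause was an environment variable. I had added a Python path there to work around a "no module" problem that occurred before the ParserError. That turned out not to be the fix for the "no module" problem; editing bashrc was. In any case, once I removed the Python path from the environment variable and reran all the steps (cc, mwcs, etc.), msnoise plot dvv finally worked fine.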
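For anyone hitting something similar, it may help to look at which Python paths are set in the environment and where each package is actually imported from. In the tracebacks above, msnoise is loaded from the anaconda3/envs/msnoise directory while pandas is loaded from ~/.local/lib/python2.7/site-packages. The sketch below is only an illustration under that assumption; pythonpath_entries and module_origin are hypothetical helper names, not part of MSNoise or pandas.

```python
from __future__ import print_function

import os
import sys


def pythonpath_entries(environ=None):
    """Return the non-empty entries of the PYTHONPATH variable."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


def module_origin(name):
    """Return the file a module is imported from, or None if the
    module cannot be imported or has no __file__ attribute."""
    try:
        module = __import__(name)
    except ImportError:
        return None
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("PYTHONPATH entries:", pythonpath_entries())
    for name in ("pandas", "msnoise"):
        print(name, "->", module_origin(name))
```

Running this inside the activated msnoise environment shows whether the packages come from the environment itself or from another location picked up through the environment variables.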
