Pandas Resampler.max() fails with "ValueError: Wrong number of items"



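I am trying to use Pandas to process and plot data from a CSV file. The original script came from here, and I ran it successfully a few years ago. Now, however, it always fails with an exception I don't understand, even on the same dataset.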

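This is the code I'm currently using. I have split the second-to-last line of the original script into separate steps so I can pinpoint where the exception comes from: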

#!/usr/bin/env python3
"""Plot bank account balance in CSV-MT940 format based on a starting balance."""
import argparse
import os
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("filename", help="File to parse")
parser.add_argument("start_balance", help="Balance at beginning of file", type=float)
args = parser.parse_args()
# Import after argument parsing to reduce startup time if help flag passed/invalid arguments
import matplotlib.pyplot as plt
import pandas as pd
verlauf = pd.read_csv(os.path.expanduser(args.filename), sep=";", encoding="ISO-8859-1", decimal=",")
verlauf["Date"] = pd.to_datetime(verlauf.Buchungstag, format="%d.%m.%y")
verlauf = verlauf.reindex(index=verlauf.index[::-1])
verlauf["Kumulativer Umsat"] = verlauf.Betrag.cumsum()
verlauf["Kontostand"] = verlauf["Kumulativer Umsat"] + args.start_balance
verlauf.index = verlauf.Date
step1 = verlauf.resample("D")
step2 = step1.max()  # <- this is where it fails
step3 = step2.interpolate()
step4 = step3.Kontostand
step4.plot()
plt.show()

I was able to reduce the input file to the following:

"Buchungstag";"Betrag";"Info"
"08.05.20";"1,00";""
"08.05.20";"1,00";"some info"

The exception I get looks like this:

Traceback (most recent call last):
File "/home/max/Entwicklung/python/plot_expenses/./plot_expenses.py", line 24, in <module>
step2 = step1.max()  # <- this is where it fails
File "/home/max/Entwicklung/python/plot_expenses/venv/lib/python3.9/site-packages/pandas/core/resample.py", line 957, in f
return self._downsample(_method, min_count=min_count)
File "/home/max/Entwicklung/python/plot_expenses/venv/lib/python3.9/site-packages/pandas/core/resample.py", line 1080, in _downsample
result = obj.groupby(self.grouper, axis=self.axis).aggregate(how, **kwargs)
File "/home/max/Entwicklung/python/plot_expenses/venv/lib/python3.9/site-packages/pandas/core/groupby/generic.py", line 945, in aggregate
result, how = aggregate(self, func, *args, **kwargs)
File "/home/max/Entwicklung/python/plot_expenses/venv/lib/python3.9/site-packages/pandas/core/aggregation.py", line 579, in aggregate
return obj._try_aggregate_string_function(arg, *args, **kwargs), None
File "/home/max/Entwicklung/python/plot_expenses/venv/lib/python3.9/site-packages/pandas/core/base.py", line 315, in _try_aggregate_string_function
return f(*args, **kwargs)
File "/home/max/Entwicklung/python/plot_expenses/venv/lib/python3.9/site-packages/pandas/core/groupby/groupby.py", line 1676, in max
return self._agg_general(
File "/home/max/Entwicklung/python/plot_expenses/venv/lib/python3.9/site-packages/pandas/core/groupby/groupby.py", line 1024, in _agg_general
result = self._cython_agg_general(
File "/home/max/Entwicklung/python/plot_expenses/venv/lib/python3.9/site-packages/pandas/core/groupby/generic.py", line 1015, in _cython_agg_general
agg_mgr = self._cython_agg_blocks(
File "/home/max/Entwicklung/python/plot_expenses/venv/lib/python3.9/site-packages/pandas/core/groupby/generic.py", line 1118, in _cython_agg_blocks
new_mgr = data.apply(blk_func, ignore_failures=True)
File "/home/max/Entwicklung/python/plot_expenses/venv/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 425, in apply
applied = b.apply(f, **kwargs)
File "/home/max/Entwicklung/python/plot_expenses/venv/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 380, in apply
return self._split_op_result(result)
File "/home/max/Entwicklung/python/plot_expenses/venv/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 416, in _split_op_result
result = self.make_block(result)
File "/home/max/Entwicklung/python/plot_expenses/venv/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 286, in make_block
return make_block(values, placement=placement, ndim=self.ndim)
File "/home/max/Entwicklung/python/plot_expenses/venv/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 2742, in make_block
return klass(values, ndim=ndim, placement=placement)
File "/home/max/Entwicklung/python/plot_expenses/venv/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 142, in __init__
raise ValueError(
ValueError: Wrong number of items passed 1, placement implies 2

Maybe it's just my limited experience with Pandas, but this error message doesn't help me at all.

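I noticed that the first number in the error message corresponds to the number of non-null values in the shorter row (the first one in this case) minus 1, while the second number corresponds to the number of non-null values in the longer row minus 1. The order of the rows doesn't matter. This means the following CSV file produces the message "Wrong number of items passed 2, placement implies 3":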

"Buchungstag";"Betrag";"Info";"asdf"
"08.05.20";"1,00";"some info";"a"
"08.05.20";"1,00";"";"a"

If both rows have the same number of null values, there is no exception.

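Beyond that, I'm pretty lost, even after digging through the Pandas code for about two hours. I would really appreciate help from someone more experienced than me.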

This is more of a workaround than a fix for the root cause.

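The problem is caused by NA values in some of the columns. You can work around it by passing keep_default_na=False to read_csv, so that the Info column contains empty strings instead of NA values: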

verlauf = pd.read_csv(os.path.expanduser(args.filename), sep=";", encoding="ISO-8859-1", decimal=",", keep_default_na=False)
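
To see what the option changes, here is a minimal sketch that parses the reduced CSV from the question twice, once with the defaults and once with keep_default_na=False. The inlined CSV_TEXT and the load helper are made up for this illustration and are not part of the original script. The last lines resample only the numeric column, which is my own simplification to keep the example independent of the non-numeric columns; it is not what the original script does.

```python
import io

import pandas as pd

# Reduced input file from the question, inlined so the example is self-contained.
CSV_TEXT = '''"Buchungstag";"Betrag";"Info"
"08.05.20";"1,00";""
"08.05.20";"1,00";"some info"
'''


def load(keep_default_na):
    """Hypothetical helper: parse the sample like the question's script does."""
    return pd.read_csv(
        io.StringIO(CSV_TEXT),
        sep=";",
        decimal=",",
        keep_default_na=keep_default_na,
    )


default_frame = load(keep_default_na=True)
patched_frame = load(keep_default_na=False)

# With the defaults, the empty Info field is read as NaN ...
print(default_frame["Info"].isna().tolist())  # [True, False]
# ... with keep_default_na=False it stays an empty string.
print(patched_frame["Info"].tolist())         # ['', 'some info']

# Assumption: resample only the numeric column instead of the whole frame.
patched_frame["Date"] = pd.to_datetime(patched_frame["Buchungstag"], format="%d.%m.%y")
balance = patched_frame.set_index("Date")["Betrag"].cumsum()
daily_max = balance.resample("D").max()
print(daily_max)
```

Note that keep_default_na=False applies to every column, so a file with genuinely missing numeric fields would need those handled separately.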
