ValueError: cannot reindex from a duplicate axis with pd.concat

I am trying to concatenate pandas DataFrames:

def extract_articles(data, article_numbers):
    result = pd.concat(
        [
            data[data['ARTICLENO'] == article_no]['QUANTITY']
            for article_no in article_numbers
        ],
        axis=1,
    ).fillna(0)
    result.columns = article_numbers
    return result

When I read more rows from the csv (about 100k), I get the following error: ValueError: cannot reindex from a duplicate axis

Here is what my csv basically looks like:

Date,       ArticleNo, Quantity
2018-07-15, 1005,      150
2018-07-14, 1005,      165
2018-07-12, 1005,      160
2018-07-14, 1008,      230
2018-07-12, 1008,      245

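The file is sorted by article number and date. For each date there can be multiple (article number, quantity) tuples. There may be gaps where an article number has no data for a date; those should be treated as 0. Why am I getting this error?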

I think there are duplicate index values. You can change:

data[data['ARTICLENO'] == article_no]['QUANTITY']

to

(data.loc[data['ARTICLENO'] == article_no, ['QUANTITY']]
.set_index(data.groupby('Date').cumcount(), append=True))

to make the index values unique.
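To illustrate how duplicate index labels can trigger this error, here is a minimal sketch. The two Series and their date-like index are hypothetical stand-ins, not the asker's real data, and the exact exception class and message depend on the pandas version, so the example catches a generic Exception.

```python
import pandas as pd

# Hypothetical data: the index holds dates, and one date repeats in `left`.
left = pd.Series(
    [150, 165, 170],
    index=["2018-07-15", "2018-07-14", "2018-07-14"],
    name=1005,
)
right = pd.Series([230, 245], index=["2018-07-14", "2018-07-12"], name=1008)

# Aligning on axis=1 has to reindex `left` onto the combined index,
# which is ambiguous when a label appears more than once.
try:
    pd.concat([left, right], axis=1)
    error_message = None
except Exception as exc:  # class and wording vary across pandas versions
    error_message = f"{type(exc).__name__}: {exc}"

print(error_message)

# Collapsing the duplicate labels first (here by summing) removes the ambiguity.
deduplicated = left.groupby(level=0).sum()
aligned = pd.concat([deduplicated, right], axis=1).fillna(0)
print(aligned)
```

Summing is only one possible way to collapse duplicates; whether it is appropriate depends on what repeated rows mean in the real data.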

But for your expected output, you need to aggregate with sum and reshape with unstack:

df = df.groupby(['Date','ArticleNo'])['Quantity'].sum().unstack(fill_value=0)
print(df)
ArticleNo   1005  1008
Date                  
2018-07-12   160   245
2018-07-14   165   230
2018-07-15   150     0
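For reference, the groupby/unstack approach can be run end to end on the sample rows from the question. This is a sketch under a few assumptions: the csv text is inlined through io.StringIO instead of being read from a file, the column padding is dropped, and the names `csv_text` and `wide` are introduced only for illustration.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the example is self-contained.
csv_text = """Date,ArticleNo,Quantity
2018-07-15,1005,150
2018-07-14,1005,165
2018-07-12,1005,160
2018-07-14,1008,230
2018-07-12,1008,245
"""

df = pd.read_csv(io.StringIO(csv_text))

# Sum quantities per (Date, ArticleNo) pair, then move ArticleNo into columns.
# fill_value=0 fills combinations that have no rows, such as 2018-07-15 / 1008.
wide = df.groupby(["Date", "ArticleNo"])["Quantity"].sum().unstack(fill_value=0)
print(wide)
```

Because the aggregation collapses repeated (Date, ArticleNo) pairs before reshaping, duplicate rows in the input do not cause a reindexing error here.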
