Interesting results with duplicate columns in pandas.DataFrame

Can anyone help explain why, when a pandas.DataFrame has duplicate columns, I get errors for some operations but not for others?

Minimal, reproducible example

import pandas as pd
df = pd.DataFrame(columns=['a', 'b', 'b'])

If I try to assign a list to column 'a', I get an error about a dimension mismatch:

df.loc[:, 'a'] = list(range(5))
Traceback (most recent call last):
...
ValueError: cannot copy sequence with size 5 to array axis with dimension 0

Similarly for 'b':

df.loc[:, 'b'] = list(range(5))
Traceback (most recent call last):
...
ValueError: could not broadcast input array from shape (5) into shape (0,2)

However, if I insert an entirely new column, I don't get an error, unless I then assign to 'a' or 'b':

df.loc[:, 'c'] = list(range(5))
print(df)
     a    b    b  c
0  NaN  NaN  NaN  0
1  NaN  NaN  NaN  1
2  NaN  NaN  NaN  2
3  NaN  NaN  NaN  3
4  NaN  NaN  NaN  4
df.loc[:, 'a'] = list(range(5))
Traceback (most recent call last):
...
ValueError: Buffer has wrong number of dimensions (expected 1, got 0)

If I remove the duplicate column 'b', all of these errors go away.
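To make the "remove the duplicate column" step concrete, here is a minimal sketch of one way to detect and drop repeated column labels. It assumes that keeping the first occurrence of each label is acceptable. The names `dup_mask` and `deduped` are illustrative and not from the original post.

```python
import pandas as pd

# Same frame as in the question: the label 'b' appears twice.
df = pd.DataFrame(columns=['a', 'b', 'b'])

# Index.duplicated() flags every repeat of a label after its first occurrence.
dup_mask = df.columns.duplicated()
print(df.columns.is_unique)  # False while 'b' is duplicated

# Keep only the first occurrence of each label (assumption: the later
# duplicates can be discarded). 'deduped' is a hypothetical name.
deduped = df.loc[:, ~dup_mask].copy()
print(list(deduped.columns))
```

Checking `df.columns.is_unique` before column-wise assignment is a cheap way to notice this situation early.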

Additional information

pandas==1.0.2

Why use loc rather than just:

df['a'] = list(range(5))

This does not raise an error and seems to produce what you need:

a   b   b
0   NaN NaN 
1   NaN NaN 
2   NaN NaN 
3   NaN NaN 
4   NaN NaN

The same goes for creating column c:

df['c'] = list(range(5))
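Putting the two plain assignments together, the following is a small sketch that reuses the question's frame. Handling of duplicate labels has changed between pandas releases, so this illustrates the suggestion above and is not guaranteed to behave the same on every version. The question itself reports pandas 1.0.2.

```python
import pandas as pd

# Empty frame with a duplicated column label, as in the question.
df = pd.DataFrame(columns=['a', 'b', 'b'])

# Plain column assignment: 'a' is a unique label, so this targets one column.
# Because the frame has no rows yet, the list also defines the new row index.
df['a'] = list(range(5))

# Assigning to a label that does not exist yet appends a new column.
df['c'] = list(range(5))

print(df.shape)
print(list(df.columns))
```

Both columns 'b' are left untouched and hold missing values for the newly created rows.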
