Pandas: "cannot reindex from duplicate axis" when I apply some loc operations with a unique index

> I have a DataFrame named df, and when I apply df[df>10] I get "cannot reindex from a duplicate axis", raised by the following exception:

In [150]: df[df>10]
.
.
.
~/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py in _can_reindex(self, indexer)
3097         # trying to reindex on an axis with duplicates
3098         if not self.is_unique and len(indexer):
-> 3099             raise ValueError("cannot reindex from a duplicate axis")
3100 
3101     def reindex(self, target, method=None, level=None, limit=None, tolerance=None):

However, I verified the index:

In [151]: df.index
Out[151]: RangeIndex(start=0, stop=150001, step=1)
In [152]: df.index.is_unique
Out[152]: True

So df.index.is_unique is True, yet the "not self.is_unique" check in _can_reindex still seems to evaluate as if the axis had duplicates.

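I have tested this with other DataFrames without any problem, but with a large one (df.shape is (150001, 792)) I get this error, even though I have verified that the index is unique.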

Furthermore, resetting the index gives the same result:

In [177]: new_df = df.reset_index(drop=True)
In [178]: new_df[new_df > 10]
~/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py in _can_reindex(self, indexer)
3097         # trying to reindex on an axis with duplicates
3098         if not self.is_unique and len(indexer):
-> 3099             raise ValueError("cannot reindex from a duplicate axis")
3100 
3101     def reindex(self, target, method=None, level=None, limit=None, tolerance=None):
ValueError: cannot reindex from a duplicate axis

So I am 100% sure there are no duplicates in the index here. What is going on?

To sum up, the problem turned out to be in the column names, combined with NumPy-specific values that came from an np.array.
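
Since _can_reindex runs on whichever axis is being reindexed, a unique row index does not rule out repeated labels on the column axis. The following is a minimal diagnostic sketch, not the asker's actual data: the small frame and its labels ("a", "b") are made up for illustration, and it only shows how to check both axes and list repeated column labels. It does not try to reproduce the exception itself.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: unique row index, but the label "a" appears twice
# on the column axis (column labels are assumptions for illustration).
df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["a", "b", "a"])

# Check both axes, not only the row index.
index_unique = df.index.is_unique
columns_unique = df.columns.is_unique
print("index unique:", index_unique)
print("columns unique:", columns_unique)

# List the column labels that occur more than once.
dup_labels = df.columns[df.columns.duplicated()].unique().tolist()
print("duplicated column labels:", dup_labels)

# One possible workaround: keep only the first occurrence of each label.
deduped = df.loc[:, ~df.columns.duplicated()]
print("columns after dropping repeats:", list(deduped.columns))
```

If df.columns.is_unique comes back False on the real DataFrame, the column labels (for example, labels built from an np.array) are the place to look, rather than the row index.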

If you want to know more about this error, I opened an issue on GitHub.
