"unhashable type: 'list'" error when finding duplicates in a pandas DataFrame



Hi, this one really has me confused. I ran the following command on a large DataFrame:

df.duplicated(subset=None, keep='first')

As far as I can see, this matches what the documentation says:

DataFrame.duplicated(subset=None, keep='first')
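For reference, here is a minimal sketch of what that documented call does on a small hypothetical DataFrame whose cells are all hashable scalars. With subset=None every column is compared, and keep='first' marks each repeated row after its first occurrence as True.

```python
import pandas as pd

# Hypothetical sample data: every cell holds a hashable scalar (int or str)
df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

# subset=None compares all columns; keep='first' flags every repeat
# after the first occurrence of a row as True
dupes = df.duplicated(subset=None, keep="first")
print(dupes.tolist())  # [False, True, False]
```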

The only difference is that I call it on df instead of DataFrame, yet all I get is the following traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-53-529f7b7a97fb> in <module>()
----> 1 df.duplicated(subset=None, keep='first')
/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in duplicated(self, subset, keep)
4383         vals = (col.values for name, col in self.iteritems()
4384                 if name in subset)
-> 4385         labels, shape = map(list, zip(*map(f, vals)))
4386 
4387         ids = get_group_index(labels, shape, sort=False, xnull=False)
/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in f(vals)
4364         def f(vals):
4365             labels, shape = algorithms.factorize(
-> 4366                 vals, size_hint=min(len(self), _SIZE_HINT_LIMIT))
4367             return labels.astype('i8', copy=False), len(shape)
4368 
/anaconda3/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
176                 else:
177                     kwargs[new_arg_name] = new_arg_value
--> 178             return func(*args, **kwargs)
179         return wrapper
180     return _deprecate_kwarg
/anaconda3/lib/python3.7/site-packages/pandas/core/algorithms.py in factorize(values, sort, order, na_sentinel, size_hint)
628                                            na_sentinel=na_sentinel,
629                                            size_hint=size_hint,
--> 630                                            na_value=na_value)
631 
632     if sort and len(uniques) > 0:
/anaconda3/lib/python3.7/site-packages/pandas/core/algorithms.py in _factorize_array(values, na_sentinel, size_hint, na_value)
474     uniques = vec_klass()
475     labels = table.get_labels(values, uniques, 0, na_sentinel,
--> 476                               na_value=na_value)
477 
478     labels = _ensure_platform_int(labels)
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_labels()
TypeError: unhashable type: 'list'

What am I doing wrong?

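As far as I can tell, your DataFrame contains lists in at least one column. duplicated() factorizes each column by hashing its values (that is the hashtable.PyObjectHashTable.get_labels call at the bottom of the traceback), and neither Python nor pandas can hash a list. You may have run into this before if you ever tried to use a list as a dictionary key. A simple workaround is to convert the lists to tuples, which are hashable.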
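A minimal sketch of that workaround, assuming a hypothetical DataFrame whose "tags" column holds lists: it first reproduces the TypeError, then builds a copy with every list cell converted to a tuple and runs duplicated() on that copy. The column names and data are illustrative only.

```python
import pandas as pd

# Hypothetical data: the "tags" column holds Python lists, which are unhashable
df = pd.DataFrame({"id": [1, 1, 2], "tags": [["a", "b"], ["a", "b"], ["c"]]})

# Reproduce the problem: hashing a list cell raises TypeError
try:
    df.duplicated()
except TypeError as exc:
    print("TypeError:", exc)

# Convert every list cell to a tuple (hashable); leave all other values unchanged
hashable = df.apply(
    lambda col: col.map(lambda v: tuple(v) if isinstance(v, list) else v)
)

# duplicated() now works; the boolean mask shares df's index
dupes = hashable.duplicated(keep="first")
print(dupes.tolist())  # [False, True, False]

# Use the mask on the original frame to keep the lists intact in the result
print(df[~dupes])
```

Converting a copy rather than df itself keeps the original list values available for later processing. If only one column holds lists, passing just that column through the same map and leaving the rest alone would be enough.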
