I am trying to set the index of a Series (a1) using an index (ix) that has more levels than the Series' original index.
>>> a = pd.DataFrame({'a': [1, 2, 3], 'b': ['b', 'b', 'b'], 'x': [4, 5, 6]}).set_index(['a', 'b'])
>>> a
     x
a b
1 b  4
2 b  5
3 b  6
>>>
>>> a1 = a['x']
>>> a1
a  b
1  b    4
2  b    5
3  b    6
Name: x, dtype: int64
>>> ix = pd.MultiIndex.from_product(([1, 2, 3], ['b', 'c'], [10, 20]), names=['a', 'b', 'c'])
>>> ix
MultiIndex(levels=[[1, 2, 3], [u'b', u'c'], [10, 20]],
labels=[[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2], [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1], [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]],
names=[u'a', u'b', u'c'])
>>> a.set_index(ix)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "python2.7/site-packages/pandas/core/frame.py", line 3164, in set_index
    frame.index = index
  File "python2.7/site-packages/pandas/core/generic.py", line 3627, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "python2.7/site-packages/pandas/core/generic.py", line 559, in _set_axis
    self._data.set_axis(axis, labels)
  File "python2.7/site-packages/pandas/core/internals.py", line 3074, in set_axis
    (old_len, new_len))
ValueError: Length mismatch: Expected axis has 3 elements, new values have 12 elements
So I would expect the following Series:
a  b  c
1  b  10    4
2  b  10    5
3  b  10    6
1  c  10  nan   # [1, c] wasn't an index in a1
2  c  10  nan   # ...
3  c  10  nan   # ...
1  b  20    4   # [1, b] was an index of a1, so use that value
2  b  20    5   # ...
3  b  20    6   # ...
1  c  20  nan   # [1, c] wasn't an index in a1
2  c  20  nan   # ...
3  c  20  nan   # ...
# if there was an index in a1 that isn't in `ix`, it should be maintained with
# its value and the index should be augmented
How can I do this with pandas?
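Use Series.reindex. The output is ordered slightly differently from what you expected, because the MultiIndex is sorted, which is evidently necessary anyway if you want to work with it efficiently later. Source:
"To index and slice MultiIndex objects efficiently, they need to be sorted. As with any index, you can use sort_index()."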
a = a.reindex(ix)
print(a)
          x
a b c
1 b 10  4.0
    20  4.0
  c 10  NaN
    20  NaN
2 b 10  5.0
    20  5.0
  c 10  NaN
    20  NaN
3 b 10  6.0
    20  6.0
  c 10  NaN
    20  NaN
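For comparison, here is a minimal sketch of the same lookup done explicitly, so it does not depend on how reindex treats a target index with a different number of levels. It rebuilds a1 and ix from the question; the names keys, values and result are only illustrative and not part of the original answer. The idea is to drop the extra level c from ix, look up each remaining (a, b) pair in a1, and then attach the full three-level index.

```python
import pandas as pd

# Rebuild the objects from the question.
a1 = pd.Series(
    [4, 5, 6],
    index=pd.MultiIndex.from_tuples(
        [(1, "b"), (2, "b"), (3, "b")], names=["a", "b"]
    ),
    name="x",
)
ix = pd.MultiIndex.from_product(
    [[1, 2, 3], ["b", "c"], [10, 20]], names=["a", "b", "c"]
)

# Keys shared with a1: drop the extra level "c" (duplicate pairs are expected).
keys = ix.droplevel("c")

# Look up each (a, b) pair in a1; pairs missing from a1 become NaN.
values = a1.reindex(keys).to_numpy()

# Attach the full three-level index to the looked-up values.
result = pd.Series(values, index=ix, name=a1.name)
print(result)
```

This works because reindex only requires the source index (a1.index) to be unique; the target keys may repeat, which is what broadcasts each value across the new level. It does not cover the case from the question where a1 has entries that are missing from ix.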
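The problem is a length mismatch: you have a data frame whose axis has three elements, and you are trying to assign a MultiIndex with twelve elements to it. An axis can only be replaced by an index of the same length. For example, if you start with an empty data frame that already has four columns, assigning a four-column MultiIndex raises no error:
import numpy as np
df = pd.DataFrame(np.empty((0, 4)))  # zero rows, four columns
# `codes` was named `labels` before pandas 0.24
df.columns = pd.MultiIndex(levels=[['first', 'second'], ['a', 'b']], codes=[[0, 0, 1, 1], [0, 1, 0, 1]])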
Alternatively, you can create the empty data frame with the MultiIndex directly:
multi_index = pd.MultiIndex(levels=[['first', 'second'], ['a', 'b']], codes=[[0, 0, 1, 1], [0, 1, 0, 1]])  # `codes` was `labels` before pandas 0.24
df = pd.DataFrame(columns=multi_index)
df
#  first     second
#      a  b       a  b
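To make the length rule concrete, here is a small sketch that assigns the same four-column MultiIndex to two empty frames, one with four columns and one with three. The names cols, ok, bad and message are only illustrative.

```python
import numpy as np
import pandas as pd

# A four-column MultiIndex, equivalent to the one built above.
cols = pd.MultiIndex.from_product([["first", "second"], ["a", "b"]])

# Four existing columns: the assignment succeeds.
ok = pd.DataFrame(np.empty((0, 4)))
ok.columns = cols

# Three existing columns: pandas refuses to replace the axis.
bad = pd.DataFrame(np.empty((0, 3)))
try:
    bad.columns = cols
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

The same check applies to the row index, which is why set_index with a 12-element index fails on a frame with three rows.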