How to expand a row of tuples in a pandas DataFrame into multiple rows under a MultiIndex



Sample DataFrame:

>>> import pandas as pd
>>> idx = pd.MultiIndex.from_arrays([['foo', 'foo', 'bar', 'bar'], ['one', 'two', 'one', 'two']])
>>> df = pd.DataFrame({'Col1': [('a', 'b'), 'c', 'd', 'e'], 'Col2': [('A', 'B'), 'C', 'D', 'E']}, index=idx)
>>> print(df)
           Col1    Col2
foo one  (a, b)  (A, B)
    two       c       C
bar one       d       D
    two       e       E

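I want to transform the DataFrame by unpacking the rows that contain tuples, while keeping everything under its original index, so that the result looks like this: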

          Col1 Col2
foo one 0    a    A
        1    b    B
    two 0    c    C
bar one 0    d    D
    two 0    e    E

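I can unpack the tuples just fine, but I am struggling to work out how to insert the new rows back into the DataFrame. Here is an example of what I have tried: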

>>> unpacked = pd.DataFrame(df.loc['foo', 'one'].tolist(), index=df.columns).T
>>> print(unpacked)
  Col1 Col2
0    a    A
1    b    B
>>> df.loc['foo', 'one'] = unpacked
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python37\lib\site-packages\pandas\core\indexing.py", line 190, in __setitem__
    self._setitem_with_indexer(indexer, value)
  File "C:\Program Files\Python37\lib\site-packages\pandas\core\indexing.py", line 645, in _setitem_with_indexer
    value = self._align_frame(indexer, value)
  File "C:\Program Files\Python37\lib\site-packages\pandas\core\indexing.py", line 860, in _align_frame
    raise ValueError('Incompatible indexer with DataFrame')
ValueError: Incompatible indexer with DataFrame

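The reason for the failure is fairly obvious, but I am not sure where to go from here. Is there a way to create a new MultiIndex level in the process that can handle an arbitrary number of unpacked rows?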

Use Series.explode in a list comprehension together with concat, then add the new level with GroupBy.cumcount:

df = pd.concat([df[x].explode() for x in df.columns], axis=1)
df = df.set_index(df.groupby(df.index).cumcount(), append=True)
print (df)
          Col1 Col2
foo one 0    a    A
        1    b    B
    two 0    c    C
bar one 0    d    D
    two 0    e    E
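
For reference, the same idea can be wrapped in a small self-contained script. This is only a sketch: the helper name explode_rows is hypothetical (it is not part of pandas), it groups by index level rather than by df.index so it does not depend on the number of index levels, and it assumes that within any one row every tuple has the same length across columns.

```python
import pandas as pd

# Rebuild the sample frame from the question.
idx = pd.MultiIndex.from_arrays(
    [['foo', 'foo', 'bar', 'bar'], ['one', 'two', 'one', 'two']]
)
df = pd.DataFrame(
    {'Col1': [('a', 'b'), 'c', 'd', 'e'], 'Col2': [('A', 'B'), 'C', 'D', 'E']},
    index=idx,
)


def explode_rows(frame):
    """Hypothetical helper: unpack tuple cells into rows and number them.

    Assumes tuples in the same row have equal length in every column.
    """
    # Explode each column on its own; scalar cells pass through unchanged.
    exploded = pd.concat(
        [frame[col].explode() for col in frame.columns], axis=1
    )
    # Number the rows inside each original index entry: 0, 1, 2, ...
    levels = list(range(exploded.index.nlevels))
    counter = exploded.groupby(level=levels).cumcount()
    # Append the counter as a new innermost index level.
    return exploded.set_index(counter, append=True)


result = explode_rows(df)
print(result)
```

Because cumcount restarts at 0 for every original index entry, the new innermost level works for any number of unpacked rows, and the printed frame should match the desired output shown in the question.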
