How to convert a dictionary of arrays into a 'flattened' DataFrame?



Suppose I have a dictionary of arrays, for example:

favourite_icecreams = {
'Josh': ['vanilla', 'banana'],
'Greg': ['chocolate'],
'Sarah': ['mint', 'vanilla', 'mango']
}

I want to convert it into a pandas DataFrame with the columns "Flavour" and "Person". It should look like this:

Flavour    Person
vanilla    Josh
banana     Josh
chocolate  Greg
mint       Sarah
vanilla    Sarah
mango      Sarah

Another solution, using .explode():

import pandas as pd

df = pd.DataFrame(
    {
        "Person": list(favourite_icecreams.keys()),
        "Flavour": list(favourite_icecreams.values()),
    }
).explode("Flavour")

print(df)

Output:

   Person    Flavour
0    Josh    vanilla
0    Josh     banana
1    Greg  chocolate
2   Sarah       mint
2   Sarah    vanilla
2   Sarah      mango
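explode keeps each person's original row label, which is why the index above repeats (0, 0, 1, 2, 2, 2). A minimal sketch, assuming pandas 1.1 or later (the version in which explode gained its ignore_index parameter), that renumbers the rows 0..n-1 instead:

```python
import pandas as pd

favourite_icecreams = {
    'Josh': ['vanilla', 'banana'],
    'Greg': ['chocolate'],
    'Sarah': ['mint', 'vanilla', 'mango'],
}

# One row per person, with that person's whole list of flavours in a single cell
df = pd.DataFrame({
    "Person": list(favourite_icecreams.keys()),
    "Flavour": list(favourite_icecreams.values()),
})

# ignore_index=True labels the exploded rows 0, 1, 2, ... instead of reusing 0, 0, 1, ...
df = df.explode("Flavour", ignore_index=True)
print(df)
```

The same result can be had on older versions by chaining .reset_index(drop=True) after explode.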

You can build the rows with a generator expression and pass it straight to pd.DataFrame:

import pandas as pd
favourite_icecreams = {
'Josh': ['vanilla', 'banana'],
'Greg': ['chocolate'],
'Sarah': ['mint', 'vanilla', 'mango']
}
data = ((flavour, person)
for person, flavours in favourite_icecreams.items()
for flavour in flavours)
df = pd.DataFrame(data, columns=('Flavour', 'Person'))
print(df)
# Flavour Person
# 0    vanilla   Josh
# 1     banana   Josh
# 2  chocolate   Greg
# 3       mint  Sarah
# 4    vanilla  Sarah
# 5      mango  Sarah
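The comprehension and explode approaches differ when someone has no favourite flavour at all. In this sketch the entry 'Tom': [] is a hypothetical addition for illustration only, and ignore_index assumes pandas 1.1 or later: the comprehension produces no row for Tom, while explode turns his empty list into a single row with a missing (NaN) flavour.

```python
import pandas as pd

# 'Tom' is a hypothetical extra entry with an empty list
favourite_icecreams = {
    'Josh': ['vanilla', 'banana'],
    'Greg': ['chocolate'],
    'Sarah': ['mint', 'vanilla', 'mango'],
    'Tom': [],
}

# Comprehension: the inner loop never runs for Tom, so he gets no row at all
data = ((flavour, person)
        for person, flavours in favourite_icecreams.items()
        for flavour in flavours)
via_comprehension = pd.DataFrame(data, columns=('Flavour', 'Person'))

# explode: an empty list becomes one row whose Flavour is NaN
via_explode = pd.DataFrame({
    'Person': list(favourite_icecreams.keys()),
    'Flavour': list(favourite_icecreams.values()),
}).explode('Flavour', ignore_index=True)

print(len(via_comprehension))  # 6 (no row for Tom)
print(len(via_explode))        # 7 (Tom appears once, with NaN)
```

Which behaviour is preferable depends on whether people without flavours should still appear in the result.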

You can use DataFrame.from_dict together with stack in pandas, like this:

In [453]: df = pd.DataFrame.from_dict(favourite_icecreams, orient='index').stack().dropna().reset_index().drop(columns='level_1')
In [455]: df.columns = ['Person', 'Flavour']
In [456]: df
Out[456]: 
Person    Flavour
0   Josh    vanilla
1   Josh     banana
2   Greg  chocolate
3  Sarah       mint
4  Sarah    vanilla
5  Sarah      mango
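To see why stack is needed, it helps to look at the intermediate frame: from_dict(..., orient='index') turns each person into a row and pads the shorter lists with missing values, and stack then folds the position columns into one long Series. A small sketch of those two steps (the names wide and long are just for illustration):

```python
import pandas as pd

favourite_icecreams = {
    'Josh': ['vanilla', 'banana'],
    'Greg': ['chocolate'],
    'Sarah': ['mint', 'vanilla', 'mango'],
}

# Rows are people, columns are positions 0, 1, 2 within each list;
# shorter lists are padded with missing values
wide = pd.DataFrame.from_dict(favourite_icecreams, orient='index')
print(wide)

# Move the positions into the index, then drop the padding explicitly
# (the explicit dropna() keeps this correct whether or not stack() drops missing values itself)
long = wide.stack().dropna()
print(long)
```

The (person, position) MultiIndex on long is what reset_index turns into the level_0 and level_1 columns in the answer above.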

One option is to split the people and the flavours into separate sequences, use numpy's repeat to repeat each person once per flavour, and finally build the DataFrame:

from itertools import chain

import numpy as np
import pandas as pd

person, flavour = zip(*favourite_icecreams.items())
lengths = list(map(len, flavour))
person = np.array(person).repeat(lengths)
flavour = list(chain.from_iterable(flavour))
pd.DataFrame({'person': person, 'flavour': flavour})
person    flavour
0   Josh    vanilla
1   Josh     banana
2   Greg  chocolate
3  Sarah       mint
4  Sarah    vanilla
5  Sarah      mango
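The key step in this answer is the repeat: each name is repeated as many times as that person has flavours, so the repeated names line up element by element with the flattened flavour list. In isolation:

```python
from itertools import chain

import numpy as np

people = ['Josh', 'Greg', 'Sarah']
flavour_lists = [['vanilla', 'banana'], ['chocolate'], ['mint', 'vanilla', 'mango']]

# Repeat count for each person = number of flavours they listed
lengths = [len(f) for f in flavour_lists]

repeated_people = np.repeat(people, lengths)
flat_flavours = list(chain.from_iterable(flavour_lists))

# Both sequences now have the same length and are aligned position by position
print(repeated_people.tolist())
print(flat_flavours)
```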
