Making a Pandas DataFrame from a dict of dicts that maps indexes to values



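I have a dict of dicts that I am trying to make into a Pandas DataFrame. The outer dict maps each row index to an inner dict, which in turn maps column indexes to their values. I want everything else in the DataFrame to be 0. For example: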

d = {0: {0: 2, 2: 5},
     1: {1: 1, 3: 2},
     2: {2: 5}}

So I would want the DataFrame to look like

index   c0   c1   c2   c3
0  2.0  NaN  5.0  NaN
1  NaN  1.0  NaN  2.0
2  NaN  NaN  5.0  NaN

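I currently plan to write a function that yields a tuple from each item of d and to use it as the iterable for creating the DataFrame, but I am interested in whether anyone else has done something similar.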

Simply call DataFrame.from_dict:

pd.DataFrame.from_dict(d,'index').sort_index(axis=1)
0    1    2    3
0  2.0  NaN  5.0  NaN
1  NaN  1.0  NaN  2.0
2  NaN  NaN  5.0  NaN

Well, why not build it the usual way and transpose it:

>>> pd.DataFrame(d).T
0    1    2    3
0  2.0  NaN  5.0  NaN
1  NaN  1.0  NaN  2.0
2  NaN  NaN  5.0  NaN
>>> 
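
The two answers can be checked against each other. The sketch below builds the frame both ways and compares them. Sorting both axes is an added precaution, not part of either answer, since the order in which pandas combines the inner keys may differ between versions.

```python
import pandas as pd

d = {0: {0: 2, 2: 5},
     1: {1: 1, 3: 2},
     2: {2: 5}}

# Approach 1: treat the outer keys as row labels directly.
via_from_dict = (pd.DataFrame.from_dict(d, orient="index")
                 .sort_index(axis=0).sort_index(axis=1))

# Approach 2: build with outer keys as columns, then transpose.
via_transpose = pd.DataFrame(d).T.sort_index(axis=0).sort_index(axis=1)

# Fill NaN before comparing, because NaN never compares equal to NaN.
same = via_from_dict.fillna(0).eq(via_transpose.fillna(0)).all().all()
print(same)
```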

After some testing, I found that my original approach is much faster. I am using the following function to make an iterator that I pass to pd.DataFrame:

import pandas as pd

def row_factory(index_data, row_len):
    """
    Make a generator for iterating over index_data.

    Parameters:
        index_data (dict): a dict mapping a key to a dict of indexes mapped to values.
            All indexes not in the second dict are assumed to be None.
        row_len (int): length of each row, including the leading key
    Example:
        index_data = {0: {0: 2, 2: 1}, 1: {1: 1}} with row_len=4 would yield
        [0, 2, None, 1] then [1, None, 1, None]
    """
    for key, data in index_data.items():
        # Start the row with the key, then None for every column
        row = [key] + [None] * (row_len - 1)
        for index, value in data.items():
            # Only replace indexes that have a value; offset by 1 to skip the key
            row[index + 1] = value
        yield row

# 4 value columns plus the leading key column
df = pd.DataFrame(row_factory(d, 5))
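
The speed difference reported above will depend on the data and the pandas version, so it is worth measuring on your own input. The following is a minimal sketch using timeit. Here dense_rows is a hypothetical stand-in for the generator approach (it omits the leading key column and passes the keys as the index instead), and make_sparse, the sizes, and the repeat count are arbitrary assumptions chosen for illustration.

```python
import random
import timeit

import pandas as pd


def dense_rows(index_data, n_cols):
    """Yield one dense list per outer key; missing cells stay None."""
    for data in index_data.values():
        row = [None] * n_cols
        for col, value in data.items():
            row[col] = value
        yield row


def make_sparse(n_rows, n_cols, per_row, seed=0):
    """Build a dict of dicts with `per_row` random filled cells per row."""
    rng = random.Random(seed)
    return {
        r: {c: rng.randint(1, 9) for c in rng.sample(range(n_cols), per_row)}
        for r in range(n_rows)
    }


def time_approaches(index_data, n_cols, number=3):
    """Return the total seconds each approach takes over `number` runs."""
    return {
        "from_dict": timeit.timeit(
            lambda: pd.DataFrame.from_dict(index_data, orient="index"),
            number=number),
        "transpose": timeit.timeit(
            lambda: pd.DataFrame(index_data).T,
            number=number),
        "generator": timeit.timeit(
            lambda: pd.DataFrame(dense_rows(index_data, n_cols),
                                 index=list(index_data)),
            number=number),
    }


sample = make_sparse(n_rows=2000, n_cols=50, per_row=5)
for name, seconds in time_approaches(sample, n_cols=50).items():
    print(f"{name}: {seconds:.4f} s")
```

The timings are not reproduced here because they vary by machine. The generator version also requires knowing the number of columns up front, which the other two approaches infer from the data.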
