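I am using DataFrame.from_dict to convert a Python dictionary into a pandas DataFrame. It does exactly what I want, but only when the items in the dictionary are in a certain order. I have a dictionary of parameters in which some values are scalars, some are lists, and some are dicts. As long as the first item is not a list or a dict, it works perfectly. If a list or a dict comes first, it fails.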
>>> import pandas as pd
>>> my_dict = {'a': 1, 'b': [1, 2, 3], 'c': {'x': 'aa', 'y': 'bb'}}
>>> my_dict
{'a': 1, 'b': [1, 2, 3], 'c': {'x': 'aa', 'y': 'bb'}}
>>> pd.DataFrame.from_dict(my_dict, orient='index').reset_index().rename(columns={'index': 'prop_name', 0: 'prop_value'})
  prop_name              prop_value
0         a                       1
1         b               [1, 2, 3]
2         c  {'x': 'aa', 'y': 'bb'}
>>> my_dict2 = {'c': {'x': 'aa', 'y': 'bb'}, 'a': 1, 'b': [1, 2, 3]}
>>> pd.DataFrame.from_dict(my_dict2, orient='index').reset_index().rename(columns={'index': 'prop_name', 0: 'prop_value'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 1300, in from_dict
    data = _from_nested_dict(data)
  File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 9281, in _from_nested_dict
    for col, v in s.items():
AttributeError: 'int' object has no attribute 'items'
>>> my_dict3 = {'b': [1, 2, 3], 'c': {'x': 'aa', 'y': 'bb'}, 'a': 1 }
>>> pd.DataFrame.from_dict(my_dict3, orient='index').reset_index().rename(columns={'index': 'prop_name', 0: 'prop_value'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 1309, in from_dict
    return cls(data, index=index, columns=columns, dtype=dtype)
  File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 509, in __init__
    arrays, columns = to_arrays(data, columns, dtype=dtype)
  File "/usr/lib/python3/dist-packages/pandas/core/internals/construction.py", line 524, in to_arrays
    return _list_to_arrays(data, columns, coerce_float=coerce_float, dtype=dtype)
  File "/usr/lib/python3/dist-packages/pandas/core/internals/construction.py", line 561, in _list_to_arrays
    content = list(lib.to_object_array(data).T)
  File "pandas/_libs/lib.pyx", line 2448, in pandas._libs.lib.to_object_array
TypeError: object of type 'int' has no len()
>>>
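Looking at the pandas source, it is easy to see why the error occurs. Is there a better way to do this, so that the result does not depend on which item comes first? The first output above is exactly what I want.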
You can build the DataFrame directly from a dictionary of columns:
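import pandas as pd

my_dict2 = {"c": {"x": "aa", "y": "bb"}, "a": 1, "b": [1, 2, 3]}
# Build the two columns explicitly so the type of the first value does not matter.
df = pd.DataFrame(
    {"prop_name": list(my_dict2.keys()), "prop_value": list(my_dict2.values())}
)
print(df)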
This prints:
  prop_name              prop_value
0         c  {'x': 'aa', 'y': 'bb'}
1         a                       1
2         b               [1, 2, 3]
For my_dict2 = {"a": 1, "b": [1, 2, 3], "c": {"x": "aa", "y": "bb"}}, this yields:
  prop_name              prop_value
0         a                       1
1         b               [1, 2, 3]
2         c  {'x': 'aa', 'y': 'bb'}
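If you need this in more than one place, the same idea can be wrapped in a small function. The following is only a sketch: `dict_to_frame` is a hypothetical helper name, and it uses the common `list(d.items())` idiom instead of two separate columns, which should give the same two-column result. It runs the three key orderings from the question through the helper to show that none of them depends on the first item.

```python
import pandas as pd


def dict_to_frame(params):
    """Hypothetical helper: one row per key, with the value stored as-is."""
    # Each (key, value) pair becomes one row, so nested lists and dicts are
    # kept as single objects in the prop_value column.
    return pd.DataFrame(list(params.items()), columns=["prop_name", "prop_value"])


# The three orderings from the question.
orderings = [
    {"a": 1, "b": [1, 2, 3], "c": {"x": "aa", "y": "bb"}},
    {"c": {"x": "aa", "y": "bb"}, "a": 1, "b": [1, 2, 3]},
    {"b": [1, 2, 3], "c": {"x": "aa", "y": "bb"}, "a": 1},
]

frames = [dict_to_frame(d) for d in orderings]
for frame in frames:
    print(frame)
```

The prop_value column ends up with object dtype, so each cell holds the original Python value unchanged.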
Note: as @TrentonMcKinney pointed out in the comments, how the DataFrame is constructed depends on the first item of the source dictionary:
if isinstance(list(data.values())[0], (Series, dict)):
    data = _from_nested_dict(data)
else:
    data, index = list(data.values()), list(data.keys())
So pd.DataFrame.from_dict({"b": 1, "a": [1, 2, 3]}, orient="index") succeeds, whereas pd.DataFrame.from_dict({"a": [1, 2, 3], "b": 1}, orient="index") raises an error.
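To make the dispatch easier to see, here is a rough stand-alone imitation of that check. It is not pandas code: `from_dict_index_branch` is a hypothetical name, and it mirrors only the snippet quoted above, so the real behaviour may differ between pandas versions.

```python
import pandas as pd


def from_dict_index_branch(data):
    """Hypothetical helper: say which branch of the quoted check would run."""
    first = next(iter(data.values()))
    if isinstance(first, (pd.Series, dict)):
        # Every value is then expected to be dict-like, which is consistent
        # with the AttributeError on an int in the first traceback.
        return "nested"
    # Otherwise the values are handed on as a plain list of rows.
    return "list"


print(from_dict_index_branch({"b": 1, "a": [1, 2, 3]}))        # list
print(from_dict_index_branch({"c": {"x": "aa"}, "a": 1}))      # nested
```

Only the first value is inspected, so a dictionary with mixed value types can land in either branch depending purely on key order.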