Recursively flatten a list of complex Python dicts

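I want to create a CSV file from a very complex dict. The real dict has thousands of keys and is nested more than 9 levels deep, but this is an example of the structure: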

import pandas

my_stuff = [
    {
        "a": [
            {"1": "example1"},
            {"2": [
                {"2": "example2"},
                {"3": "example3"}
            ]},
            {"4": "example4"},
            {"5": "example5"}
        ],
        "b": [
            "example6", "61", "62"
        ]
    }
]
result = pandas.json_normalize(my_stuff)
print(result.to_csv())

This prints:

,a,b
0,"[{'1': 'example1'}, {'2': [{'2': 'example2'}, {'3': 'example3'}]}, {'4': 'example4'}, {'5': 'example5'}]","['example6', '61', '62']"

But I want this output:

"0.a.0.1, 0.a.0.2.2, 0.a.0.2.3, 0.a.0.4, 0.a.0.5, 0.b.0"
"example1, example2, example3, example4, example5, example6;61;62"

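I thought pandas could flatten the dicts, but apparently it can't. I need the keys used as column headers, in the form sectiona.subsection1.fieldwhatever, because the .csv will later be loaded into a database.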

I hope someone can help.

Bonus: I tried it without pandas, but got stuck here:

def flatten(py_structure, depth=""):
    """make a flatten dict"""
    new_dict = {}
    if isinstance(py_structure, dict):
        for k, v in py_structure.items():
            if isinstance(v, dict):
                flattened_v = flatten(v, k)
            elif isinstance(v, list):
                flattened_v = flatten(v, k)
            else:
                flattened_v = v
            new_dict[f"{depth}{k}"] = flattened_v
        return new_dict
    elif isinstance(py_structure, list):
        for idx, v in enumerate(py_structure):
            new_dict[f"{depth}{idx}"] = flatten(v, f"{depth}{idx}")
        return new_dict
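The attempt above gets stuck for three reasons: each recursive call receives only the current key `k` as `depth` rather than the full path so far, the prefix and key are concatenated without a `.` separator, and the child's result is stored as a nested dict instead of being merged into the parent. A scalar inside a list also falls through both branches and comes back as `None`. A minimal sketch that fixes these, assuming every dict key and every list index should become one dot-separated segment (so keys come out as `0.a.1.2.0.2`, using real list positions, rather than the `0.a.0.2.2` written in the desired output):

```python
def flatten(py_structure, prefix=""):
    """Return a flat dict mapping dotted paths to leaf values.

    Sketch only: dict keys and list indices both become path segments,
    e.g. [{"a": [{"1": "x"}]}] -> {"0.a.0.1": "x"}.
    Empty dicts/lists produce no keys.
    """
    if isinstance(py_structure, dict):
        items = py_structure.items()
    elif isinstance(py_structure, list):
        items = enumerate(py_structure)
    else:
        # A scalar: the accumulated prefix is its full key.
        return {prefix: py_structure}
    flat = {}
    for key, value in items:
        # Extend the path with this segment; no leading dot at the top level.
        path = f"{prefix}.{key}" if prefix else str(key)
        # Merge the child's flat dict into ours instead of nesting it.
        flat.update(flatten(value, path))
    return flat


my_stuff = [
    {
        "a": [
            {"1": "example1"},
            {"2": [{"2": "example2"}, {"3": "example3"}]},
            {"4": "example4"},
            {"5": "example5"},
        ],
        "b": ["example6", "61", "62"],
    }
]
flat = flatten(my_stuff)
print(flat)
```

Here each element of `"b"` gets its own column (`0.b.0`, `0.b.1`, `0.b.2`); joining them into one `;`-separated cell would need an extra rule for lists of scalars. The recursion depth equals the nesting depth, so 9+ levels stays far below Python's default recursion limit.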

You can do this with a depth-first traversal of a custom tree container:

import pprint

class Container:
    def __init__(self, data):
        # Wrap every list/dict element recursively; anything else is a leaf.
        self.is_leaf = False
        if type(data) is list:
            self.data = [Container(x) for x in data]
        elif type(data) is dict:
            self.data = {k: Container(v) for k, v in data.items()}
        else:
            self.is_leaf = True
            self.data = data

    def walk(self, callback):
        self._walk(self, callback=callback, path=[])

    def _walk(self, container, callback=None, path=None):
        # A single leaf, or a list made only of leaves, is emitted as one
        # column; anything else is descended into, extending the dotted path.
        if (type(container.data) is not dict
                and all(x.is_leaf for _, x in container.items())):
            callback(".".join(path), [x.data for _, x in container.items()])
        else:
            for k, c in container.items():
                self._walk(c, callback=callback, path=path + [str(k)])

    def items(self):
        if type(self.data) is list:
            yield from enumerate(self.data)
        elif type(self.data) is dict:
            yield from self.data.items()
        else:
            yield None, self

    def flatten(self):
        result = {}
        def callback(key, value):
            result[key] = value
        self.walk(callback)
        return result

data = [
    {
        "a": [
            {"1": "example1"},
            {"2": [
                {"2": "example2"},
                {"3": "example3"}
            ]},
            {"4": "example4"},
            {"5": "example5"}
        ],
        "b": [
            "example6", "61", "62"
        ]
    }
]
c = Container(data)
pprint.pprint(c.flatten())

This outputs:

{'0.a.0.1': ['example1'],
'0.a.1.2.0.2': ['example2'],
'0.a.1.2.1.3': ['example3'],
'0.a.2.4': ['example4'],
'0.a.3.5': ['example5'],
'0.b': ['example6', '61', '62']}
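To get from this dict to the CSV layout asked for in the question, one option is the standard-library `csv` module: the dotted keys become the header row, and each list of values is joined with `;` into a single cell, as in `example6;61;62`. A sketch, assuming one record and `;` as the in-cell separator (both taken from the desired output); `to_csv` is a hypothetical helper name, and the dict is copied from the output above so the block runs on its own:

```python
import csv
import io

# Result of c.flatten() above, copied as a literal so this runs standalone.
flat = {
    "0.a.0.1": ["example1"],
    "0.a.1.2.0.2": ["example2"],
    "0.a.1.2.1.3": ["example3"],
    "0.a.2.4": ["example4"],
    "0.a.3.5": ["example5"],
    "0.b": ["example6", "61", "62"],
}


def to_csv(flat, list_sep=";"):
    """Write one header row (the dotted keys) and one data row.

    Each value list is joined with list_sep into a single cell;
    csv.writer handles quoting if a value contains a comma or quote.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(flat.keys())
    writer.writerow(list_sep.join(str(v) for v in values)
                    for values in flat.values())
    return buf.getvalue()


print(to_csv(flat))
```

If `my_stuff` holds many records, flattening each record separately (rather than the whole top-level list) keeps the leading `0.`/`1.` index out of the column names, so all records can share one header row.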
