I'm looking for a way in Python to dynamically build a dict of dicts based on a desired structure.
I have the following data:
{'weather': ['windy', 'calm'], 'season': ['summer', 'winter', 'spring', 'autumn'], 'lateness': ['ontime', 'delayed']}
I specify the structure I want like this:
['weather', 'season', 'lateness']
and in the end I want the data in this format:
{'calm': {'autumn': {'delayed': 0, 'ontime': 0},
          'spring': {'delayed': 0, 'ontime': 0},
          'summer': {'delayed': 0, 'ontime': 0},
          'winter': {'delayed': 0, 'ontime': 0}},
 'windy': {'autumn': {'delayed': 0, 'ontime': 0},
           'spring': {'delayed': 0, 'ontime': 0},
           'summer': {'delayed': 0, 'ontime': 0},
           'winter': {'delayed': 0, 'ontime': 0}}}
This is the manual approach I came up with to achieve it:
dtree = {}
for cat1 in category_cases['weather']:
    dtree.setdefault(cat1, {})
    for cat2 in category_cases['season']:
        dtree[cat1].setdefault(cat2, {})
        for cat3 in category_cases['lateness']:
            dtree[cat1][cat2].setdefault(cat3, 0)
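Can you think of a way to change the code I wrote so it produces the expected result for any structure? Keep in mind that the structure may be a different size each time.
Also, if you can think of an approach other than dicts that still lets me access the results, that would work for me too.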
If you're not averse to using an external package, pandas.DataFrame might be a viable candidate, since it looks like you'll be working with a table:
import pandas as pd

# d is the data dict from the question
df = pd.DataFrame(
    index=pd.MultiIndex.from_product([d['weather'], d['season']]),
    columns=d['lateness'], data=0
)
Result:
              ontime  delayed
windy summer       0        0
      winter       0        0
      spring       0        0
      autumn       0        0
calm  summer       0        0
      winter       0        0
      spring       0        0
      autumn       0        0
You can also easily make changes using the index:
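df.loc[('windy', 'summer'), 'ontime'] = 1
# Use a single .loc call; chained indexing such as df.loc['calm', 'autumn']['delayed'] = 2
# may silently modify a copy instead of df
df.loc[('calm', 'autumn'), 'delayed'] = 2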
# Result:
              ontime  delayed
windy summer       1        0
      winter       0        0
      spring       0        0
      autumn       0        0
calm  summer       0        0
      winter       0        0
      spring       0        0
      autumn       0        2
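If you always use the last key for the columns (assuming your keys are in the desired insertion order), you can construct the table dynamically: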
df = pd.DataFrame(
    index=pd.MultiIndex.from_product(list(d.values())[:-1]),
    columns=list(d.values())[-1], data=0
)
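If pandas isn't an option, a rough standard-library analogue of this MultiIndex table is a flat dict keyed by tuples. This is a swapped-in technique, not part of the pandas approach; `make_flat_table` is a hypothetical helper name, and the sketch assumes the keys of `d` are already in the desired order (dicts keep insertion order on Python 3.7+).

```python
from itertools import product

d = {
    "weather": ["windy", "calm"],
    "season": ["summer", "winter", "spring", "autumn"],
    "lateness": ["ontime", "delayed"],
}

def make_flat_table(categories):
    """Hypothetical helper: map every combination of category values to 0."""
    # product(*values) yields one tuple per cell, e.g. ('windy', 'summer', 'ontime')
    return {combo: 0 for combo in product(*categories.values())}

table = make_flat_table(d)
table[("windy", "summer", "ontime")] = 1
table[("calm", "autumn", "delayed")] = 2

# Rough equivalent of df.loc['calm', 'delayed'].sum()
calm_delayed = sum(
    count
    for (weather, season, late), count in table.items()
    if weather == "calm" and late == "delayed"
)
print(len(table), calm_delayed)  # 16 2
```

The trade-off is that filtering by one level means scanning every key, which pandas' indexing does for you.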
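Since you're interested in pandas, given your structure I'd also recommend reading through MultiIndex and Advanced Indexing, just to get a feel for how to work with your data. Here are some examples: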
# Filter all the 'delayed' data in 'calm'
# (the outputs below assume df.loc[('calm', 'summer'), 'delayed'] was also set to 5)
df.loc['calm', 'delayed']
# summer    5
# winter    0
# spring    0
# autumn    2
# Name: delayed, dtype: int64

# Apply a sum:
df.loc['calm', 'delayed'].sum()
# 7

# Get the mean of all 'summer' rows (slice(None) is required to select both the 'calm' and 'windy' groups)
df.loc[(slice(None), 'summer'), :].mean()
# ontime     0.5
# delayed    2.5
# dtype: float64
It's definitely very handy and versatile, but you'll probably want to read up on it before diving in; the framework can take some getting used to.
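Otherwise, if you still prefer a dict, there's nothing wrong with that. Here's a recursive function that generates it from the given keys (assuming your keys are in the desired insertion order):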
def gen_dict(d, level=0):
    # Past the last key: this level is a leaf
    if level >= len(d):
        return 0
    key = tuple(d.keys())[level]
    return {val: gen_dict(d, level + 1) for val in d.get(key)}

gen_dict(d)
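Because each level is built by a fresh dict comprehension, every inner dict is a separate object, so updating one branch doesn't leak into another. A quick check with a smaller, two-level input (the `small` dict is made up for illustration):

```python
def gen_dict(d, level=0):
    # Base case: past the last key, the leaf value is 0
    if level >= len(d):
        return 0
    key = tuple(d.keys())[level]
    # One recursive call per value, so each branch gets its own dict
    return {val: gen_dict(d, level + 1) for val in d.get(key)}

small = {"weather": ["windy", "calm"], "lateness": ["ontime", "delayed"]}
tree = gen_dict(small)
tree["windy"]["ontime"] += 1
print(tree)
# {'windy': {'ontime': 1, 'delayed': 0}, 'calm': {'ontime': 0, 'delayed': 0}}
```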
Result:
{'calm': {'autumn': {'delayed': 0, 'ontime': 0},
          'spring': {'delayed': 0, 'ontime': 0},
          'summer': {'delayed': 0, 'ontime': 0},
          'winter': {'delayed': 0, 'ontime': 0}},
 'windy': {'autumn': {'delayed': 0, 'ontime': 0},
           'spring': {'delayed': 0, 'ontime': 0},
           'summer': {'delayed': 0, 'ontime': 0},
           'winter': {'delayed': 0, 'ontime': 0}}}
I think this might work for you:
def get_output(category, order, i=0):
    output = {}
    for key in order[i:i+1]:
        for value in category[key]:
            output[value] = get_output(category, order, i+1)
    if output == {}:
        return 0
    return output
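A usage sketch, assuming the `category_cases` dict from the question. Since the nesting is driven entirely by `order`, passing a shorter or reordered list gives a different structure without touching the function:

```python
def get_output(category, order, i=0):
    output = {}
    # order[i:i+1] is empty once i runs past the last key, which ends the recursion
    for key in order[i:i+1]:
        for value in category[key]:
            output[value] = get_output(category, order, i + 1)
    # Nothing was added, so this is the leaf level
    if output == {}:
        return 0
    return output

category_cases = {
    "weather": ["windy", "calm"],
    "season": ["summer", "winter", "spring", "autumn"],
    "lateness": ["ontime", "delayed"],
}

# Only two of the three levels, season first
result = get_output(category_cases, ["season", "lateness"])
print(len(result))        # 4
print(result["winter"])   # {'ontime': 0, 'delayed': 0}
```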
Here's a recursive solution slightly different from the one r.ook gave in the excellent accepted answer:
category_cases = {'weather': ['windy', 'calm'],
                  'season': ['summer', 'winter', 'spring', 'autumn'],
                  'lateness': ['ontime', 'delayed']}
order = ['weather', 'season', 'lateness']

def gen_tree(category_cases, order):
    if len(order) == 0:
        return 0
    return {x: gen_tree(category_cases, order[1:]) for x in category_cases[order[0]]}
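It doesn't assume that the dict preserves key order (guaranteed only from Python 3.7 on), so it should be more backward compatible.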
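To see that the nesting follows `order` rather than the dict's own key order, here is a sketch with keys deliberately inserted in the "wrong" order (the `scrambled` dict is made up for illustration):

```python
def gen_tree(category_cases, order):
    # Base case: no levels left, so this is a leaf
    if len(order) == 0:
        return 0
    # Build this level from order[0] and recurse on the remaining levels
    return {x: gen_tree(category_cases, order[1:]) for x in category_cases[order[0]]}

# 'lateness' is inserted before 'weather', the opposite of the desired nesting
scrambled = {
    "lateness": ["ontime", "delayed"],
    "weather": ["windy", "calm"],
}
tree = gen_tree(scrambled, ["weather", "lateness"])
print(tree)
# {'windy': {'ontime': 0, 'delayed': 0}, 'calm': {'ontime': 0, 'delayed': 0}}
```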
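You can use itertools.product to get the Cartesian product of the dict's values (assuming you want the keys in their existing order). Then, for each combination, we iterate over every key except the last, using setdefault to insert or reuse the nested dicts. Finally, we set the innermost key's count to 0.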
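To make the two building blocks concrete, here is a two-level walk-through: what `product()` yields, and how `setdefault(key, {})` either returns the existing inner dict or inserts a new one:

```python
from itertools import product

weather = ["windy", "calm"]
lateness = ["ontime", "delayed"]

# product() yields every combination; the leftmost iterable varies slowest
combos = list(product(weather, lateness))
print(combos)
# [('windy', 'ontime'), ('windy', 'delayed'), ('calm', 'ontime'), ('calm', 'delayed')]

result = {}
for comb in combos:
    current = result
    for key in comb[:-1]:
        # First visit to 'windy' inserts {}, later visits return that same dict
        current = current.setdefault(key, {})
    current[comb[-1]] = 0
print(result)
# {'windy': {'ontime': 0, 'delayed': 0}, 'calm': {'ontime': 0, 'delayed': 0}}
```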
from itertools import product
from pprint import pprint

d = {
    "weather": ["windy", "calm"],
    "season": ["summer", "winter", "spring", "autumn"],
    "lateness": ["ontime", "delayed"],
}

result = {}
for comb in product(*d.values()):
    current = result
    for key in comb[:-1]:
        current = current.setdefault(key, {})
    current[comb[-1]] = 0

pprint(result)
Output:
{'calm': {'autumn': {'delayed': 0, 'ontime': 0},
          'spring': {'delayed': 0, 'ontime': 0},
          'summer': {'delayed': 0, 'ontime': 0},
          'winter': {'delayed': 0, 'ontime': 0}},
 'windy': {'autumn': {'delayed': 0, 'ontime': 0},
           'spring': {'delayed': 0, 'ontime': 0},
           'summer': {'delayed': 0, 'ontime': 0},
           'winter': {'delayed': 0, 'ontime': 0}}}
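If we want a custom order (which can be useful on older Python versions that don't guarantee key order), we can map the order list through d.get and pass the result to product():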
order = ['weather', 'season', 'lateness']

result = {}
for comb in product(*map(d.get, order)):
    current = result
    for key in comb[:-1]:
        current = current.setdefault(key, {})
    current[comb[-1]] = 0
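The same loop can be wrapped in a reusable function with the order as an optional argument. A sketch; `build_tree` is a hypothetical name, and when `order` is omitted it falls back to the dict's own key order:

```python
from itertools import product

def build_tree(d, order=None):
    """Hypothetical wrapper around the product()/setdefault() loop."""
    # Default to the dict's own key order (insertion order on Python 3.7+)
    order = list(d) if order is None else list(order)
    if not order:
        return 0  # no levels: the whole "tree" is a single leaf
    result = {}
    for comb in product(*map(d.get, order)):
        current = result
        # Walk or create one nested dict per key except the last
        for key in comb[:-1]:
            current = current.setdefault(key, {})
        current[comb[-1]] = 0
    return result

d = {
    "weather": ["windy", "calm"],
    "season": ["summer", "winter", "spring", "autumn"],
    "lateness": ["ontime", "delayed"],
}
by_season = build_tree(d, ["season", "weather"])
print(by_season["summer"])  # {'windy': 0, 'calm': 0}
```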
Yes, you can achieve this with the following code:
import copy

structure = ['weather', 'season', 'lateness']
data = {'weather': ['windy', 'calm'], 'season': ['summer', 'winter', 'spring', 'autumn'],
        'lateness': ['ontime', 'delayed'], }

d_tree = dict()
n = len(structure)  # length of the structure list
prev_val = 0  # the innermost value
while n > 0:
    n -= 1
    keys = data.get(structure[n]) or list()  # get the list of values from data
    # Idea here is to start with the innermost dict and keep moving outward
    d_tree.clear()
    for key in keys:
        # deepcopy so sibling keys don't end up sharing the same nested dicts
        d_tree[key] = copy.deepcopy(prev_val)
    prev_val = copy.copy(d_tree)  # Copy d_tree to use as the value for the outer dict
print(d_tree)
Hope this helps!