Getting a decision tree in a dictionary



I'm looking for a way to dynamically build a dictionary of dictionaries in Python, based on a desired structure.

I have the following data:

{'weather': ['windy', 'calm'], 'season': ['summer', 'winter', 'spring', 'autumn'],  'lateness': ['ontime', 'delayed']} 

I specify the structure I want, like this:

['weather', 'season', 'lateness']

and finally get the data in this format:

{'calm': {'autumn': {'delayed': 0, 'ontime': 0},
          'spring': {'delayed': 0, 'ontime': 0},
          'summer': {'delayed': 0, 'ontime': 0},
          'winter': {'delayed': 0, 'ontime': 0}},
 'windy': {'autumn': {'delayed': 0, 'ontime': 0},
           'spring': {'delayed': 0, 'ontime': 0},
           'summer': {'delayed': 0, 'ontime': 0},
           'winter': {'delayed': 0, 'ontime': 0}}}

This is the manual approach I came up with to achieve that:

# category_cases is the data dictionary shown above
dtree = {}
for cat1 in category_cases['weather']:
    dtree.setdefault(cat1, {})
    for cat2 in category_cases['season']:
        dtree[cat1].setdefault(cat2, {})
        for cat3 in category_cases['lateness']:
            dtree[cat1][cat2].setdefault(cat3, 0)

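Can you think of a way to get the expected result just by changing the structure list I wrote? Keep in mind that the structure may not be the same size every time.

Also, if you can think of an approach other than dictionaries that still lets me access the results, that would work for me too.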

If you don't mind using external packages, pandas.DataFrame might be a viable candidate, since it looks like you'll be working with a table:

import pandas as pd

# d is the data dictionary from the question
df = pd.DataFrame(
    index=pd.MultiIndex.from_product([d['weather'], d['season']]),
    columns=d['lateness'], data=0
)

Result:

              ontime  delayed
windy summer       0        0
      winter       0        0
      spring       0        0
      autumn       0        0
calm  summer       0        0
      winter       0        0
      spring       0        0
      autumn       0        0

You can also make changes easily using indexing:

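df.loc[('windy', 'summer'), 'ontime'] = 1
# Use a single .loc call; chained indexing such as df.loc[...][...] = value
# may assign to a temporary copy instead of df.
df.loc[('calm', 'autumn'), 'delayed'] = 2
# Result:
#               ontime  delayed
# windy summer       1        0
#       winter       0        0
#       spring       0        0
#       autumn       0        0
# calm  summer       0        0
#       winter       0        0
#       spring       0        0
#       autumn       0        2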

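If you always use the last key for the columns (assuming your keys are in the desired insertion order), you can construct the table dynamically: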

df = pd.DataFrame(
    index=pd.MultiIndex.from_product(list(d.values())[:-1]),
    columns=list(d.values())[-1], data=0
)

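Since you're interested in pandas, and given your structure, I'd also recommend reading up on MultiIndex and Advanced Indexing, just to get a sense of how to work with your data. Here are some examples: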

# Note: the sample outputs below appear to come from a table in which
# more counts had been filled in than in the assignments shown earlier.
# Select all the 'delayed' data in 'calm':
df.loc['calm', 'delayed']
# summer    5
# winter    0
# spring    0
# autumn    2
# Name: delayed, dtype: int64
# Get the sum of the 'delayed' items in all of 'calm':
df.loc['calm', 'delayed'].sum()
# 7
# Get the mean of all 'summer' rows (notice that `slice(None)` is required
# to return both the 'calm' and 'windy' groups):
df.loc[(slice(None), 'summer'), :].mean()
# ontime     0.5
# delayed    2.5
# dtype: float64

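It's definitely very handy and versatile, but you'll probably want to read up on it before diving in too deep; the framework can take some time to get used to.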


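Otherwise, if you still prefer a dict, there's nothing wrong with that. Here is a recursive function that generates one from the given keys (assuming your keys are in the desired insertion order):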

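def gen_dict(d, level=0):
    # Past the last key: this is a leaf, so return the initial count.
    if level >= len(d):
        return 0
    key = tuple(d.keys())[level]
    # One sub-tree per value of the current key.
    return {val: gen_dict(d, level+1) for val in d.get(key)}

gen_dict(d)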

Result:

{'calm': {'autumn': {'delayed': 0, 'ontime': 0},
          'spring': {'delayed': 0, 'ontime': 0},
          'summer': {'delayed': 0, 'ontime': 0},
          'winter': {'delayed': 0, 'ontime': 0}},
 'windy': {'autumn': {'delayed': 0, 'ontime': 0},
           'spring': {'delayed': 0, 'ontime': 0},
           'summer': {'delayed': 0, 'ontime': 0},
           'winter': {'delayed': 0, 'ontime': 0}}}
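
Once a tree like this exists, reading or updating a leaf means following one key per level. Here is a minimal sketch of that, assuming the nested-dict layout shown above. The helper names `get_leaf` and `increment` are made up for illustration and are not part of the answer:

```python
from functools import reduce

def get_leaf(tree, path):
    """Follow one key per level and return whatever sits at the end."""
    return reduce(lambda node, key: node[key], path, tree)

def increment(tree, path, amount=1):
    """Add `amount` to the counter at the end of `path`."""
    *parents, last = path
    get_leaf(tree, parents)[last] += amount

# Small hand-written tree in the same shape as the result above.
tree = {
    'calm': {'summer': {'ontime': 0, 'delayed': 0}},
    'windy': {'summer': {'ontime': 0, 'delayed': 0}},
}
increment(tree, ['calm', 'summer', 'delayed'])
increment(tree, ['calm', 'summer', 'delayed'], amount=2)
print(get_leaf(tree, ['calm', 'summer', 'delayed']))  # 3
```

Because the path is just a list, the same helpers work whatever the depth of the structure.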

I think this might work for you.

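def get_output(category, order, i=0):
    output = {}
    # order[i:i+1] is empty once i runs past the last level
    for key in order[i:i+1]:
        for value in category[key]:
            output[value] = get_output(category, order, i+1)
    # Nothing was added, so this is a leaf: return the initial count.
    if output == {}:
        return 0
    return output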

Here is a recursive solution, slightly different from the one r.ook provided in the excellent accepted answer:

category_cases = {'weather': ['windy', 'calm'],
                  'season': ['summer', 'winter', 'spring', 'autumn'],
                  'lateness': ['ontime', 'delayed']}
order = ['weather', 'season', 'lateness']

def gen_tree(category_cases, order):
    # No levels left: this is a leaf, so return the initial count.
    if len(order) == 0:
        return 0
    return {x: gen_tree(category_cases, order[1:]) for x in category_cases[order[0]]}

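It does not assume that dictionaries preserve key order, because the levels come from the explicit order list, so it should be somewhat more backward compatible.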
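
To illustrate the point about key order, here is a small sketch in which two dictionaries hold the same data with their keys inserted in a different order. The tree depends only on the `order` list. The function is repeated so the snippet runs on its own, and the two-level sample data is a cut-down assumption used only for brevity:

```python
def gen_tree(category_cases, order):
    # Same recursion as above, repeated so this snippet is self-contained.
    if not order:
        return 0
    return {x: gen_tree(category_cases, order[1:]) for x in category_cases[order[0]]}

order = ['weather', 'lateness']

# Same content, different key insertion order.
a = {'weather': ['windy', 'calm'], 'lateness': ['ontime', 'delayed']}
b = {'lateness': ['ontime', 'delayed'], 'weather': ['windy', 'calm']}

print(gen_tree(a, order) == gen_tree(b, order))  # True

# Reordering the list is all it takes to change the nesting.
print(gen_tree(a, ['lateness', 'weather']))
# {'ontime': {'windy': 0, 'calm': 0}, 'delayed': {'windy': 0, 'calm': 0}}
```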

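You can use itertools.product to get the Cartesian product of the dictionary values (assuming you want the same key order). For each combination, we iterate over every key except the last, using setdefault to insert or fetch the nested dictionary at that level. Then we set the count for the innermost key to 0.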

from itertools import product
from pprint import pprint

d = {
    "weather": ["windy", "calm"],
    "season": ["summer", "winter", "spring", "autumn"],
    "lateness": ["ontime", "delayed"],
}
result = {}
for comb in product(*d.values()):
    current = result
    # Walk down (creating levels as needed) to the innermost dict.
    for key in comb[:-1]:
        current = current.setdefault(key, {})
    current[comb[-1]] = 0
pprint(result)

Output:

{'calm': {'autumn': {'delayed': 0, 'ontime': 0},
          'spring': {'delayed': 0, 'ontime': 0},
          'summer': {'delayed': 0, 'ontime': 0},
          'winter': {'delayed': 0, 'ontime': 0}},
 'windy': {'autumn': {'delayed': 0, 'ontime': 0},
           'spring': {'delayed': 0, 'ontime': 0},
           'summer': {'delayed': 0, 'ontime': 0},
           'winter': {'delayed': 0, 'ontime': 0}}}

If we want a custom order (which may be useful for older Python versions that don't guarantee key order), we can pass the lists to product() in that order:

order = ['weather', 'season', 'lateness']
result = {}
for comb in product(*map(d.get, order)):
    current = result
    for key in comb[:-1]:
        current = current.setdefault(key, {})
    current[comb[-1]] = 0
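
Since the question is also open to alternatives to nested dictionaries, the same product() call can feed a single flat dictionary keyed by tuples. This is only a sketch of one possible layout, not something proposed in the answer above, and the name `counts` is an assumption:

```python
from itertools import product

d = {
    "weather": ["windy", "calm"],
    "season": ["summer", "winter", "spring", "autumn"],
    "lateness": ["ontime", "delayed"],
}
order = ["weather", "season", "lateness"]

# One flat dict: each key is a (weather, season, lateness) tuple.
counts = {comb: 0 for comb in product(*(d[k] for k in order))}

counts["windy", "summer", "ontime"] += 1

# Aggregating over a level is a filter on the tuple positions.
calm_delayed = sum(
    n for key, n in counts.items()
    if key[0] == "calm" and key[-1] == "delayed"
)

print(len(counts))                          # 16
print(counts["windy", "summer", "ontime"])  # 1
```

The flat form works for any number of levels without recursion, at the cost of losing the ability to grab a whole sub-tree with one lookup.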

Yes, you can achieve this with the following code:

import copy

structure = ['weather', 'season', 'lateness']
data = {'weather': ['windy', 'calm'], 'season': ['summer', 'winter', 'spring', 'autumn'],
        'lateness': ['ontime', 'delayed'], }
d_tree = dict()
n = len(structure)  # length of the structure list
prev_val = 0  # the innermost value
while n > 0:
    n -= 1
    keys = data.get(structure[n]) or list()  # get the list of values from data
    # Idea here is to start with the innermost dict and keep moving outward
    d_tree.clear()
    for key in keys:
        # deepcopy so that each branch gets its own independent sub-tree;
        # a shallow copy would share the inner dicts between branches
        d_tree[key] = copy.deepcopy(prev_val)
    prev_val = copy.copy(d_tree)  # Copy the d_tree to put as value to outer dict
print(d_tree)
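
When a tree is assembled from the innermost level outward like this, the kind of copy matters. copy.copy duplicates only the outer dictionary, so nested dictionaries end up shared between branches, while copy.deepcopy duplicates the whole sub-tree. A small sketch of the difference, with variable names chosen only for illustration:

```python
import copy

inner = {'ontime': 0, 'delayed': 0}
level = {'summer': inner}

# Shallow copies: new outer dicts, but both still point at `inner`.
shallow = {'windy': copy.copy(level), 'calm': copy.copy(level)}
# Deep copies: every branch gets its own inner dict.
deep = {'windy': copy.deepcopy(level), 'calm': copy.deepcopy(level)}

shallow['windy']['summer']['ontime'] += 1
deep['windy']['summer']['ontime'] += 1

print(shallow['calm']['summer']['ontime'])  # 1, the inner dict is shared
print(deep['calm']['summer']['ontime'])     # 0, the branches are independent
```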

Hope this helps!!
