Finding all possible combinations of a nested JSON structure



I have a config file that I've been working with, for example:

"Preprocessing": {
"BOW":{"ngram_range":[1,2], "max_features":[100, 200]},
"RemoveStopWords": {"Parameter1": ["..."]}
}

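The idea is to take this data, generate every combination of parameter values across the two preprocessing steps, and pass each combination to a preprocessing object. The output I'm looking for is: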

[{"BOW":{"ngram_range":1, "max_features":100}, "RemoveStopWords":{"Parameter1": "..."}},
{"BOW":{"ngram_range":2, "max_features":100}, "RemoveStopWords":{"Parameter1": "..."}},
{"BOW":{"ngram_range":1, "max_features":200}, "RemoveStopWords":{"Parameter1": "..."}},
{"BOW":{"ngram_range":2, "max_features":200}, "RemoveStopWords":{"Parameter1": "..."}}]

Current code:

from itertools import product

def unpack_preprocessing_steps(preprocessing: dict):
    """
    This function will take the Preprocessing section of the config file
    and produce a list of preprocessing combinations.
    """
    preprocessing_steps = []  # save for all steps bow, w2v, etc.
    preprocessing_params = []  # individual parameters for each preprocessing step
    for key, values in preprocessing.items():
        for key2, values2 in values.items():
            # Note: only the inner parameter name is kept; the step name (key) is dropped here
            preprocessing_steps.append(key2)
            preprocessing_params.append(values2)
    iterables = product(*preprocessing_params)  # Creates a matrix of every combination
    iterable_of_params = [i for i in iterables]
    exploded_preprocessing_list = []
    for params in iterable_of_params:
        individual_objects = {}  # store each object as an unpackable datatype
        for step, param in zip(preprocessing_steps, params):
            individual_objects[step] = param  # This stores every iteration as its own set of preprocesses
        exploded_preprocessing_list.append(individual_objects)

    return exploded_preprocessing_list

Current output (incorrect):

[{"ngram_range":1, "max_features":100, "Parameter1":"..."},
{"ngram_range":2, "max_features":200, "Parameter1":"..."},
{"ngram_range":1, "max_features":100, "Parameter1":"..."},
{"ngram_range":2, "max_features":200, "Parameter1":"..."}]

Assuming you always want the same RemoveStopWords section, this should work for you. It generates the product of all the BOW feature keys and values:

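from itertools import product
# pprint just to make output clearer
from pprint import pprint

config = {
    "Preprocessing": {
        "BOW": {"ngram_range": [1, 2], "max_features": [100, 200]},
        "RemoveStopWords": {"Parameter1": ["..."]},
    }
}

# Split the BOW section into parameter names and their lists of candidate values
features, values = zip(*config["Preprocessing"]["BOW"].items())
# One dict per combination of BOW parameter values
bows = [dict(zip(features, v)) for v in product(*values)]

newconf = []
for bow in bows:
    newconf.append({
        "BOW": bow,
        # The RemoveStopWords section is copied unchanged into every combination
        "RemoveStopWords": config["Preprocessing"]["RemoveStopWords"],
    })
pprint(newconf)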

Result:

[{'BOW': {'max_features': 100, 'ngram_range': 1}, 'RemoveStopWords': {'Parameter1': ['...']}},
{'BOW': {'max_features': 200, 'ngram_range': 1}, 'RemoveStopWords': {'Parameter1': ['...']}},
{'BOW': {'max_features': 100, 'ngram_range': 2}, 'RemoveStopWords': {'Parameter1': ['...']}},
{'BOW': {'max_features': 200, 'ngram_range': 2}, 'RemoveStopWords': {'Parameter1': ['...']}}]
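If the RemoveStopWords section (or any other step) may also carry several candidate values, the same idea can be applied twice: first expand each step on its own, then take the product across steps. The following is only a sketch under that assumption; the helper names `expand_step` and `expand_preprocessing` are made up for illustration and are not part of the original code.

```python
from itertools import product


def expand_step(params: dict) -> list:
    """Return one dict per combination of a single step's parameter values."""
    names = list(params)
    return [dict(zip(names, combo)) for combo in product(*params.values())]


def expand_preprocessing(preprocessing: dict) -> list:
    """Return one nested config per combination across all steps."""
    steps = list(preprocessing)
    # Expand every step independently, keeping the step name as the outer key
    per_step = [expand_step(preprocessing[step]) for step in steps]
    # Combine one expanded dict from each step into a full configuration
    return [dict(zip(steps, combo)) for combo in product(*per_step)]


config = {
    "Preprocessing": {
        "BOW": {"ngram_range": [1, 2], "max_features": [100, 200]},
        "RemoveStopWords": {"Parameter1": ["..."]},
    }
}

combos = expand_preprocessing(config["Preprocessing"])
for combo in combos:
    print(combo)
```

With the config from the question this yields four nested dicts, each keeping the "BOW" and "RemoveStopWords" keys and holding single values instead of lists. The combinations are the same set the question asks for, although `product` varies the last parameter fastest, so their order may differ from the listing in the question.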
