Given the following lists:
make = ['ford', 'fiat', 'nissan', 'suzuki', 'dacia']
model = ['x', 'y', 'z']
version = ['A', 'B', 'C']
typ = ['sedan', 'coupe', 'van', 'kombi']
infos = ['steering wheel problems', 'gearbox problems', 'broken engine', 'throttle problems', None]
total = []
total.append(make)
total.append(model)
total.append(version)
total.append(typ)
total.append(infos)
I need to create a list of all possible combinations of the values in these lists, so I did this:
import itertools
combos = list(itertools.product(*total))
all_combos = [list(elem) for elem in combos]
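Now I want to go through a JSON object, find the items whose values match those of an item in all_combos, and count how many times each combination occurs. My JSON is large and looks something like this: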
data = [
    { 'make': 'dacia',
      'model': 'x',
      'version': 'A',
      'typ': 'sedan',
      'infos': 'steering wheel problems'
    }, ...]
I would like to get output like this:
output = [
    { 'make': 'dacia',
      'model': 'x',
      'version': 'A',
      'typ': 'sedan',
      'infos': 'steering wheel problems',
      'number_of_occurences_of_such_combination_of_fields_with__such_values': 75
    }, ...]
How can I solve this task?
If I understand correctly, you want to add the key number_of_occurences_of_such_combination_of_fields_with__such_values to every dictionary in data:
from operator import itemgetter
from itertools import product
make = ["ford", "fiat", "nissan", "suzuki", "dacia"]
model = ["x", "y", "z"]
version = ["A", "B", "C"]
typ = ["sedan", "coupe", "van", "kombi"]
infos = [
    "steering wheel problems",
    "gearbox problems",
    "broken engine",
    "throttle problems",
    None,
]
total = [make, model, version, typ, infos]
data = [
    {
        "make": "dacia",
        "model": "x",
        "version": "A",
        "typ": "sedan",
        "infos": "steering wheel problems",
    },
    {
        "make": "dacia",
        "model": "x",
        "version": "A",
        "typ": "sedan",
        "infos": "steering wheel problems",
    },
    {
        "make": "ford",
        "model": "x",
        "version": "A",
        "typ": "sedan",
        "infos": "steering wheel problems",
    },
]
i = itemgetter("make", "model", "version", "typ", "infos")
cnt = {}
# Group the dictionaries in data by the combination they match.
for c in product(*total):
    for d in data:
        if i(d) == c:
            cnt.setdefault(c, []).append(d)
# Store the size of each group in every dictionary of that group.
for k, v in cnt.items():
    for d in v:
        d[
            "number_of_occurences_of_such_combination_of_fields_with__such_values"
        ] = len(v)
print(data)
Prints:
[
    {
        "make": "dacia",
        "model": "x",
        "version": "A",
        "typ": "sedan",
        "infos": "steering wheel problems",
        "number_of_occurences_of_such_combination_of_fields_with__such_values": 2,
    },
    {
        "make": "dacia",
        "model": "x",
        "version": "A",
        "typ": "sedan",
        "infos": "steering wheel problems",
        "number_of_occurences_of_such_combination_of_fields_with__such_values": 2,
    },
    {
        "make": "ford",
        "model": "x",
        "version": "A",
        "typ": "sedan",
        "infos": "steering wheel problems",
        "number_of_occurences_of_such_combination_of_fields_with__such_values": 1,
    },
]
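If the goal is instead one record per possible combination, as in the output list shown in the question, including combinations that never occur in the data, one option is to tally the rows once and then walk the product. This is only a minimal sketch: the helper name count_all_combinations, the short key "count", and the tiny sample lists are assumptions made for illustration and do not come from the question.

```python
from collections import Counter
from itertools import product
from operator import itemgetter

FIELDS = ("make", "model", "version", "typ", "infos")


def count_all_combinations(data, value_lists, fields=FIELDS):
    """Return one dict per possible combination with its occurrence count."""
    get = itemgetter(*fields)
    # Tally every row once; a Counter returns 0 for combinations never seen.
    seen = Counter(get(d) for d in data)
    return [
        {**dict(zip(fields, combo)), "count": seen[combo]}
        for combo in product(*value_lists)
    ]


# Hypothetical, shortened inputs so the result stays small.
sample = [
    {"make": "dacia", "model": "x", "version": "A", "typ": "sedan", "infos": None},
    {"make": "dacia", "model": "x", "version": "A", "typ": "sedan", "infos": None},
]
value_lists = [["ford", "dacia"], ["x"], ["A"], ["sedan"], ["broken engine", None]]
result = count_all_combinations(sample, value_lists)
print(len(result))  # 2 * 1 * 1 * 1 * 2 = 4 combinations
```

Because the data is scanned only once, the cost grows with len(data) plus the number of combinations, rather than with their product as in the nested loops above.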
Version 2 (without itertools.product), which counts the combinations in a single pass over data:
from operator import itemgetter
i = itemgetter("make", "model", "version", "typ", "infos")
cnt = {}
for d in data:
    c = i(d)
    cnt[c] = cnt.get(c, 0) + 1
for d in data:
    d[
        "number_of_occurences_of_such_combination_of_fields_with__such_values"
    ] = cnt[i(d)]
print(data)
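The same tally can also be written with collections.Counter and turned into one record per distinct combination that actually appears in the data, which is closer to the output shape in the question. This is a sketch, not a definitive implementation: the function name summarize, the constant names, and the sample rows are hypothetical.

```python
from collections import Counter
from operator import itemgetter

FIELDS = ("make", "model", "version", "typ", "infos")
COUNT_KEY = "number_of_occurences_of_such_combination_of_fields_with__such_values"


def summarize(data, fields=FIELDS):
    """Return one dict per distinct combination found in data, with its count."""
    get = itemgetter(*fields)
    counts = Counter(get(d) for d in data)
    # Rebuild a dict from each tuple of values and attach the count.
    return [
        {**dict(zip(fields, combo)), COUNT_KEY: n}
        for combo, n in counts.items()
    ]


# Hypothetical sample rows, mirroring the ones used above.
rows = [
    {"make": "dacia", "model": "x", "version": "A", "typ": "sedan", "infos": "steering wheel problems"},
    {"make": "dacia", "model": "x", "version": "A", "typ": "sedan", "infos": "steering wheel problems"},
    {"make": "ford", "model": "x", "version": "A", "typ": "sedan", "infos": "steering wheel problems"},
]
summary = summarize(rows)
for record in summary:
    print(record["make"], record[COUNT_KEY])
```

Unlike Version 2, this leaves the original dictionaries untouched and returns a new, deduplicated list.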