尝试从JSON创建字典,但值消失



我已经使用了几种不同的方法将我的字典列表获取到单个字典,它们都可以工作而不会抛出错误,但问题是无论使用哪种方法,我的值都会丢失!

我有一个json:

[{"Query Fields Used":null,"History Connection Name":null,"History Query Run Count":"138,036"},
{"Query Fields Used":"["shifts.date_date", "hifts.sum_booked_shifts"]","History Connection Name":"sandy","History Query Run Count":"59,069"},
{"Query Fields Used":"["daily_supply_v2.sum_booked_shifts", "daily_supply_v2.date_date"]","History Connection Name":"low","History Query Run Count":"50,259"},
{"Query Fields Used":"["daily_supply_v2.date_date", "daily_supply_v2.sum_booked_shifts"]","History Connection Name":"sandy","History Query Run Count":"50,235"},
{"Query Fields Used":"["orders.count_loads", "orders.complete_timestamp_date"]","History Connection Name":"sandy","History Query Run Count":"29,658"},
{"Query Fields Used":"["hitch_stack.date_date", "hitch_stack.hitch_count"]","History Connection Name":"low","History Query Run Count":"24,928"}]

我是用pandas.read_json导入的。查询字段是作为一个字符串出现的,所以我用regex sub删除了所有特殊字符,然后用逗号分隔,使其成为一个列表。

xx =[]
for i in looker['Query Fields Used']:
xx.append(i)

yy = []
for i in xx:
if i==None:
yy.append("None")
else:
x = re.sub(r'[^a-zA-Z0-9_.,]','', i)
yy.append(x)

y = []
for i in yy:
if i==None:
y.append("None")
else:
y.append(i.split(","))

looker['Query Fields Used'] = y

然后我跑了

d = [{key: val} for key, val in zip(looker['History Connection Name'], looker['Query Fields Used'])]

给了我正确的键,但是值停止填充。我还尝试创建一个字典列表,并通过连接导入funcy,结果完全相同。

我正在寻找一个包含相应列表中所有值的值的字典,但只得到几个。我哪里做错了?

它看起来像你想要History Connection Name是关键在你的字典和Query Fields Used是值?如果是这样,你的字典"丢失"的原因是什么?因为Python字典的键必须是唯一的,这不是History Connection Names:

的情况
{0: None, 1: 'sandy', 2: 'low', 3: 'sandy', 4: 'sandy', 5: 'low'}

您可以为记录创建唯一的键(例如,使用数字索引)并相应地重新构建字典。

或者您可以将这些值与它们各自的键合并。下面有两种方法可以合并对应于键的值:

from collections import defaultdict
import pandas as pd
# Given data
data_json = r"""[{"Query Fields Used": null, "History Connection Name": null, "History Query Run Count": "138,036"},
{"Query Fields Used"      : "["shifts.date_date", "hifts.sum_booked_shifts"]", "History Connection Name": "sandy",
"History Query Run Count": "59,069"},
{"Query Fields Used"      : "["daily_supply_v2.sum_booked_shifts", "daily_supply_v2.date_date"]",
"History Connection Name": "low", "History Query Run Count": "50,259"},
{"Query Fields Used"      : "["daily_supply_v2.date_date", "daily_supply_v2.sum_booked_shifts"]",
"History Connection Name": "sandy", "History Query Run Count": "50,235"},
{"Query Fields Used"      : "["orders.count_loads", "orders.complete_timestamp_date"]",
"History Connection Name": "sandy", "History Query Run Count": "29,658"},
{"Query Fields Used"      : "["hitch_stack.date_date", "hitch_stack.hitch_count"]",
"History Connection Name": "low", "History Query Run Count": "24,928"}]"""
# 1 - Iteration and using pd.read_json, like in your example
data = pd.read_json(data_json, orient="columns").to_dict()
merged: dict = defaultdict(list)
for name, fields in zip(data["History Connection Name"].values(), data["Query Fields Used"].values()):
merged[str(name)].append(fields)
# 2 - Keeping the records in a dataframe
data = pd.read_json(data_json, orient="columns")
args: dict = {'History Connection Name': 'first', 'Query Fields Used': ' '.join}
data = data.groupby("History Connection Name", as_index=False).agg(args)
# Outputs of the two merges respectively
{'None' : [None],
'sandy': ['["shifts.date_date", "hifts.sum_booked_shifts"]',
'["daily_supply_v2.date_date", "daily_supply_v2.sum_booked_shifts"]',
'["orders.count_loads", "orders.complete_timestamp_date"]'],
'low'  : ['["daily_supply_v2.sum_booked_shifts", "daily_supply_v2.date_date"]',
'["hitch_stack.date_date", "hitch_stack.hitch_count"]']}
| History Connection Name   | Query Fields Used                                                                                                                                                           |
|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| low                       | ["daily_supply_v2.sum_booked_shifts", "daily_supply_v2.date_date"] ["hitch_stack.date_date", "hitch_stack.hitch_count"]                                                     |
| sandy                     | ["shifts.date_date", "hifts.sum_booked_shifts"] ["daily_supply_v2.date_date", "daily_supply_v2.sum_booked_shifts"] ["orders.count_loads", "orders.complete_timestamp_date"] |

最新更新