Remove all nested JSON arrays except the one with the earliest date



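I have the following JSON dictionary. What I want to do is remove every "close_approach_data" object whose "orbiting_body" is not "Earth". The catch is that more than one object may have orbiting_body: "Earth", and among all of those I am trying to keep the one with the smallest "approach_date".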

data = [
    {
        "id": "01",
        "close_approach_data": [
            {
                "orbiting_body": "Earth",
                "approach_date": "1945-06-07"
            },
            {
                "orbiting_body": "Earth",
                "approach_date": "1975-06-07"
            },
            {
                "orbiting_body": "Mars",
                "approach_date": "1935-06-07"
            }
        ]
    },
    {
        "id": "02",
        "close_approach_data": [
            {
                "orbiting_body": "Earth",
                "approach_date": "1945-06-07"
            },
            {
                "orbiting_body": "Earth",
                "approach_date": "1975-06-07"
            },
            {
                "orbiting_body": "Mars",
                "approach_date": "1935-06-07"
            }
        ]
    }
]

This is what I want to get:

data = [
    {
        "id": "01",
        "close_approach_data": {
            "orbiting_body": "Mars",
            "approach_date": "1935-06-07"
        }
    },
    {
        "id": "02",
        "close_approach_data": {
            "orbiting_body": "Mars",
            "approach_date": "1935-06-07"
        }
    }
]

So I have been trying to come up with some code:

earthObjs =[]
for element in data:
    for subel in element["close_approach_data"]:
        if ([subel][0]["orbiting_body"]=="Earth"):
            #then i would have to store the objects
            earthObjs.append([subel])
    #here i am trying to find the object with the min 'approach_date'
    minEarth = min(dt.strptime(earthObjs["close_approach_date"],"%Y-%m-%d"))
    #then i would have to somehow place this as the only element of close_approach_data
    element["close_approach_data"] = json.loads(minEarth)
    #and clear the earthObjs list so it can be used for the next element
    earthObjs.clear()

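I am well aware that half of my code does not work. I think I may finally be close to getting it working, and I could really use some help. Specifically, I know I am doing something wrong when searching for the minimum, because I cannot access the object's 'close_approach_data' field. I am also unsure about the json.loads route.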

Here is a fairly direct translation into code of the processing you described:

from datetime import datetime
import json

for dataset in data:
    earliest, initial = datetime.max, {}
    # Find the non-Earth body with the earliest approach date.
    for close_approach in dataset["close_approach_data"]:
        if close_approach["orbiting_body"] != "Earth":
            dt = datetime.strptime(close_approach["approach_date"],
                                   "%Y-%m-%d")
            if dt < earliest:
                # Remember the new earliest date and the entry it belongs to.
                earliest, initial = dt, close_approach
    # Replace entire close_approach_data list with a single object
    # comprised of the non-Earth item with the earliest date (or an
    # empty dictionary if there weren't any).
    dataset["close_approach_data"] = initial

print(json.dumps(data, indent=4))

Output:

[
    {
        "id": "01",
        "close_approach_data": {
            "orbiting_body": "Mars",
            "approach_date": "1935-06-07"
        }
    },
    {
        "id": "02",
        "close_approach_data": {
            "orbiting_body": "Mars",
            "approach_date": "1935-06-07"
        }
    }
]
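
The same loop can also be wrapped in a function that leaves the input untouched. This is only a sketch under a few assumptions: the name keep_earliest and its keep predicate are hypothetical, introduced here so the filter can be flipped, since the question's text talks about keeping the Earth entries while its expected output keeps the non-Earth one.

```python
from copy import deepcopy
from datetime import datetime


def keep_earliest(records, keep):
    """Return a copy of records in which each close_approach_data list is
    replaced by the earliest entry accepted by keep(entry), or by {} when
    no entry is accepted. (Hypothetical helper, not part of any library.)"""
    result = deepcopy(records)  # do not mutate the caller's data
    for record in result:
        candidates = [e for e in record["close_approach_data"] if keep(e)]
        record["close_approach_data"] = min(
            candidates,
            key=lambda e: datetime.strptime(e["approach_date"], "%Y-%m-%d"),
            default={},  # used when candidates is empty
        )
    return result


sample = [
    {
        "id": "01",
        "close_approach_data": [
            {"orbiting_body": "Earth", "approach_date": "1945-06-07"},
            {"orbiting_body": "Earth", "approach_date": "1975-06-07"},
            {"orbiting_body": "Mars", "approach_date": "1935-06-07"},
        ],
    }
]

# Reading 1: drop Earth entries, keep the earliest of the rest.
non_earth = keep_earliest(sample, lambda e: e["orbiting_body"] != "Earth")
# Reading 2: keep only Earth entries, then the earliest of those.
earth_only = keep_earliest(sample, lambda e: e["orbiting_body"] == "Earth")
```

With the sample above, the first call keeps the Mars entry and the second keeps the 1945 Earth entry; sample itself still holds all three entries afterwards.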

Here is one way to implement the algorithm:

res = []
for d in data:
    res.append({**{'id': d['id'], **{'close_approach_data':
               next((iter(sorted((e for e in d['close_approach_data']
               if e['orbiting_body'] != 'Earth'),
               key=lambda x: x['approach_date']))), None)}}})
print(res)

[{'close_approach_data': {'approach_date': '1935-06-07',
                          'orbiting_body': 'Mars'},
  'id': '01'},
 {'close_approach_data': {'approach_date': '1935-06-07',
                          'orbiting_body': 'Mars'},
  'id': '02'}]

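Explanation

At first glance (and second), this looks like a mess. But the essential parts are:

  • Iterate over the list of dictionaries.
  • For each id, append one item to the list res.
  • Include only non-Earth data, via the if clause in the generator expression.
  • Sort by approach date; you could go on using datetime, but given the current format (YYYY-MM-DD) that is not necessary, because the strings already sort chronologically.
  • Extract the first element, if there is one, via next(iter(...), None). If there is none, the item becomes {'id': ..., 'close_approach_data': None}.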
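
The steps above can be unpacked into named pieces. The sketch below uses a small hypothetical entries list (copied from the question's data) to check the claim about string ordering, and shows that min with a default, swapped in here as an alternative, gives the same result as next(iter(sorted(...)), None) without sorting the whole list.

```python
from datetime import datetime

entries = [
    {"orbiting_body": "Earth", "approach_date": "1945-06-07"},
    {"orbiting_body": "Earth", "approach_date": "1975-06-07"},
    {"orbiting_body": "Mars", "approach_date": "1935-06-07"},
]

# Keep only non-Earth entries (the role of the generator's `if` clause).
non_earth = [e for e in entries if e["orbiting_body"] != "Earth"]

# Sort once by the raw string and once by the parsed date: for
# zero-padded YYYY-MM-DD strings the two orders agree.
by_string = sorted(entries, key=lambda e: e["approach_date"])
by_date = sorted(
    entries, key=lambda e: datetime.strptime(e["approach_date"], "%Y-%m-%d")
)
same_order = by_string == by_date

# next(iter(...), None) yields the first element, or None when empty.
first = next(iter(sorted(non_earth, key=lambda e: e["approach_date"])), None)
missing = next(iter([]), None)

# min(..., default=None) picks the same entry in a single pass.
first_via_min = min(non_earth, key=lambda e: e["approach_date"], default=None)
```

The string comparison only works because every date has the same zero-padded layout; for mixed formats, parsing with datetime.strptime is the safer key.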
