How to find new elements in JSON text compared to an older version, in Python



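I pull JSON from a website every day and need to filter out all the old entries from previous days, reducing the text to just the new entries. Right now I have two text files, prevJson and newJson. How can I compare the two files and return only the JSON that is in newJson but not in prevJson?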

My code so far:

import json
from urllib.request import urlopen
from jsondiff import diff
with open('prevJson', 'r') as file:
    data = file.read().replace('\n', '')
with open('newJson test', 'r') as file2:
    data2 = file2.read().replace('\n', '')
difference = diff(data, data2)
print(difference)

This is prevJson:

{'date': '2021-07-18', 'time': '4:00pm', 'title': 'Venues', 'data': {'monitor': [{'Venue': '99 Bikes Bondi Junction', 'Address': '228 Oxford Street', 'Suburb': 'Bondi Junction', 'Date': 'Saturday 10 July 2021', 'Time': '12:45pm to 2:45pm', 'Alert': 'Get tested immediately and self-isolate for 14 days.', 'Lon': '151.2429926', 'Lat': '-33.89015617', 'HealthAdviceHTML': "Anyone who attended this venue is a <a href='https://www.health.nsw.gov.au/Infectious/factsheets/Pages/advice-for-contacts.aspx'>close contact</a> and must immediately <a href='https://www.nsw.gov.au/covid-19/how-to-protect-yourself-and-others/clinics'>get tested</a> and <a href='https://www.nsw.gov.au/covid-19/what-you-can-and-cant-do-under-rules/self-isolation'>self-isolate</a> for 14 days regardless of the result, and call 1800 943 553 unless they have already been contacted by NSW Health.", 'Last updated date': 'Monday 12 July 2021'}

This is newJson:

{'date': '2021-07-18', 'time': '4:00pm', 'title': 'Venues', 'data': {'monitor': [{'Venue': '99 Bikes Bondi Junction', 'Address': '228 Oxford Street', 'Suburb': 'Bondi Junction', 'Date': 'Saturday 10 July 2021', 'Time': '12:45pm to 2:45pm', 'Alert': 'Get tested immediately and self-isolate for 14 days.', 'Lon': '151.2429926', 'Lat': '-33.89015617', 'HealthAdviceHTML': "Anyone who attended this venue is a <a href='https://www.health.nsw.gov.au/Infectious/factsheets/Pages/advice-for-contacts.aspx'>close contact</a> and must immediately <a href='https://www.nsw.gov.au/covid-19/how-to-protect-yourself-and-others/clinics'>get tested</a> and <a href='https://www.nsw.gov.au/covid-19/what-you-can-and-cant-do-under-rules/self-isolation'>self-isolate</a> for 14 days regardless of the result, and call 1800 943 553 unless they have already been contacted by NSW Health.", 'Last updated date': 'Monday 12 July 2021'} hello

I expected the program to return "hello", but instead it returned:

{'date': '2021-07-18', 'time': '4:00pm', 'title': 'Venues', 'data': {'monitor': [{'Venue': '99 Bikes Bondi Junction', 'Address': '228 Oxford Street', 'Suburb': 'Bondi Junction', 'Date': 'Saturday 10 July 2021', 'Time': '12:45pm to 2:45pm', 'Alert': 'Get tested immediately and self-isolate for 14 days.', 'Lon': '151.2429926', 'Lat': '-33.89015617', 'HealthAdviceHTML': "Anyone who attended this venue is a <a href='https://www.health.nsw.gov.au/Infectious/factsheets/Pages/advice-for-contacts.aspx'>close contact</a> and must immediately <a href='https://www.nsw.gov.au/covid-19/how-to-protect-yourself-and-others/clinics'>get tested</a> and <a href='https://www.nsw.gov.au/covid-19/what-you-can-and-cant-do-under-rules/self-isolation'>self-isolate</a> for 14 days regardless of the result, and call 1800 943 553 unless they have already been contacted by NSW Health.", 'Last updated date': 'Monday 12 July 2021'} hello

newJson is not valid JSON.

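I copied and pasted both "JSON" texts, and neither of them is actually valid. prevJson is missing its closing brackets; it should end with }]}}. newJson has the same problem. In addition, you used ' (single quotes) instead of " (double quotes). JSON requires " (double quotes).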

The value of the key HealthAdviceHTML contains " (double quotes). These need to be escaped by putting a \ in front of each one.
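Both points can be checked with the standard library alone. This is a minimal sketch: the sample text is a shortened, hypothetical stand-in for the file contents, and reading it with ast.literal_eval rests on the assumption that the single-quoted text is the printed form of a Python dict, which the question does not confirm.

```python
import ast
import json

# Shortened, hypothetical stand-in for the single-quoted text in the question.
text = """{'Venue': '99 Bikes Bondi Junction', 'HealthAdviceHTML': "a <a href='x'>close contact</a>"}"""

# json.loads rejects single-quoted keys and strings.
try:
    json.loads(text)
    parsed_as_json = True
except json.JSONDecodeError:
    parsed_as_json = False

# Assumption: the text is the repr of a Python dict, so literal_eval can read it.
record = ast.literal_eval(text)

# json.dumps emits double quotes and escapes any double quote inside a value.
valid_json = json.dumps(record)
with_quotes = json.dumps({"html": 'a <a href="x">link</a>'})
print(valid_json)
print(with_quotes)
```

If the text really is a Python literal, writing it back out with json.dumps avoids fixing quotes and escapes by hand.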

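The next thing is that you are not loading the JSON as JSON; you are loading it as plain strings. That is why diff() returns the entire content: the two strings (data and data2 in your code) simply differ as a whole. It is also why no error was raised for the malformed JSON.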
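The difference between holding text and holding parsed JSON can be seen in a few lines. This is an illustrative sketch with a made-up, shortened input, not the files from the question.

```python
import json

raw = '{"title": "Venues"} hello'  # valid JSON followed by a stray word

# Text read from a file is a str; nothing has checked it as JSON yet.
print(type(raw))

# Parsing is the step that validates the text; the stray word makes it fail.
try:
    json.loads(raw)
    error = None
except json.JSONDecodeError as exc:
    error = exc.msg

# Well-formed text parses into a dict, which is what a structural diff needs.
parsed = json.loads('{"title": "Venues"}')
print(type(parsed), error)
```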

Also, to add something to JSON you need to add a key/value pair, not just an unquoted word. That will never work. See how I changed newJson.
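As a small sketch of what adding a key/value pair looks like in Python (the starting document here is a shortened, hypothetical one):

```python
import json

# Shortened, hypothetical document with the same nesting as the sample data.
doc = json.loads('{"data": {"monitor": []}}')

# Add a key/value pair to the nested "data" object.
doc["data"]["hello"] = "Dhar_"

updated = json.dumps(doc)
print(updated)  # {"data": {"monitor": [], "hello": "Dhar_"}}
```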

That said, here are the two JSON documents, correctly formatted. You can see that Stack Overflow's syntax highlighting colors them correctly (using ```json).

Further down is the code that loads the JSON files correctly, so that they can be compared with the diff() function.

prevJson

{
  "date": "2021-07-18",
  "time": "4:00pm",
  "title": "Venues",
  "data": {
    "monitor": [
      {
        "Venue": "99 Bikes Bondi Junction",
        "Address": "228 Oxford Street",
        "Suburb": "Bondi Junction",
        "Date": "Saturday 10 July 2021",
        "Time": "12:45pm to 2:45pm",
        "Alert": "Get tested immediately and self-isolate for 14 days.",
        "Lon": "151.2429926",
        "Lat": "-33.89015617",
        "HealthAdviceHTML": "Anyone who attended this venue is a <a href=\"https://www.health.nsw.gov.au/Infectious/factsheets/Pages/advice-for-contacts.aspx\">close contact</a> and must immediately <a href=\"https://www.nsw.gov.au/covid-19/how-to-protect-yourself-and-others/clinics\">get tested</a> and <a href=\"https://www.nsw.gov.au/covid-19/what-you-can-and-cant-do-under-rules/self-isolation\">self-isolate</a> for 14 days regardless of the result, and call 1800 943 553 unless they have already been contacted by NSW Health.",
        "Last updated date": "Monday 12 July 2021"
      }
    ]
  }
}

newJson

You can see that I added the key hello with the value Dhar_. JSON requires key/value pairs.

{
  "date": "2021-07-18",
  "time": "4:00pm",
  "title": "Venues",
  "data": {
    "monitor": [
      {
        "Venue": "99 Bikes Bondi Junction",
        "Address": "228 Oxford Street",
        "Suburb": "Bondi Junction",
        "Date": "Saturday 10 July 2021",
        "Time": "12:45pm to 2:45pm",
        "Alert": "Get tested immediately and self-isolate for 14 days.",
        "Lon": "151.2429926",
        "Lat": "-33.89015617",
        "HealthAdviceHTML": "Anyone who attended this venue is a <a href=\"https://www.health.nsw.gov.au/Infectious/factsheets/Pages/advice-for-contacts.aspx\">close contact</a> and must immediately <a href=\"https://www.nsw.gov.au/covid-19/how-to-protect-yourself-and-others/clinics\">get tested</a> and <a href=\"https://www.nsw.gov.au/covid-19/what-you-can-and-cant-do-under-rules/self-isolation\">self-isolate</a> for 14 days regardless of the result, and call 1800 943 553 unless they have already been contacted by NSW Health.",
        "Last updated date": "Monday 12 July 2021"
      }
    ],
    "hello": "Dhar_"
  }
}

jsoncompare.py

from jsondiff import diff
import json

# Reading the file without parsing gives a plain string, not JSON.
with open('prevJson.json') as infile:
    notajson = infile.read().replace('\n', '')
print(f'notajson type:\n{type(notajson)}\n')

# json.loads() parses the text into Python objects (a dict here).
with open('prevJson.json') as infile:
    data1 = json.loads(infile.read().replace('\n', ''))
with open('newJson.json') as infile:
    data2 = json.loads(infile.read().replace('\n', ''))
print(f'data1 type:\n{type(data1)}\n')

# diff() on two dicts reports only what changed between them.
difference = diff(data1, data2)
print(difference)

Output

notajson type:
<class 'str'>
data1 type:
<class 'dict'>
{'data': {'hello': 'Dhar_'}}
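For the original goal of keeping only the entries that were not in the previous day's file, a full diff may be more than is needed. The following is a minimal standard-library sketch, assuming both documents carry the data.monitor list shown above and that an entry counts as old only when every field matches. The function name new_entries and the second venue are hypothetical, chosen for illustration.

```python
import json


def new_entries(prev_doc, new_doc):
    """Return entries of new_doc["data"]["monitor"] that are absent from prev_doc."""
    # Serialize each old entry with sorted keys so dicts can be looked up in a set.
    seen = {json.dumps(e, sort_keys=True) for e in prev_doc["data"]["monitor"]}
    return [e for e in new_doc["data"]["monitor"]
            if json.dumps(e, sort_keys=True) not in seen]


# Shortened sample documents; "Example Cafe" is a made-up entry.
prev_doc = {"data": {"monitor": [{"Venue": "99 Bikes Bondi Junction"}]}}
new_doc = {"data": {"monitor": [{"Venue": "99 Bikes Bondi Junction"},
                                {"Venue": "Example Cafe"}]}}

print(new_entries(prev_doc, new_doc))  # [{'Venue': 'Example Cafe'}]
```

With real files, prev_doc and new_doc would come from json.load() on each file. An entry whose fields were edited since the previous day is treated as new under this rule.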
