How to remove a block of text that varies slightly between files from multiple repetitive JSON files



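I have a JSON file with repeated sections, and I'm trying to write a script that removes a specific block of text from multiple files. A Python script would be my preference; otherwise, from what I found while searching, sed could also work, although I know nothing about it. Here is an example of the format of my JSON file: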

{
"Animal": {
"Type_species": "Reptile"
},
"FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
"Description": "Most are cold blooded."
},
{
"Animal": {
"Type_species": "Mammal"
},
"FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
"Description": "There Are Approximately 5,000 Mammal Species."
},
{
"Animal": {
"Type_species": "Amphibian"
},
"FindMe": "https://en.wikipedia.org/wiki/Amphibian",
"Description": "Most amphibians have thin, moist skin that helps them to breathe"
},
  1. How can I remove the following from the JSON file?
{
"Animal": {
"Type_species": "Mammal"
},
"FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
"Description": "There Are Approximately 5,000 Mammal Species."
},

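My other question is: 2. How can I adapt the script to handle "FindMe" URLs that differ across multiple files? For example, with multiple files, the second file would contain the following, and so on: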

{
"Animal": {
"Type_species": "Mammal"
},
"FindMe": "https://kids.nationalgeographic.com/animals/mammals/facts/arctic-fox",
"Description": "There Are Approximately 5,000 Mammal Species."
},

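I think regular expressions would help, but I have a hard time understanding them and implementing them in a script.

Thanks for your help.

Update: I'd like the end result to look like this: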

{
"Animal": {
"Type_species": "Reptile"
},
"FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
"Description": "Most are cold blooded."
},
{
"Animal": {
"Type_species": "Amphibian"
},
"FindMe": "https://en.wikipedia.org/wiki/Amphibian",
"Description": "Most amphibians have thin, moist skin that helps them to breathe"
},

Assuming your complete JSON contains a list of dictionaries (as your example suggests), then:

JSON = {"data": [{
"Animal": {
"Type_species": "Reptile"
},
"FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
"Description": "Most are cold blooded."
},
{
"Animal": {
"Type_species": "Mammal"
},
"FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
"Description": "There Are Approximately 5,000 Mammal Species."
},
{
"Animal": {
"Type_species": "Amphibian"
},
"FindMe": "https://en.wikipedia.org/wiki/Amphibian",
"Description": "Most amphibians have thin, moist skin that helps them to breathe"
}]}
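# Keep every record whose Type_species is not "Mammal". "FindMe" is never
# inspected, so its URL can differ from file to file.
JSON['data'] = [d for d in JSON['data'] if d['Animal']['Type_species'] != 'Mammal']
print(JSON)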
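
To run the same filter over several files (question 2), here is a minimal sketch. It assumes each file is valid JSON with the records under a top-level "data" key, as in the example above. The function names and the "*.json" pattern are hypothetical, so adjust them to your layout. Because the filter looks only at Type_species, the differing FindMe URLs don't matter.

```python
import json
from pathlib import Path


def drop_species(records, species="Mammal"):
    """Return the records whose Animal.Type_species differs from `species`."""
    return [
        r for r in records
        if r.get("Animal", {}).get("Type_species") != species
    ]


def clean_file(path, species="Mammal"):
    """Rewrite one JSON file in place without the matching records."""
    path = Path(path)
    doc = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: the list of records lives under the "data" key.
    doc["data"] = drop_species(doc["data"], species)
    path.write_text(json.dumps(doc, indent=2), encoding="utf-8")


def clean_directory(directory, pattern="*.json", species="Mammal"):
    """Apply clean_file to every file in `directory` that matches `pattern`."""
    for json_path in sorted(Path(directory).glob(pattern)):
        clean_file(json_path, species)
```

Calling clean_directory("path/to/files") would then rewrite every matching file in that folder. Since the files are overwritten in place, it is worth trying it on a copy first.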

This might work for you (GNU sed):

sed '/^\s*{/{:a;N;/^\(\s*\){.*\n\1},/!ba;/"Type_species": "Mammal"/d}' file

This gathers up the lines for each animal and, if they contain "Type_species": "Mammal", deletes that animal.
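
If the files are not valid JSON as a whole (for example, bare comma-separated objects as in the question), a text-based pass in Python is another option. The sketch below is not a translation of the sed command: sed finds the end of a block by looking for a closing brace at the same indentation as the opening one, whereas this version counts braces, so it does not depend on indentation. It assumes that no string value contains "{" or "}". The function name drop_blocks and the needle parameter are hypothetical.

```python
def drop_blocks(text, needle='"Type_species": "Mammal"'):
    """Remove every top-level {...} block whose text contains `needle`."""
    kept = []    # lines that survive
    block = []   # lines of the block currently being collected
    depth = 0    # current brace nesting depth
    for line in text.splitlines(keepends=True):
        opens, closes = line.count("{"), line.count("}")
        if depth == 0 and opens == 0:
            kept.append(line)  # text outside any block passes through
            continue
        block.append(line)
        depth += opens - closes
        if depth <= 0:
            # The block is complete: keep it only if the needle is absent.
            if needle not in "".join(block):
                kept.extend(block)
            block, depth = [], 0
    kept.extend(block)  # an unterminated trailing block is left untouched
    return "".join(kept)
```

To use it on a file, read the file's text, pass it through drop_blocks, and write the result back. As with the sed command, the match is on Type_species only, so the FindMe URL can be anything.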
