How can I detect differences between YAML entries that differ only in quoting (single or double quotes) in Python?

Suppose I have these two YAML files:

`f1.yaml`:

```yaml
TYPE:
  fields:
    field1:
      value: 0
    "field2":
      value: 1
```

`f2.yaml`:

```yaml
TYPE:
  fields:
    field1:
      value: 0
    field2:
      value: 1
```

I have the following script to compare the two:

```python
import sys
from typing import Optional

import yaml


def open_file(path):
    d = {}
    with open(path, "r") as f:
        try:
            d = yaml.safe_load(f)
        except yaml.YAMLError as exc:
            print(exc)
            sys.exit(1)
    return d


def compare_dictionaries(type1, type2) -> Optional[dict]:
    print("Comparing...")
    differences = {}
    if type1 == type2:
        print('types are equal')
        return None
    for k, v in type1['TYPE']['fields'].items():
        print(k)
        if k not in type2['TYPE']['fields'].keys():
            temp_compare = {'TYPE': {'fields': {k: ''}}}
        else:
            temp_compare = type2.copy()
        if v != temp_compare['TYPE']['fields'][k]:
            print("diff found for ", k)
            differences[k] = {'new': v, 'old': temp_compare['TYPE']['fields'][k]}
    return differences


if __name__ == "__main__":
    f1 = "./f1.yaml"
    f2 = "./f2.yaml"
    d1 = open_file(f1)
    print("first file opened")
    d2 = open_file(f2)
    print("second file opened")
    diff = compare_dictionaries(d1, d2)
    print(diff)
```

The output of this code is:

```
first file opened
second file opened
Comparing...
types are equal
None
```

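which makes sense. In one file the key `field2` is double-quoted and in the other it is not. When the files are parsed into dictionaries, both keys end up as the same plain string, so comparing the two dictionaries finds no difference.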

Is there a way to detect this difference?

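There is no way to do this with PyYAML, which I infer you are using from your `import yaml`. Not only is it an outdated library (it only supports the YAML 1.1 specification, which was superseded by YAML 1.2 in 2009); it also cannot dump your `f1.yaml`, i.e. it cannot round-trip it (load + dump) without losing the double quotes around `field2` (or adding them to all other scalar strings).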
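If switching libraries is not an option, one crude workaround is to bypass the parser entirely and scan the raw text for keys written in quotes. The sketch below assumes block-style mappings with one key per line; `quoted_keys`, `quoting_differences` and the `QUOTED_KEY` pattern are hypothetical names. It is a line-based regex heuristic, not a YAML parser, and it only reports key names, not where in the tree they occur.

```python
import re

# Hypothetical heuristic: a block-mapping key wrapped in single or double
# quotes at the start of a line, e.g.  "field2":  or  'field2':
# Not a YAML parser: it ignores flow mappings, "? " complex keys, keys after
# "- ", escaped quotes inside keys, and text inside block scalars.
QUOTED_KEY = re.compile(r"""^\s*(["'])(?P<key>[^"']*)\1\s*:""")


def quoted_keys(text: str) -> set:
    """Return the names of all keys that are written quoted in the raw text."""
    found = set()
    for line in text.splitlines():
        match = QUOTED_KEY.match(line)
        if match:
            found.add(match.group("key"))
    return found


def quoting_differences(text1: str, text2: str) -> set:
    """Key names that appear quoted in one document but not in the other."""
    return quoted_keys(text1) ^ quoted_keys(text2)


doc1 = 'TYPE:\n  fields:\n    field1:\n      value: 0\n    "field2":\n      value: 1\n'
doc2 = doc1.replace('"field2"', "field2")
print(quoting_differences(doc1, doc2))  # {'field2'}
```

For real files, the two strings would come from something like `Path("f1.yaml").read_text()`. Because it never builds a data structure, this check is independent of which YAML library (if any) loads the files.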

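`ruamel.yaml` (disclaimer: I am the author of that package) has many improvements over PyYAML, including the option to preserve quotes on round-trip. However, to make the data easier to work with, the loaded values behave like ordinary Python types, so your test: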

```python
k not in type2['TYPE']['fields'].keys()
```

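evaluates to `False`, because the quoted and the unquoted key still compare equal. Instead you have to compare the Python types of these keys (either `str` or `ruamel.yaml.scalarstring.DoubleQuotedScalarString`). (Apart from that, you should omit the `.keys()`.)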
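The point about equality versus type can be shown without ruamel.yaml installed. In the sketch below, `DoubleQuotedStandIn` is a hypothetical stand-in for `DoubleQuotedScalarString`; the only property it shares with the real class is that it subclasses `str`, and it inherits `str`'s equality and hashing unchanged.

```python
# Hypothetical stand-in for ruamel.yaml.scalarstring.DoubleQuotedScalarString.
# Like the real class it is a subclass of str; this stand-in inherits equality
# and hashing from str unchanged.
class DoubleQuotedStandIn(str):
    pass


quoted_key = DoubleQuotedStandIn("field2")   # "field2": as written in f1.yaml
plain_key = "field2"                         # field2:   as written in f2.yaml
fields2 = {plain_key: {"value": 1}}

# Equality, hashing and isinstance() cannot see the quotes:
print(quoted_key == plain_key)               # True
print(quoted_key in fields2)                 # True
print(isinstance(quoted_key, str))           # True

# Only the exact type records how the key was quoted:
print(type(quoted_key) == type(plain_key))   # False
print(type(quoted_key).__name__)             # DoubleQuotedStandIn
```

This is also why the check has to compare `type(k)` exactly rather than use `isinstance(k, str)`, which is true for both keys.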

```python
import sys
from pathlib import Path

import ruamel.yaml
from ruamel.yaml.error import YAMLError


def open_file(path, yaml):
    d = {}
    with open(path, "r") as f:
        try:
            d = yaml.load(f)
        except YAMLError as exc:  # module-level exception class, not an attribute of the YAML() instance
            print(exc)
            sys.exit(1)
    return d


def compare_dictionaries(type1, type2) -> dict:
    print("Comparing...")
    differences = {}
    # exact Python type of every key in the second file (str or a quoted ScalarString subclass)
    type2_key_types = {k: type(k) for k in type2['TYPE']['fields']}
    for k, v in type1['TYPE']['fields'].items():
        # treat a key as different if it is missing or was quoted differently
        if k not in type2_key_types or type(k) != type2_key_types[k]:
            temp_compare = {'TYPE': {'fields': {k: ''}}}
        else:
            temp_compare = type2.copy()
        if v != temp_compare['TYPE']['fields'][k]:
            print("diff found for ", k)
            differences[k] = {'new': dict(v), 'old': temp_compare['TYPE']['fields'][k]}
    return differences


if __name__ == "__main__":
    yaml = ruamel.yaml.YAML()
    yaml.preserve_quotes = True
    # uncomment the following to inspect that quotes are preserved
    # yaml.dump(yaml.load(Path('f1.yaml')), sys.stdout)
    f1 = "./f1.yaml"
    f2 = "./f2.yaml"
    d1 = open_file(f1, yaml)
    print("first file opened")
    d2 = open_file(f2, yaml)
    print("second file opened")
    diff = compare_dictionaries(d1, d2)
    print(diff)
```

This gives:

```
first file opened
second file opened
Comparing...
diff found for  field2
{'field2': {'new': {'value': 1}, 'old': ''}}
```
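The script above only looks at the keys directly under `type1['TYPE']['fields']`. A hedged sketch of a recursive variant walks both loaded trees and reports the path of every key that exists in both mappings but has a different exact type. `key_type_diffs` and `QuotedStandIn` are hypothetical names; the stand-in plays the role of `DoubleQuotedScalarString` (or `SingleQuotedScalarString`) so the example runs without ruamel.yaml. It assumes mappings load as `dict` subclasses and sequences as `list` subclasses, which is assumed to hold for ruamel.yaml's round-trip containers.

```python
from typing import Any, List, Tuple


def key_type_diffs(a: Any, b: Any, path: Tuple = ()) -> List[Tuple[Tuple, str, str]]:
    """Report (path, type in a, type in b) for every key present in both
    mappings whose exact Python type differs, at any nesting depth."""
    diffs = []
    if isinstance(a, dict) and isinstance(b, dict):
        # Map each key of b to itself, so a lookup with a's (equal) key
        # returns b's own key object and its type can be inspected.
        b_keys = {k: k for k in b}
        for k, v in a.items():
            if k not in b_keys:
                continue  # missing keys are a different kind of difference
            other = b_keys[k]
            key_path = path + (str(k),)
            if type(k) is not type(other):
                diffs.append((key_path, type(k).__name__, type(other).__name__))
            diffs.extend(key_type_diffs(v, b[other], key_path))
    elif isinstance(a, list) and isinstance(b, list):
        for i, (x, y) in enumerate(zip(a, b)):
            diffs.extend(key_type_diffs(x, y, path + (i,)))
    return diffs


class QuotedStandIn(str):
    """Hypothetical stand-in for a quote-preserving str subclass."""


d1 = {"TYPE": {"fields": {"field1": {"value": 0}, QuotedStandIn("field2"): {"value": 1}}}}
d2 = {"TYPE": {"fields": {"field1": {"value": 0}, "field2": {"value": 1}}}}
print(key_type_diffs(d1, d2))
# [(('TYPE', 'fields', 'field2'), 'QuotedStandIn', 'str')]
```

With real files you would pass the two objects returned by `open_file(..., yaml)` with `yaml.preserve_quotes = True`. Unlike `compare_dictionaries`, this reports only quoting differences, so value changes still need a separate comparison.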
