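I am comparing dictionaries built from server file listings that contain up to 100,000 entries.
I captured a file listing from each server and read them into dictionaries in my program. The keys are MD5 hashes and the values are paths (e.g. /usr/john/upstart.exe).
My dictionaries are named firstServ and secondServ.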
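I need to find out:
- Keys and values that are entirely unique to firstServ or to secondServ.
- Keys in firstServ that differ from the keys in secondServ, but whose associated values are the same.
- Finally, keys that are the same but whose values differ. (Presumably this should not happen, but it will verify how clean the data is.)
Basically, I just need to know how to make these comparisons. Thanks for any input.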
One very simple idea is the following:
FirstSet = {"1":"C:/", "2":"C:/Windows", "3":"C:/Users","4":"C:/Something"}
SecondSet = {"10":"E:/", "20":"C:/", "30":"C:/Users"}
Differences = []
for i in FirstSet.keys():
    if FirstSet[i] not in SecondSet.values():
        Differences.append((FirstSet[i], "FirstSet"))
for i in SecondSet.keys():
    if SecondSet[i] not in FirstSet.values():
        Differences.append((SecondSet[i], "SecondSet"))
for i in Differences:
    print("Only set {} has the {} element.".format(i[1], i[0]))
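With up to 100,000 entries, the `not in ....values()` test above scans the other dictionary's values on every iteration, so the two loops are roughly quadratic. A minimal sketch of the same idea, assuming the values are hashable strings, builds a set of values once per dictionary and uses set difference. The names `first_values`, `second_values` and `differences` are made up for this illustration.

```python
FirstSet = {"1": "C:/", "2": "C:/Windows", "3": "C:/Users", "4": "C:/Something"}
SecondSet = {"10": "E:/", "20": "C:/", "30": "C:/Users"}

# Build each set of values once so membership tests are O(1) on average.
first_values = set(FirstSet.values())
second_values = set(SecondSet.values())

# Values that appear on one side only, tagged with the side they came from.
# sorted() is used only to make the report order deterministic.
differences = [(v, "FirstSet") for v in sorted(first_values - second_values)]
differences += [(v, "SecondSet") for v in sorted(second_values - first_values)]

for value, owner in differences:
    print("Only set {} has the {} element.".format(owner, value))
```

This reports the same elements as the loops above, just without rescanning the values for each key.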
There will be 101 ways to do this.
Perhaps load your key/value pairs into a paired structure:
x = { k: [v1,v2] }
Then you would have the data from firstServ and secondServ grouped by hash. Just loop through the dictionary and find where they differ.
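One way this grouping might look, as a rough sketch only: it assumes Python 3, that no path is ever `None` (since `None` is used here to mark a hash missing on one side), and it uses made-up sample data and result names (`only_first`, `only_second`, `mismatched`).

```python
# Hypothetical sample data; the real dictionaries would come from the file listings.
firstServ = {"md5Hash1": "path1", "md5Hash2": "path2", "md5Hash3": "path3"}
secondServ = {"md5Hash1": "path4", "md5Hash4": "path2", "md5Hash5": "path5"}

# x maps each hash to [path on first server, path on second server];
# None marks a hash that is missing on that side.
x = {k: [firstServ.get(k), secondServ.get(k)]
     for k in firstServ.keys() | secondServ.keys()}

# Hashes seen on one server only.
only_first = sorted(k for k, (v1, v2) in x.items() if v2 is None)
only_second = sorted(k for k, (v1, v2) in x.items() if v1 is None)

# Hashes seen on both servers but pointing at different paths.
mismatched = sorted(k for k, (v1, v2) in x.items()
                    if v1 is not None and v2 is not None and v1 != v2)

print(only_first)
print(only_second)
print(mismatched)
```

A single pass over `x` then answers the "unique key" and "same key, different value" questions; matching values across different keys would still need a second index keyed by path.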
Using list comprehensions and simple loops, you can do this:
Given these dictionaries:
firstServ = {"md5Hash1":"path1", "md5Hash2":"path2", "md5Hash3":"path3"}
secondServ = {"md5Hash1":"path4", "md5Hash4":"path2", "md5Hash5":"path5"}
Extract the keys:
firstServKeys = set(firstServ.keys())
secondServKeys = set(secondServ.keys())
Keys unique to firstServ
Subtract secondServKeys from firstServKeys:
uniqueKeysInFirstServ = firstServKeys.difference(secondServKeys)
Keys unique to secondServ
Subtract firstServKeys from secondServKeys:
uniqueKeysInSecondServ = secondServKeys.difference(firstServKeys)
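Values present in both dictionaries with different keys
Invert both maps from {hash: path} to {path: hash}. Then, for every path in inv_festServ, check whether inv_secondServ also contains it and whether the two hashes differ.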
inv_festServ = {v: k for k, v in firstServ.items()} # Inverting keys and values
inv_secondServ = {v: k for k, v in secondServ.items()} # Inverting keys and values
valuesWithDifferentkeys = [v for v in inv_festServ
                           if v in inv_secondServ and inv_secondServ[v] != inv_festServ[v]]
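Keys present in both dictionaries
Intersecting firstServKeys and secondServKeys gives all the keys they have in common, so we can work on a smaller set. Then, for each key, keep those whose values differ.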
keysWithDifferentValues = [k for k in firstServKeys.intersection(secondServKeys)
                           if firstServ[k] != secondServ[k]]
Print the results:
print("Keys unique to first server:")
print(uniqueKeysInFirstServ)
print("Keys unique to second server:")
print(uniqueKeysInSecondServ)
print("Values present in both servers but with a different key:")
print(valuesWithDifferentkeys)
print("Keys present in both servers but with a different value:")
print(keysWithDifferentValues)
Output:
Keys unique to first server:
{'md5Hash2', 'md5Hash3'}
Keys unique to second server:
{'md5Hash5', 'md5Hash4'}
Values present in both servers but with a different key:
['path2']
Keys present in both servers but with a different value:
['md5Hash1']
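Note that sets are unordered, so the two sets of unique keys may print their elements in a different order from run to run. As a side note, and assuming Python 3, `dict.keys()` already returns a set-like view, so the explicit `set(...)` conversion is optional. A short sketch of the key comparisons written that way, with `sorted()` added for a stable order (the names `only_first`, `only_second` and `changed` are made up for this illustration):

```python
firstServ = {"md5Hash1": "path1", "md5Hash2": "path2", "md5Hash3": "path3"}
secondServ = {"md5Hash1": "path4", "md5Hash4": "path2", "md5Hash5": "path5"}

# dict.keys() returns a set-like view, so - and & work on it directly.
only_first = sorted(firstServ.keys() - secondServ.keys())
only_second = sorted(secondServ.keys() - firstServ.keys())

# Keys found on both servers whose paths differ.
changed = sorted(k for k in firstServ.keys() & secondServ.keys()
                 if firstServ[k] != secondServ[k])

print(only_first)   # ['md5Hash2', 'md5Hash3']
print(only_second)  # ['md5Hash4', 'md5Hash5']
print(changed)      # ['md5Hash1']
```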