Get the first item for each unknown value in a dict



I have a nested dictionary like this:

results={
0: {'id': 87535653, 'cc': 0, 'cover': 89, 'grid': 'VQ'},
1: {'id': 31213450, 'cc': 0, 'cover': 99, 'grid': 'VQ'},
2: {'id': 22343446, 'cc': 0.1, 'cover': 79, 'grid': 'VP'},
3: {'id': 34568756, 'cc': 0, 'cover': 34, 'grid': 'VQ'},
4: {'id': 43532251, 'cc': 0.2, 'cover': 78, 'grid': 'DS'},
5: {'id': 42532376, 'cc': 23, 'cover': 90, 'grid': 'ZF'},
}

What I want is to keep only the first entry for each distinct value of grid. For this example, I want to get back:

results={
0: {'id': 87535653, 'cc': 0, 'cover': 89, 'grid': 'VQ'},
2: {'id': 22343446, 'cc': 0.1, 'cover': 79, 'grid': 'VP'},
4: {'id': 43532251, 'cc': 0.2, 'cover': 78, 'grid': 'DS'},
5: {'id': 42532376, 'cc': 23, 'cover': 90, 'grid': 'ZF'},
}

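The challenge: I don't know in advance what values grid will contain. Anything could be there, so I can't iterate over a fixed list of values. The code has to work out by itself whether an item's grid value has already been seen.

How can I iterate over the dictionary to get the result I want?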

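You can iterate over the dictionary and record every grid value you come across, so that the next time you see the same value you don't add that entry to the final dictionary: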

In [1]: results={
...:     0: {'id': 87535653, 'cc': 0, 'cover': 89, 'grid': 'VQ'},
...:     1: {'id': 31213450, 'cc': 0, 'cover': 99, 'grid': 'VQ'},
...:     2: {'id': 22343446, 'cc': 0.1, 'cover': 79, 'grid': 'VP'},
...:     3: {'id': 34568756, 'cc': 0, 'cover': 34, 'grid': 'VQ'},
...:     4: {'id': 43532251, 'cc': 0.2, 'cover': 78, 'grid': 'DS'},
...:     5: {'id': 42532376, 'cc': 23, 'cover': 90, 'grid': 'ZF'},
...: }
In [2]: final_dict = {}
In [3]: _collected_grids = set()
In [4]: for key, value in results.items():
...:     if value['grid'] not in _collected_grids:
...:         final_dict[key] = value
...:         _collected_grids.add(value['grid'])
...:
In [5]: final_dict
Out[5]:
{0: {'id': 87535653, 'cc': 0, 'cover': 89, 'grid': 'VQ'},
2: {'id': 22343446, 'cc': 0.1, 'cover': 79, 'grid': 'VP'},
4: {'id': 43532251, 'cc': 0.2, 'cover': 78, 'grid': 'DS'},
5: {'id': 42532376, 'cc': 23, 'cover': 90, 'grid': 'ZF'}}
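
The same idea can be wrapped in a small reusable function. This is only a minimal sketch: the name first_per_value and its field parameter are hypothetical and not part of the answer above, and it assumes Python 3.7+ (where dicts preserve insertion order), hashable field values, and that every record contains the field.

```python
def first_per_value(records, field):
    """Keep only the first record seen for each distinct records[key][field]."""
    seen = set()   # field values encountered so far
    kept = {}      # original key -> first record with a new field value
    for key, record in records.items():
        value = record[field]
        if value not in seen:
            seen.add(value)
            kept[key] = record
    return kept


results = {
    0: {'id': 87535653, 'cc': 0, 'cover': 89, 'grid': 'VQ'},
    1: {'id': 31213450, 'cc': 0, 'cover': 99, 'grid': 'VQ'},
    2: {'id': 22343446, 'cc': 0.1, 'cover': 79, 'grid': 'VP'},
    3: {'id': 34568756, 'cc': 0, 'cover': 34, 'grid': 'VQ'},
    4: {'id': 43532251, 'cc': 0.2, 'cover': 78, 'grid': 'DS'},
    5: {'id': 42532376, 'cc': 23, 'cover': 90, 'grid': 'ZF'},
}

final_dict = first_per_value(results, 'grid')
print(list(final_dict))  # [0, 2, 4, 5]
```

Passing the field name as an argument means the same helper could deduplicate on 'cover' or any other key without changes.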

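You can use pandas. Note that the cc column mixes ints and floats, so pandas stores it as float and 0 comes back as 0.0 in the output: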

import pandas as pd
df = pd.DataFrame.from_dict(results, orient='index')
df = df.drop_duplicates('grid')
res = df.to_dict(orient='index')
print(res)
{0: {'id': 87535653, 'cc': 0.0, 'cover': 89, 'grid': 'VQ'}, 
2: {'id': 22343446, 'cc': 0.1, 'cover': 79, 'grid': 'VP'}, 
4: {'id': 43532251, 'cc': 0.2, 'cover': 78, 'grid': 'DS'}, 
5: {'id': 42532376, 'cc': 23.0, 'cover': 90, 'grid': 'ZF'}}

Something like this (keeping track of the grids already seen):

results = {
0: {'id': 87535653, 'cc': 0, 'cover': 89, 'grid': 'VQ'},
1: {'id': 31213450, 'cc': 0, 'cover': 99, 'grid': 'VQ'},
2: {'id': 22343446, 'cc': 0.1, 'cover': 79, 'grid': 'VP'},
3: {'id': 34568756, 'cc': 0, 'cover': 34, 'grid': 'VQ'},
4: {'id': 43532251, 'cc': 0.2, 'cover': 78, 'grid': 'DS'},
5: {'id': 42532376, 'cc': 23, 'cover': 90, 'grid': 'ZF'},
}
grids = set()   # grid values seen so far
data = dict()   # first entry for each grid value
for k, v in results.items():
    if v['grid'] not in grids:
        data[k] = v
        grids.add(v['grid'])
for k, v in data.items():
    print(f'{k} {v}')

Output
0 {'id': 87535653, 'cc': 0, 'cover': 89, 'grid': 'VQ'}
2 {'id': 22343446, 'cc': 0.1, 'cover': 79, 'grid': 'VP'}
4 {'id': 43532251, 'cc': 0.2, 'cover': 78, 'grid': 'DS'}
5 {'id': 42532376, 'cc': 23, 'cover': 90, 'grid': 'ZF'}
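
A variant that avoids the separate set, offered as a sketch rather than a recommendation: key a temporary dict by the grid value and let dict.setdefault keep only the first entry it sees. The name first_by_grid is made up for illustration, and the result order relies on dict insertion order (Python 3.7+).

```python
results = {
    0: {'id': 87535653, 'cc': 0, 'cover': 89, 'grid': 'VQ'},
    1: {'id': 31213450, 'cc': 0, 'cover': 99, 'grid': 'VQ'},
    2: {'id': 22343446, 'cc': 0.1, 'cover': 79, 'grid': 'VP'},
    3: {'id': 34568756, 'cc': 0, 'cover': 34, 'grid': 'VQ'},
    4: {'id': 43532251, 'cc': 0.2, 'cover': 78, 'grid': 'DS'},
    5: {'id': 42532376, 'cc': 23, 'cover': 90, 'grid': 'ZF'},
}

first_by_grid = {}
for k, v in results.items():
    # setdefault stores (k, v) only if this grid value is not already a key
    first_by_grid.setdefault(v['grid'], (k, v))

# Rebuild a dict with the original keys from the stored (key, record) pairs
data = dict(first_by_grid.values())
print(list(data))  # [0, 2, 4, 5]
```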
