I have a nested dictionary like this:
results={
0: {'id': 87535653, 'cc': 0, 'cover': 89, 'grid': 'VQ'},
1: {'id': 31213450, 'cc': 0, 'cover': 99, 'grid': 'VQ'},
2: {'id': 22343446, 'cc': 0.1, 'cover': 79, 'grid': 'VP'},
3: {'id': 34568756, 'cc': 0, 'cover': 34, 'grid': 'VQ'},
4: {'id': 43532251, 'cc': 0.2, 'cover': 78, 'grid': 'DS'},
5: {'id': 42532376, 'cc': 23, 'cover': 90, 'grid': 'ZF'},
}
What I want is to keep only the first entry for each distinct value of grid. So, for this example, I want to get back this:
results={
0: {'id': 87535653, 'cc': 0, 'cover': 89, 'grid': 'VQ'},
2: {'id': 22343446, 'cc': 0.1, 'cover': 79, 'grid': 'VP'},
4: {'id': 43532251, 'cc': 0.2, 'cover': 78, 'grid': 'DS'},
5: {'id': 42532376, 'cc': 23, 'cover': 90, 'grid': 'ZF'},
}
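The challenge: I don't know in advance what grid contains. It could be anything, so I can't iterate over a fixed list of values. The code has to work out on its own whether a given value has already been seen.
How can I iterate over the dictionary to get the result I want?
You can iterate over the dictionary and record each grid you come across, so that the next time you see it you don't add it to the final dictionary: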
In [1]: results={
...: 0: {'id': 87535653, 'cc': 0, 'cover': 89, 'grid': 'VQ'},
...: 1: {'id': 31213450, 'cc': 0, 'cover': 99, 'grid': 'VQ'},
...: 2: {'id': 22343446, 'cc': 0.1, 'cover': 79, 'grid': 'VP'},
...: 3: {'id': 34568756, 'cc': 0, 'cover': 34, 'grid': 'VQ'},
...: 4: {'id': 43532251, 'cc': 0.2, 'cover': 78, 'grid': 'DS'},
...: 5: {'id': 42532376, 'cc': 23, 'cover': 90, 'grid': 'ZF'},
...: }
In [2]: final_dict = {}
In [3]: _collected_grids = set()
In [4]: for key, value in results.items():
...:     if value['grid'] not in _collected_grids:
...:         final_dict[key] = value
...:         _collected_grids.add(value['grid'])
...:
In [5]: final_dict
Out[5]:
{0: {'id': 87535653, 'cc': 0, 'cover': 89, 'grid': 'VQ'},
2: {'id': 22343446, 'cc': 0.1, 'cover': 79, 'grid': 'VP'},
4: {'id': 43532251, 'cc': 0.2, 'cover': 78, 'grid': 'DS'},
5: {'id': 42532376, 'cc': 23, 'cover': 90, 'grid': 'ZF'}}
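The same loop can be wrapped in a reusable helper. This is only a minimal sketch: the name `first_per_value` and its `field` parameter are hypothetical, and it assumes every inner dict contains the field and that its values are hashable.

```python
def first_per_value(records, field):
    """Keep the first entry for each distinct records[key][field] value."""
    seen = set()   # field values encountered so far
    kept = {}      # first occurrences, in their original order
    for key, record in records.items():
        marker = record[field]
        if marker not in seen:
            seen.add(marker)
            kept[key] = record
    return kept


results = {
    0: {'id': 87535653, 'cc': 0, 'cover': 89, 'grid': 'VQ'},
    1: {'id': 31213450, 'cc': 0, 'cover': 99, 'grid': 'VQ'},
    2: {'id': 22343446, 'cc': 0.1, 'cover': 79, 'grid': 'VP'},
    3: {'id': 34568756, 'cc': 0, 'cover': 34, 'grid': 'VQ'},
    4: {'id': 43532251, 'cc': 0.2, 'cover': 78, 'grid': 'DS'},
    5: {'id': 42532376, 'cc': 23, 'cover': 90, 'grid': 'ZF'},
}

final_dict = first_per_value(results, 'grid')
print(list(final_dict))  # [0, 2, 4, 5]
```

Passing the field name as an argument means the same helper could deduplicate on any other key without touching the loop.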
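You can use pandas (note that the cc values come back as floats, because pandas stores the whole column with a single float dtype):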
import pandas as pd
df=pd.DataFrame.from_dict(results, orient='index')
df=df.drop_duplicates('grid')
res = df.to_dict(orient='index')
>>> print(res)
{0: {'id': 87535653, 'cc': 0.0, 'cover': 89, 'grid': 'VQ'},
2: {'id': 22343446, 'cc': 0.1, 'cover': 79, 'grid': 'VP'},
4: {'id': 43532251, 'cc': 0.2, 'cover': 78, 'grid': 'DS'},
5: {'id': 42532376, 'cc': 23.0, 'cover': 90, 'grid': 'ZF'}}
Like this (keeping track of the grids already seen):
results = {
0: {'id': 87535653, 'cc': 0, 'cover': 89, 'grid': 'VQ'},
1: {'id': 31213450, 'cc': 0, 'cover': 99, 'grid': 'VQ'},
2: {'id': 22343446, 'cc': 0.1, 'cover': 79, 'grid': 'VP'},
3: {'id': 34568756, 'cc': 0, 'cover': 34, 'grid': 'VQ'},
4: {'id': 43532251, 'cc': 0.2, 'cover': 78, 'grid': 'DS'},
5: {'id': 42532376, 'cc': 23, 'cover': 90, 'grid': 'ZF'},
}
grids = set()   # grid values seen so far
data = dict()   # first entry found for each grid
for k, v in results.items():
    if v['grid'] not in grids:
        data[k] = v
        grids.add(v['grid'])
for k, v in data.items():
    print(f'{k} {v}')
Output:
0 {'id': 87535653, 'cc': 0, 'cover': 89, 'grid': 'VQ'}
2 {'id': 22343446, 'cc': 0.1, 'cover': 79, 'grid': 'VP'}
4 {'id': 43532251, 'cc': 0.2, 'cover': 78, 'grid': 'DS'}
5 {'id': 42532376, 'cc': 23, 'cover': 90, 'grid': 'ZF'}
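As a variation on the answers above, the separate set can be dropped by keying a temporary dict on the grid value and letting `dict.setdefault` ignore repeats. This is a sketch rather than a definitive implementation; the name `first_by_grid` is made up for illustration, and it relies on dicts preserving insertion order (guaranteed from Python 3.7).

```python
results = {
    0: {'id': 87535653, 'cc': 0, 'cover': 89, 'grid': 'VQ'},
    1: {'id': 31213450, 'cc': 0, 'cover': 99, 'grid': 'VQ'},
    2: {'id': 22343446, 'cc': 0.1, 'cover': 79, 'grid': 'VP'},
    3: {'id': 34568756, 'cc': 0, 'cover': 34, 'grid': 'VQ'},
    4: {'id': 43532251, 'cc': 0.2, 'cover': 78, 'grid': 'DS'},
    5: {'id': 42532376, 'cc': 23, 'cover': 90, 'grid': 'ZF'},
}

first_by_grid = {}
for key, value in results.items():
    # setdefault stores the (key, value) pair only if this grid has no entry yet
    first_by_grid.setdefault(value['grid'], (key, value))

# Rebuild a dict with the original keys from the stored (key, value) pairs
data = dict(first_by_grid.values())
print(list(data))  # [0, 2, 4, 5]
```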