Nested dictionaries: extracting the paths to the leaves



I have a nested Python dictionary that looks like this:

{'dist_river': 
  {'high': 
    {'wind_speed': 
      {'1': 
        {'population': 
          {'high': 
            {'school': 
              {'high':'T', 'medium':'T', 'low':'F'}
            }, 
            'medium': 
              {'land_cover': 
                {'Mix_garden': 
                  {'income_source': 
                    {'Plantation':'T', 'Agriculture':'F'}
                  }
                }
              }
            }
          }
        }
      },
    'low': 'F'
  }
}

How can I get the sub-dictionaries out of this nested dictionary? For example, from dic I would like these sub-dictionaries:

results = [
    {'dist_river': 
     {'high': 
      {'wind_speed': 
       {'1': 
        {'population': 
         {'high': 
          {'school': 
           {'high': 'T', 'medium': 'T', 'low': 'F'}
          }}}}}}},
    {'dist_river': 
     {'high': 
      {'wind_speed': 
       {'1': 
        {'population': 
         {'medium': 
          {'land_cover': 
           {'Mix_garden': 
            {'income_source': 
             {'Plantation': 'T', 'Agriculture': 'F'}
            }}}}}}}}},
    {'dist_river': 
     {'low': 'F'}
    }
]
len(results) == 3

Thanks for your help.

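Community edit: it seems that each resulting dictionary may have only one entry per nesting level. In other words, each result comprises the entire path to one leaf of the dictionary tree. (Tim Pietzcker, 13 hours ago)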

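import collections.abc

def isDict(d):
    # collections.Mapping was removed in Python 3.10; the ABC lives in collections.abc.
    return isinstance(d, collections.abc.Mapping)

def isAtomOrFlat(d):
    # True for a non-dict value, or for a dict none of whose values is a dict.
    return not isDict(d) or not any(isDict(v) for v in d.values())

def leafPaths(nestedDicts, noDeeper=isAtomOrFlat):
    """
        For each leaf in NESTEDDICTS, this yields a 
        dictionary consisting of only the entries between the root
        and the leaf.
    """
    for key,value in nestedDicts.items():
        if noDeeper(value):
            yield {key: value}
        else:
            # Pass noDeeper along so a custom predicate applies at every level.
            for subpath in leafPaths(value, noDeeper):
                yield {key: subpath}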

Demo:

>>> pprint.pprint(list( leafPaths(dic) ))
[{'dist_river': {'high': {'wind_speed': {'1': {'population': {'high': {'school': {'high': 'T',
                                                                                  'low': 'F',
                                                                                  'medium': 'T'}}}}}}}},
 {'dist_river': {'high': {'wind_speed': {'1': {'population': {'medium': {'land_cover': {'Mix_garden': {'income_source': {'Agriculture': 'F',
                                                                                                                         'Plantation': 'T'}}}}}}}}}},
 {'dist_river': {'low': 'F'}}]

Side note 1: unless you need this format for some reason, I personally think it is nicer to yield the nodes as tuples, for example:

...noDeeper=lambda x:not isDict(x)...
...yield (key, value)
...yield (key,)+subpath
[('dist_river', 'high', 'wind_speed', '1', 'population', 'high', 'school', 'high', 'T'),
 ('dist_river', 'high', 'wind_speed', '1', 'population', 'high', 'school', 'medium', 'T'),
 ('dist_river', 'high', 'wind_speed', '1', 'population', 'high', 'school', 'low', 'F'),
 ('dist_river', 'high', 'wind_speed', '1', 'population', 'medium', 'land_cover', 'Mix_garden', 'income_source', 'Plantation', 'T'),
 ('dist_river', 'high', 'wind_speed', '1', 'population', 'medium', 'land_cover', 'Mix_garden', 'income_source', 'Agriculture', 'F'),
 ('dist_river', 'low', 'F')]

(This is easily derived from the "straightforward" answer, which happens to be thg435's answer.)
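Spelled out in full, the tuple variant from the fragment above might look like the sketch below. The name `leaf_tuples` and the small `sample` dictionary are made up for illustration; they are not from the original answer.

```python
from collections.abc import Mapping

def leaf_tuples(nested):
    """Yield one tuple per leaf: every key on the way down, then the leaf value."""
    for key, value in nested.items():
        if not isinstance(value, Mapping):
            # Atom reached: the key and its value end the path.
            yield (key, value)
        else:
            # Still a mapping: prefix this key to every path found below it.
            for subpath in leaf_tuples(value):
                yield (key,) + subpath

# Hypothetical sample data, smaller than the dictionary in the question.
sample = {'a': {'b': {'c': 'T', 'd': 'F'}}, 'e': 'F'}
paths = list(leaf_tuples(sample))
print(paths)
# [('a', 'b', 'c', 'T'), ('a', 'b', 'd', 'F'), ('e', 'F')]
```

Tuples are hashable and easy to slice, so the paths can go straight into a set or be split into keys (`path[:-1]`) and value (`path[-1]`).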


Side note 2: note that the OP is not looking for the naive implementation. The naive implementation would use noDeeper=lambda x: not isDict(x), which gives:

>>> pprint.pprint(list( leafPaths(dic) ))
[{'dist_river': {'high': {'wind_speed': {'1': {'population': {'high': {'school': {'high': 'T'}}}}}}}},
 {'dist_river': {'high': {'wind_speed': {'1': {'population': {'high': {'school': {'medium': 'T'}}}}}}}},
 {'dist_river': {'high': {'wind_speed': {'1': {'population': {'high': {'school': {'low': 'F'}}}}}}}},
 {'dist_river': {'high': {'wind_speed': {'1': {'population': {'medium': {'land_cover': {'Mix_garden': {'income_source': {'Plantation': 'T'}}}}}}}}}},
 {'dist_river': {'high': {'wind_speed': {'1': {'population': {'medium': {'land_cover': {'Mix_garden': {'income_source': {'Agriculture': 'F'}}}}}}}}}},
 {'dist_river': {'low': 'F'}}]

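Edit: this is an inefficient algorithm. Each leaf L is re-yielded depth(L) times, once by every generator between it and the root. A more efficient approach would be to chain the generators through a custom data structure, or to simulate the stack by hand.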
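One way to simulate the stack by hand is sketched below. It is only a sketch: `leaf_paths_iterative`, `wrap` and `sample` are names invented here, and I have not measured whether it is actually faster. Each result is built once when its leaf is reached, rather than being passed up through a chain of nested generators.

```python
from collections.abc import Mapping

def is_atom_or_flat(d):
    # Same stopping rule as isAtomOrFlat: a non-dict, or a dict with no dict values.
    return not isinstance(d, Mapping) or not any(
        isinstance(v, Mapping) for v in d.values())

def wrap(keys, leaf):
    """Rebuild {k1: {k2: ... leaf}} from a tuple of keys and the leaf value."""
    for key in reversed(keys):
        leaf = {key: leaf}
    return leaf

def leaf_paths_iterative(nested):
    # Each stack entry is (keys from the root, value found under those keys).
    # Children are pushed in reverse so they pop in insertion order.
    stack = [((key,), value) for key, value in reversed(list(nested.items()))]
    while stack:
        keys, value = stack.pop()
        if is_atom_or_flat(value):
            yield wrap(keys, value)
        else:
            for key, child in reversed(list(value.items())):
                stack.append((keys + (key,), child))

# Hypothetical sample data, smaller than the dictionary in the question.
sample = {'a': {'b': {'c': 'T', 'd': 'F'}, 'x': {'y': {'z': 'T'}}}, 'e': 'F'}
results = list(leaf_paths_iterative(sample))
print(results)
# [{'a': {'b': {'c': 'T', 'd': 'F'}}}, {'a': {'x': {'y': {'z': 'T'}}}}, {'e': 'F'}]
```

Because there is no recursion, this version is also not limited by Python's recursion depth on very deeply nested input.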

Maybe this:

def enum_paths(p):
    if not hasattr(p, 'items'):
        yield p
    else:
        for k, v in p.items():
            for x in enum_paths(v):
                yield {k: x}

for x in enum_paths(dic):
    print(x)

This is done in exactly the same way as "getting" anything else from a dictionary.

print(dic1['dist_river']['high'])

and so on.
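If the keys are only known at run time, the same chained lookup can be written as a fold over a list of keys. This is a minimal sketch; `get_by_path` and `sample` are hypothetical names, not something from the question.

```python
from functools import reduce
from operator import getitem

def get_by_path(nested, keys):
    """Equivalent to nested[keys[0]][keys[1]]... for a list of keys."""
    return reduce(getitem, keys, nested)

# Hypothetical sample data, smaller than the dictionary in the question.
sample = {'dist_river': {'high': {'wind_speed': '1'}, 'low': 'F'}}
low = get_by_path(sample, ['dist_river', 'low'])
high = get_by_path(sample, ['dist_river', 'high'])
print(low)   # F
print(high)  # {'wind_speed': '1'}
```

A missing key raises KeyError, just as the plain chained lookup would.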

Edit:

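If I misunderstood the question and it is really about getting a list of all the dicts at once, here is an example for the case where each dict has only one key: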

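def get_nested_dicts(d):
    dicts = []
    probe = d
    # Follow the first value at each level until it is no longer a dict.
    # Assumes no dict along the way is empty.
    while isinstance(probe, dict):
        dicts.append(probe)
        # dict.values() is a view in Python 3 and cannot be indexed with [0].
        probe = next(iter(probe.values()))
    return dicts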