Handling optional Python dictionary fields



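I'm working with JSON data loaded into Python dictionaries. Many of them have optional fields, which may in turn contain dictionaries and similar structures.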

dictionary1 = {"required": {"value1": "one", "value2": "two"},
               "optional": {"value1": "one"}}
dictionary2 = {"required": {"value1": "one", "value2": "two"}}

If I do this,

dictionary1.get("required").get("value1")

it obviously works, because the field "required" is always present.

However, when I use the same line on dictionary2 (to get the optional field), it raises an AttributeError:

dictionary2.get("optional").get("value1")
AttributeError: 'NoneType' object has no attribute 'get'

That makes sense, because the first .get() returns None, and the second .get() cannot be called on a None object.

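I can work around this by supplying default values for the case where the optional field is missing, but that gets tedious as the data grows more complex, so I call it a "naive fix":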

dictionary2.get("optional", {}).get("value1", " ")

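So the first .get() returns an empty dictionary {}, on which the second .get() can be called; since that dictionary obviously contains nothing, the call returns the empty string, as defined by the second default.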

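This no longer raises an error, but I wonder whether there is a better solution, especially for more complex cases (value1 containing arrays or other dictionaries, etc.).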
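One caveat with the chained defaults, sketched here under the assumption that the JSON may carry an explicit null for the optional field (the name dictionary3 and its payload are made up for illustration): .get() only falls back to its default when the key is absent, so a key that is present with the value None still breaks the chain.

```python
import json

# Hypothetical payload: "optional" is present, but its value is null.
dictionary3 = json.loads('{"required": {"value1": "one"}, "optional": null}')

# The default {} is NOT used, because the key exists (its value is None).
inner = dictionary3.get("optional", {})
print(inner)  # None

error_message = None
try:
    dictionary3.get("optional", {}).get("value1", " ")
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# A common workaround: "or {}" swaps any falsy value (None, {}, "") for {}.
value1 = (dictionary3.get("optional") or {}).get("value1", " ")
print(repr(value1))  # ' '
```

The "or {}" form is a judgment call: it also replaces other falsy values such as 0 or an empty list, which may or may not be what the data calls for.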

I could also solve this with try/except AttributeError, but that is not my preferred approach either.

try:
    value1 = dictionary2.get("optional").get("value1")
except AttributeError:
    value1 = " "

I also don't like checking whether the optional field exists, because it produces junk lines of code such as

optional = dictionary2.get("optional")
if optional:
    value1 = optional.get("value1")
else:
    value1 = " "

This looks very un-Pythonic...

I'm wondering whether my approach of simply chaining .get() is wrong in the first place?

In your code here:

try:
    value1 = dictionary2.get("optional").get("value1")
except AttributeError:
    value1 = " "

You can use brackets and except KeyError:

try:
    value1 = dictionary2["optional"]["value1"]
except KeyError:
    value1 = " "

If this is too verbose for the caller, add a helper:

def get_or_default(d, *keys, default=None):
    try:
        for k in keys:
            d = d[k]
    except (KeyError, IndexError):
        return default
    return d
if __name__ == "__main__":
    d = {"a": {"b": {"c": [41, 42]}}}
    print(get_or_default(d, "a", "b", "c", 1))  # => 42
    print(get_or_default(d, "a", "b", "d", default=43))  # => 43
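A note on the helper above: indexing None, a number, or a string with a key raises TypeError, which (KeyError, IndexError) does not cover, so a null or scalar in the middle of the path would still propagate an exception. Below is a minimal sketch of a variant that also catches TypeError; the name get_or_default_safe and the sample data are made up for illustration.

```python
def get_or_default_safe(d, *keys, default=None):
    """Walk nested dicts/lists; return default if any step fails."""
    try:
        for k in keys:
            d = d[k]
    except (KeyError, IndexError, TypeError):
        # TypeError covers None["b"], 5["b"], "text"["b"] and [10, 20]["x"].
        return default
    return d


data = {"a": None, "n": 5, "items": [10, 20]}
print(get_or_default_safe(data, "a", "b", default="missing"))      # missing
print(get_or_default_safe(data, "n", "b", default="missing"))      # missing
print(get_or_default_safe(data, "items", 1, default="missing"))    # 20
print(get_or_default_safe(data, "items", "x", default="missing"))  # missing
```

Whether swallowing TypeError is desirable depends on the data: it hides malformed input as well as legitimately absent fields.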

You could also subclass dict and use tuple bracket indexing, as in NumPy and Pandas:

class DeepDict(dict):
    def __init__(self, d, default=None):
        self.d = d
        self.default = default
    def __getitem__(self, keys):
        d = self.d
        try:
            for k in keys:
                d = d[k]
        except (KeyError, IndexError):
            return self.default
        return d
    def __setitem__(self, keys, x):
        d = self.d
        for k in keys[:-1]:
            d = d[k]
        d[keys[-1]] = x
if __name__ == "__main__":
    dd = DeepDict({"a": {"b": {"c": [42, 43]}}}, default="foo")
    print(dd["a", "b", "c", 1])  # => 43
    print(dd["a", "b", "c", 11])  # => "foo"
    dd["a", "b", "c", 1] = "banana"
    print(dd["a", "b", "c", 1])  # => "banana"

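However, this may carry an engineering cost if it confuses other developers, or if you want to flesh out the other expected methods as described in How to "perfectly" override a dict? (Treat this as a proof-of-concept sketch.) It's best not to be too clever.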

You can use toolz.dicttoolz.get_in():

from toolz.dicttoolz import get_in
dictionary1 = {"required": {"value1": "one", "value2": "two"}, "optional": {"value1": "one"}}
dictionary2 = {"required": {"value1": "one", "value2": "two"}}
get_in(("optional", "value1"), dictionary1)
# 'one'
get_in(("optional", "value1"), dictionary2)
# None

If you don't want to install the whole library, you can copy the source code, which is BSD-licensed:

import operator
from functools import reduce
def get_in(keys, coll, default=None, no_default=False):
    try:
        return reduce(operator.getitem, keys, coll)
    except (KeyError, IndexError, TypeError):
        if no_default:
            raise
        return default
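A short usage sketch for the standalone version above, with the function repeated so the snippet runs on its own. The record data is made up; it exercises a list index, a custom default, and the no_default flag.

```python
import operator
from functools import reduce


def get_in(keys, coll, default=None, no_default=False):
    try:
        return reduce(operator.getitem, keys, coll)
    except (KeyError, IndexError, TypeError):
        if no_default:
            raise
        return default


record = {"optional": {"values": ["one", "two"]}}

# Keys and list indices can be mixed in the same path.
print(get_in(("optional", "values", 1), record))  # two

# Missing key or out-of-range index: the default is returned.
missing_key = get_in(("optional", "missing"), record, default=" ")
bad_index = get_in(("optional", "values", 5), record, default="")
print(repr(missing_key), repr(bad_index))  # ' ' ''

# With no_default=True the original exception is re-raised instead.
try:
    get_in(("optional", "missing"), record, no_default=True)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'missing'
```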

Since you seem to like one-liners such as dictionary2["optional"]["value1"] if "optional" in dictionary2 else " " and dictionary2.get("optional", {}).get("value1", " "), I'd also like to suggest

getattr(dictionary2.get("optional"), "get", {}.get)("value1", " ")

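By using getattr, this also accounts for dictionary2['optional'] not being a dictionary [it returns " " in that case, instead of raising AttributeError or TypeError as the other two approaches would].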
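To make the difference concrete, here is a small sketch comparing the three one-liners when the optional field holds a string instead of a dict. The name dictionary3 and its contents are assumptions made for the example.

```python
# Hypothetical payload where "optional" holds a string instead of a dict.
dictionary3 = {"required": {"value1": "one"}, "optional": "not a dict"}

outcomes = []

# Bracket indexing: indexing a str with a str key raises TypeError.
try:
    dictionary3["optional"]["value1"] if "optional" in dictionary3 else " "
except TypeError:
    outcomes.append("TypeError")

# Chained .get(): str has no .get attribute, so AttributeError is raised.
try:
    dictionary3.get("optional", {}).get("value1", " ")
except AttributeError:
    outcomes.append("AttributeError")

# getattr falls back to the .get of an empty dict, so the default comes back.
value1 = getattr(dictionary3.get("optional"), "get", {}.get)("value1", " ")
outcomes.append(repr(value1))

print(outcomes)  # ['TypeError', 'AttributeError', "' '"]
```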

Wrapped as a function, it would look like

# get_v2 = lambda d, k1, k2, vDef=None: getattr(d.get(k1), 'get', {}.get)(k2, vDef)  ## OR
def get_v2(d, k1, k2, vDef=None):
    return getattr(d.get(k1), 'get', {}.get)(k2, vDef)
a = get_v2(dictionary1, 'optional', 'value1', vDef=' ')  ## --> a='one'
b = get_v2(dictionary2, 'optional', 'value1', vDef=' ')  ## --> b=' '

However, if you want to be able to call it with any number of keys, you will need to use recursion

def getVal(obj, k1, *keys, vDef=None):
    nxtVal = getattr(obj, 'get', {}.get)(k1, vDef)
    return getVal(nxtVal, *keys, vDef=vDef) if keys else nxtVal

or a loop

def getVal(obj, *keys, vDef=None):
    for k in keys:
        obj = getattr(obj, 'get', {}.get)(k, vDef)
    return obj

That said, I think using try..except, as others have suggested, is more efficient.

def getVal(obj, k1, *keys, vDef=None):
    try:
        return getVal(obj[k1], *keys, vDef=vDef) if keys else obj[k1]
    except Exception:
        return vDef

def getVal(obj, *keys, vDef=None):
    try:
        for k in keys:
            obj = obj[k]
    except Exception:
        obj = vDef
    return obj
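The efficiency claim is easy to check on your own data. Below is a rough timeit sketch that compares the getattr loop with the try/except loop; the two functions are renamed copies (get_val_getattr and get_val_try are names chosen here), and the iteration count is arbitrary. Timings depend on the machine, the Python version, and how often lookups miss, so no output is shown.

```python
import timeit


def get_val_getattr(obj, *keys, vDef=None):
    for k in keys:
        obj = getattr(obj, "get", {}.get)(k, vDef)
    return obj


def get_val_try(obj, *keys, vDef=None):
    try:
        for k in keys:
            obj = obj[k]
    except Exception:
        obj = vDef
    return obj


dictionary1 = {"required": {"value1": "one"}, "optional": {"value1": "one"}}
dictionary2 = {"required": {"value1": "one"}}

for func in (get_val_getattr, get_val_try):
    # "hit" walks an existing path; "miss" walks a path whose first key is absent.
    hit = timeit.timeit(
        lambda: func(dictionary1, "optional", "value1", vDef=" "), number=20_000)
    miss = timeit.timeit(
        lambda: func(dictionary2, "optional", "value1", vDef=" "), number=20_000)
    print(f"{func.__name__}: hit={hit:.4f}s miss={miss:.4f}s")
```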

You could also write a function that returns a function [somewhat like operator.itemgetter], which can be used like valGetter("optional", "value1")(dictionary2, " ")

def valGetter(k1, *keys):
    if keys:
        def rFunc(obj, vDef=None):
            try:
                for k in (k1,) + keys:
                    obj = obj[k]
            except Exception:
                obj = vDef
            return obj
    else:
        def rFunc(obj, vDef=None):
            try:
                return obj[k1]
            except Exception:
                return vDef
    return rFunc

Note, however, that this can be considerably slower than the other approaches.

First, you refer to " " as the empty string. That is incorrect; "" is the empty string.

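Second, if you are checking for membership, I see no reason to use the get method in the first place. I would go with something like the following.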

if "optional" in dictionary2:
    value1 = dictionary2["optional"].get("value1")
else:
    value1 = ""

Another alternative to consider (since you use the get method so often) is switching to the defaultdict class. For example:

from collections import defaultdict
dictionary2 = {"required": {"value1": "one", "value2": "two"}}
ddic2 = defaultdict(dict,dictionary2)
value1 = ddic2["optional"].get("value1")
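Two properties of defaultdict are worth knowing before adopting this, shown in the sketch below (same sample data as above): looking up a missing key with brackets inserts it as a side effect, and only the top level gets the default behavior, since nested values remain plain dicts.

```python
from collections import defaultdict

dictionary2 = {"required": {"value1": "one", "value2": "two"}}
ddic2 = defaultdict(dict, dictionary2)

before = "optional" in ddic2
value1 = ddic2["optional"].get("value1", "")
after = "optional" in ddic2

print(before)        # False
print(repr(value1))  # ''
print(after)         # True: the bracket lookup inserted an empty dict

# The source dict is untouched, because defaultdict(dict, ...) copies its items.
print("optional" in dictionary2)  # False

# Only the top level is a defaultdict; nested values are still plain dicts.
print(type(ddic2["required"]).__name__)  # dict
```

If the inserted keys are unwanted (for example, when the dictionary is later serialized back to JSON), ddic2.get("optional", {}) avoids the insertion.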

The Pythonic way to handle this is to use a try/except block:

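dictionary2 = {"required": {"value1": "one", "value2": "two"}}
try:
    value1 = dictionary2["optional"]["value1"]
except (KeyError, TypeError):
    value1 = ""

KeyError catches a missing key, and TypeError catches the case where a list/str (or None) sits where a dict was expected; bracket indexing on those raises TypeError rather than AttributeError.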


If you don't like having lots of try/except blocks in your code, you can consider using a helper function:

def get_val(data, keys):
    try:
        for k in keys:
            data = data[k]
        return data
    except (KeyError, TypeError):
        # TypeError is what bracket indexing raises on a list/str/None.
        return ""
dictionary2 = {"required": {"value1": "one", "value2": "two"}}
print(get_val(dictionary2, ("required", "value2")))
print(get_val(dictionary2, ("optional", "value1")))

Output (the second line is empty):

two

I use reduce to implement JavaScript-like optional chaining in Python:

from functools import reduce

data_dictionary = {
    'foo': {
        'bar': {
            'buzz': 'lightyear'
        },
        'baz': {
            'asd': 2023,
            'zxc': [
                {'patrick': 'star'},
                {'spongebob': 'squarepants'}
            ],
            'qwe': ['john', 'sarah']
        }
    },
    'hello': {
        'world': 'hello world',
    },
}

def optional_chaining_v1(dictionary={}, *property_list):
    def reduce_callback(current_result, current_key):
        # Once a step is not a dict, stay at None rather than
        # restarting the lookup from the top-level dictionary.
        if not isinstance(current_result, dict):
            return None
        return current_result.get(current_key)
    # Seed the reduction with the dictionary itself.
    return reduce(reduce_callback, property_list, dictionary)

# or in one line
optional_chaining_v1 = lambda dictionary={}, *property_list: reduce(lambda current_result, current_key: current_result.get(current_key) if isinstance(current_result, dict) else None, property_list, dictionary)
# usage
optional_chaining_v1_result1 = optional_chaining_v1(data_dictionary, 'foo', 'bar', 'baz')
print('optional_chaining_v1_result1:', optional_chaining_v1_result1)
optional_chaining_v1_result2 = optional_chaining_v1(data_dictionary, 'foo', 'bar', 'buzz')
print('optional_chaining_v1_result2:', optional_chaining_v1_result2)
# optional_chaining_v1_result1: None
# optional_chaining_v1_result2: lightyear

def optional_chaining_v2(dictionary={}, list_of_property_string_separated_by_dot=''):
    property_list = list_of_property_string_separated_by_dot.split('.')
    def reduce_callback(current_result, current_key):
        # Once a step is not a dict, stay at None rather than
        # restarting the lookup from the top-level dictionary.
        if not isinstance(current_result, dict):
            return None
        return current_result.get(current_key)
    # Seed the reduction with the dictionary itself.
    return reduce(reduce_callback, property_list, dictionary)

# or in one line
optional_chaining_v2 = lambda dictionary={}, list_of_property_string_separated_by_dot='': reduce(lambda current_result, current_key: current_result.get(current_key) if isinstance(current_result, dict) else None, list_of_property_string_separated_by_dot.split('.'), dictionary)
# usage
optional_chaining_v2_result1 = optional_chaining_v2(data_dictionary, 'foo.bar.baz')
print('optional_chaining_v2_result1:', optional_chaining_v2_result1)
optional_chaining_v2_result2 = optional_chaining_v2(data_dictionary, 'foo.bar.buzz')
print('optional_chaining_v2_result2:', optional_chaining_v2_result2)
# optional_chaining_v2_result1: None
# optional_chaining_v2_result2: lightyear
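The versions above return None as soon as they reach a list, so paths through 'zxc' or 'qwe' cannot be followed. A possible extension, sketched under the assumption that integer path elements should be treated as list indices, is shown below. The name optional_chaining_v3 is hypothetical, and the data is a trimmed copy of data_dictionary.

```python
from functools import reduce

data_dictionary = {
    'foo': {
        'baz': {
            'zxc': [{'patrick': 'star'}, {'spongebob': 'squarepants'}],
            'qwe': ['john', 'sarah'],
        },
    },
}


def optional_chaining_v3(data, *path):
    """Follow dict keys and list indices; return None when a step fails."""
    def step(current, key):
        if isinstance(current, dict):
            return current.get(key)
        if isinstance(current, list) and isinstance(key, int):
            # Guard the index so an out-of-range position yields None.
            return current[key] if -len(current) <= key < len(current) else None
        return None
    return reduce(step, path, data)


print(optional_chaining_v3(data_dictionary, 'foo', 'baz', 'zxc', 1, 'spongebob'))  # squarepants
print(optional_chaining_v3(data_dictionary, 'foo', 'baz', 'qwe', 0))   # john
print(optional_chaining_v3(data_dictionary, 'foo', 'baz', 'qwe', 5))   # None
print(optional_chaining_v3(data_dictionary, 'foo', 'nope', 'qwe', 0))  # None
```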
