Recursive function to find the correct match in JSON



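I have a web scraping application that loops over search_ids, and every so often it finds a duplicate search with a different key field, called tree_id. I am struggling to work out how to find the correct match with a recursive function. In most cases the JSON contains two or three tree_ids, and the function needs to pick the correct match from searches in different formats.

Below is some sample code, with my comments, that highlights the problem: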

# original json from the web scraping application for a single example
json = {'status': 'multiple',
        'searchResult': None,
        'spellingResult': None,
        'relatedTree': {'paths': [
            {'treeid': 'C0.A.01', 'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS'},
            {'treeid': 'C0.A.01.A', 'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS|STOMATOLOGICAL PREPARATIONS'}
        ]},
        'tableResult': None, 'synResult': None}
trees = json['relatedTree']['paths']  # calling .replace(".", "") here raises an error, because a list has no replace method
tree_id0 = json['relatedTree']['paths'][0]['treeid'].replace(".", "")  # remove all periods from the treeid at index 0
print(tree_id0)
tree_id1 = json['relatedTree']['paths'][1]['treeid'].replace(".", "")  # remove all periods from the treeid at index 1
print(tree_id1)

search = 'A01A'        # example: search for 'A01A' and then also for 'A01', and have it pick the correct substring
search1 = 'A01'
if tree_id0.find(search) != -1:  # correct result with 'A01', and also with 'A01A'
    print("Found!")
else:
    print("Not found!")
if tree_id1.find(search) != -1:  # wrong result with 'A01', correct with 'A01A'. I need it to match the exact string, with nothing to the right of the last letter of search
    print("Found!")
else:
    print("Not found!")

# my attempt at a recursive function to solve the problem, but I get "string indices must be integers", and in its current form I'm not sure whether I'm going about the problem the wrong way.
def search_multi(trees: list, search: str) -> dict:
    for tree in trees:
        if tree['treeid'].replace(".", "") == search:
            print(tree['treeid'].replace(".", ""))
            return tree
        if tree['treeid'].replace(".", ""):
            response = search_multi(tree['treeid'].replace(".", ""), search)
            if response:
                return response
searched_multis = search_multi(trees, search)
print(searched_multis)

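The result I want: if the search is 'A01A', it selects tree_id C0.A.01.A from the JSON; if the search is 'A01', it selects tree_id C0.A.01.

The if/else statements show how it should work, but they do not give the correct result for 'A01', because the match looks past the last letter of the search string.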
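The symptom described above comes down to substring matching versus anchored matching. A minimal sketch, assuming the leading 'C0' segment can be ignored and that a suffix comparison is acceptable for this data (the names `tree_ids`, `substring_hits`, and `suffix_hits` are illustrative, not from the original code):

```python
# Normalized tree ids from the example (periods removed).
tree_ids = ['C0A01', 'C0A01A']
search = 'A01'

# Substring search: 'A01' occurs inside both ids, so both are reported.
substring_hits = [t for t in tree_ids if t.find(search) != -1]

# Suffix match: only ids that end with the search term qualify,
# so nothing to the right of the last letter is allowed.
suffix_hits = [t for t in tree_ids if t.endswith(search)]

print(substring_hits)  # ['C0A01', 'C0A01A']
print(suffix_hits)     # ['C0A01']
```

A suffix test could still match an unrelated id that happens to end with the same characters, so comparing against the id with its first segment stripped may be the safer choice.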

Here is one approach. It returns the dict whose "treeid" matches the search:

def get_id(d, search):
    if isinstance(d, dict):
        for k, v in d.items():
            if k == 'treeid' and ''.join(v.split('.')[1:]) == search:
                yield d
            else:
                yield from get_id(v, search)
    elif isinstance(d, list):
        for i in d:
            yield from get_id(i, search)

out = next(get_id(json, 'A01'))

Output:

{'treeid': 'C0.A.01',
'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS'}
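
One caveat: `next()` raises `StopIteration` when the generator yields nothing, which could happen for a search term that is not in the data. A possible way to handle that, sketched here with a trimmed copy of the example data (the names `data`, `found`, and `missing` are illustrative assumptions), is to pass a default value to `next()`:

```python
def get_id(d, search):
    """Yield each dict whose 'treeid', minus its first dotted segment, equals search."""
    if isinstance(d, dict):
        for k, v in d.items():
            if k == 'treeid' and ''.join(v.split('.')[1:]) == search:
                yield d
            else:
                yield from get_id(v, search)
    elif isinstance(d, list):
        for i in d:
            yield from get_id(i, search)

# Trimmed copy of the example data; named `data` here to avoid
# shadowing the standard-library json module.
data = {'status': 'multiple',
        'relatedTree': {'paths': [
            {'treeid': 'C0.A.01', 'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS'},
            {'treeid': 'C0.A.01.A', 'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS|STOMATOLOGICAL PREPARATIONS'},
        ]}}

# The second argument to next() is returned when nothing matches,
# instead of raising StopIteration.
found = next(get_id(data, 'A01A'), None)
missing = next(get_id(data, 'Z99'), None)

print(found['treeid'])  # C0.A.01.A
print(missing)          # None
```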
