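I have a web scraping application that loops over search_ids. Every so often it hits a search that returns multiple results, which differ in a key field called tree_id. I am struggling to work out how to find the correct match with a recursive function. In most cases the json contains two or three tree_ids, and the code needs to pick the correct one even though the search string and the tree_id are formatted differently.
Below is some sample code, with my comments, that highlights the problem: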
# original json from the web scraping application for a single example
json = {'status': 'multiple',
        'searchResult': None,
        'spellingResult': None,
        'relatedTree': {'paths': [
            {'treeid': 'C0.A.01', 'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS'},
            {'treeid': 'C0.A.01.A', 'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS|STOMATOLOGICAL PREPARATIONS'}
        ]},
        'tableResult': None, 'synResult': None}
trees = json['relatedTree']['paths']  # .replace(".","") here would cause an error because a list has no replace method
tree_id0 = json['relatedTree']['paths'][0]['treeid'].replace(".","")  # removes all periods from the treeid at index 0
print(tree_id0)
tree_id1 = json['relatedTree']['paths'][1]['treeid'].replace(".","")  # removes all periods from the treeid at index 1
print(tree_id1)
search = 'A01A'  # example: search 'A01A' and then also 'A01', and have it pick the correct substring each time
search1 = 'A01'
if tree_id0.find(search) != -1:  # correct while using 'A01' and works with 'A01A'
    print("Found!")
else:
    print("Not found!")
if tree_id1.find(search) != -1:  # incorrect while using 'A01' but works with 'A01A'. I need it to find the exact string and nothing to the right of the last letter of search
    print("Found!")
else:
    print("Not found!")
# my attempt at a recursive function to solve the problem, but I get "string indices must be integers", and in its current form I'm not sure whether I'm approaching the problem the wrong way
def search_multi(trees: list, search: str) -> dict:
    for tree in trees:
        if tree['treeid'].replace(".","") == search:
            print(tree['treeid'].replace(".",""))
            return tree
        if tree['treeid'].replace(".",""):
            response = search_multi(tree['treeid'].replace(".",""), search)
            if response:
                return response
searched_multis = search_multi(trees, search)
print(searched_multis)
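The result I want: if the search is 'A01A', it selects tree_id C0.A.01.A from the json; if the search is 'A01', it selects tree_id C0.A.01.
The if/else statements show how it should work, but they do not give the correct result for 'A01', because find also matches a tree_id that continues past the last letter of the search string.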
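To make the mismatch concrete, here is a minimal sketch that compares substring matching (what `str.find` tests) with matching that allows nothing to the right of the search string. The helper names `substring_matches` and `suffix_matches` are hypothetical, and using `str.endswith` is only one possible reading of "nothing to the right of the last letter"; it assumes any prefix such as `C0` may be ignored.

```python
# Dot-stripped tree ids from the example above.
tree_ids = ['C0A01', 'C0A01A']

def substring_matches(ids, search):
    """Ids that contain `search` anywhere (the test str.find performs)."""
    return [t for t in ids if t.find(search) != -1]

def suffix_matches(ids, search):
    """Ids that end with `search`, so nothing follows its last letter."""
    return [t for t in ids if t.endswith(search)]

print(substring_matches(tree_ids, 'A01'))  # ['C0A01', 'C0A01A'] (both match)
print(suffix_matches(tree_ids, 'A01'))     # ['C0A01']
print(suffix_matches(tree_ids, 'A01A'))    # ['C0A01A']
```

With the suffix test, each search string selects exactly one of the two ids in this example.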
Here is one approach. It returns the dict whose "treeid" matches the search:
def get_id(d, search):
    if isinstance(d, dict):
        for k,v in d.items():
            if k == 'treeid' and ''.join(v.split('.')[1:]) == search:
                yield d
            else:
                yield from get_id(v, search)
    elif isinstance(d, list):
        for i in d:
            yield from get_id(i, search)
out = next(get_id(json, 'A01'))
Output:
{'treeid': 'C0.A.01',
'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS'}
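Two caveats about the generator approach: `next()` raises `StopIteration` when nothing matches (passing a default, as in `next(get_id(json, 'A01'), None)`, avoids that), and recursion is only needed if the treeid entries can appear at arbitrary depth. If they always sit in the flat `relatedTree` → `paths` list, as in the sample, a plain loop may be enough. The sketch below assumes that flat layout; `find_tree` and its `default` parameter are hypothetical names, not part of the original code.

```python
def find_tree(paths, search, default=None):
    """Return the first entry whose treeid, minus its first dotted
    segment, equals `search`; return `default` if none matches."""
    for entry in paths:
        # 'C0.A.01.A' -> ['C0', 'A', '01', 'A'] -> 'A01A'
        key = ''.join(entry['treeid'].split('.')[1:])
        if key == search:
            return entry
    return default

# The two entries from the sample json above.
paths = [
    {'treeid': 'C0.A.01',
     'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS'},
    {'treeid': 'C0.A.01.A',
     'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS|STOMATOLOGICAL PREPARATIONS'},
]

print(find_tree(paths, 'A01')['treeid'])   # C0.A.01
print(find_tree(paths, 'A01A')['treeid'])  # C0.A.01.A
print(find_tree(paths, 'Z99'))             # None
```

Returning a default instead of raising lets the scraping loop skip a search_id that has no matching tree.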