Trying to get direct children, but getting all children, with Beautiful Soup



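I am trying to build a dictionary of categories from this URL, specifically the Food category. When I run the following code, it gives me duplicate li items.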

from bs4 import BeautifulSoup as bs
import requests

url = 'https://developer.foursquare.com/docs/build-with-foursquare/categories/'
req = requests.get(url)
soup = bs(req.text)
food_categories = soup.select('div.documentTemplate__Content-sc-5mpekp-0 > ul > li:nth-child(4)')[0]
for tagli in food_categories.find_all("li"):
    print(tagli.find('h3').text)
    for another_tagli in tagli.find_all('ul'):
        for some_tagli in another_tagli.find_all('li'):
            print(some_tagli.find('h3').text)
            for one_tagli in some_tagli.find_all('ul'):
                for aon_tagli in one_tagli.find_all('li'):
                    print(aon_tagli.find('h3').text)

Following several Stack Overflow posts, I tried passing recursive=False to get only the direct children, but when I use it I get nothing back.

I am looking for output like this:

{
    'food': {
        'Afghan Restaurant': [],
        'African Restaurant': ['Ethiopian Restaurant'],
        'Asian Restaurant': {
            'Chinese Restaurant': ['Anhui Restaurant', 'Beijing Restaurant']
        }
    }
}

Please point me in the right direction.

This script builds a tree from the "Food" subcategories:

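import json

import requests
from bs4 import BeautifulSoup

def parse_tree(t):
    # A category without a nested <ul> has no children: return an empty subtree.
    if t is None:
        return {}
    dct = {}
    # recursive=False limits the search to the direct <li> children of this <ul>.
    for li in t.find_all('li', recursive=False):
        dct[li.find_next('h3').text] = parse_tree(li.select_one('ul'))
    return dct

url = 'https://developer.foursquare.com/docs/build-with-foursquare/categories/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# Newer soupsieve releases spell this pseudo-class :-soup-contains("Food").
root = soup.select_one('h3:contains("Food") ~ ul')
tree = parse_tree(root)

# pretty print the tree:
print(json.dumps(tree, indent=4))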

Prints:

{
    "Afghan Restaurant": {},
    "African Restaurant": {
        "Ethiopian Restaurant": {}
    },
    "American Restaurant": {
        "New American Restaurant": {}
    },
    "Asian Restaurant": {
        "Burmese Restaurant": {},
        "Cambodian Restaurant": {},
        "Chinese Restaurant": {
            "Anhui Restaurant": {},
            "Beijing Restaurant": {},
            "Cantonese Restaurant": {},
            "Cha Chaan Teng": {},
            "Chinese Aristocrat Restaurant": {},
... and so on.
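
To see why recursive=False matters without fetching the live page, here is a minimal sketch on a hand-written HTML fragment. The fragment is a hypothetical miniature of the category markup (the real page's structure is assumed, not verified), and this parse_tree variant uses find() instead of find_next()/select_one(). It also shows one plausible reason for getting nothing: calling find_all('li', recursive=False) on an li, whose direct children are the h3 and the ul rather than other li tags.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the category markup, not the real page.
HTML = """
<ul id="root">
  <li><h3>Afghan Restaurant</h3><ul></ul></li>
  <li><h3>African Restaurant</h3>
    <ul><li><h3>Ethiopian Restaurant</h3><ul></ul></li></ul>
  </li>
</ul>
"""


def parse_tree(ul):
    """Map each direct <li> child's <h3> text to the tree of its nested <ul>."""
    if ul is None:
        return {}
    return {
        li.find("h3").text: parse_tree(li.find("ul", recursive=False))
        for li in ul.find_all("li", recursive=False)
    }


soup = BeautifulSoup(HTML, "html.parser")
root = soup.find("ul", id="root")

# Default (recursive) search: every descendant <li>, so nested items show up
# again when the inner loops visit them. This is the source of the duplicates.
all_items = [li.find("h3").text for li in root.find_all("li")]

# Direct children only.
direct_items = [li.find("h3").text for li in root.find_all("li", recursive=False)]

# An <li> has no direct <li> children, so this search comes back empty.
second_li = root.find_all("li", recursive=False)[1]
nested_from_li = second_li.find_all("li", recursive=False)

tree = parse_tree(root)

print(all_items)
print(direct_items)
print(nested_from_li)
print(tree)
```

The key point is to call find_all('li', recursive=False) on the ul, then step into each li's own ul before recursing.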

# Alternative: build the dictionary with explicit nested loops, one per level.
from bs4 import BeautifulSoup as bs
import requests

url = 'https://developer.foursquare.com/docs/build-with-foursquare/categories/'
req = requests.get(url)
soup = bs(req.text, 'html.parser')  # name the parser explicitly to avoid the bs4 warning
food_categories = soup.select('div.documentTemplate__Content-sc-5mpekp-0 > ul > li:nth-child(4)')[0]
data = {}
# find_all(recursive=False) returns only the direct children of each tag.
for tagli in food_categories.find_all(recursive=False):
    data[tagli.find('h3').text] = {}
    for some_tagli in tagli.find('ul').find_all(recursive=False):
        data[tagli.find('h3').text][some_tagli.find('h3').text] = {}
        for aon_tagli in some_tagli.find('ul').find_all(recursive=False):
            data[tagli.find('h3').text][some_tagli.find('h3').text][aon_tagli.find('h3').text] = []
            for taglis in aon_tagli.find('ul').find_all(recursive=False):
                data[tagli.find('h3').text][some_tagli.find('h3').text][aon_tagli.find('h3').text].append(taglis.find('h3').text)

print(data)
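
The output requested in the question uses lists for the lowest level, while the recursive parse_tree approach returns empty dicts for leaves. A possible post-processing step is sketched below. collapse_leaves is a hypothetical helper name, and the sample tree is hand-written to mirror the question's example rather than scraped from the page.

```python
def collapse_leaves(tree):
    """Convert a nested dict of dicts into the mixed dict/list shape.

    A node with no children becomes [], a node whose children are all
    leaves becomes a list of their names, anything else stays a dict.
    """
    if not tree:
        return []
    if all(not sub for sub in tree.values()):
        return list(tree)
    return {name: collapse_leaves(sub) for name, sub in tree.items()}


# Hand-written sample in the same shape that parse_tree returns.
sample = {
    "Afghan Restaurant": {},
    "African Restaurant": {"Ethiopian Restaurant": {}},
    "Asian Restaurant": {
        "Chinese Restaurant": {"Anhui Restaurant": {}, "Beijing Restaurant": {}},
    },
}

result = {"food": collapse_leaves(sample)}
print(result)
```

Note that a level mixing leaves and non-leaves stays a dict, with each leaf mapped to an empty list, which matches how 'Afghan Restaurant' appears in the requested output.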
