How to get more items from a dynamically rendered web page when web scraping



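I am using Python to scrape restaurant names from Foodpanda. The page's items are all rendered through its <script> tags, so I cannot get any data through the HTML elements or CSS selectors.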

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

foodpanda_url = "https://www.foodpanda.hk/restaurants/new?lat=22.33523782&lng=114.18249102&expedition=pickup&vertical=restaurants"
# send a request to the page, using the Mozilla 5.0 browser header
req = Request(foodpanda_url, headers={'User-Agent' : 'Mozilla/5.0'})
# open the page using urlopen
page = urlopen(req)
soup = BeautifulSoup(page.read(), "html.parser")
print(soup.prettify())
str_soup = str(soup.prettify())

I parse the vendor strings out of str_soup with the following:

import re

fp_vendors = list()
# everything after this marker is the JSON array of vendors
vendorlst = str_soup.split('"discoMeta":{"reco_config":{"flags":[]},"traces":[]},"items":')
opensqr = 0
startobj = 0
for i in range(len(vendorlst)):
    if i==0:
        continue
    else:
        for cnt in range(len(vendorlst[i])):
            if (vendorlst[i][cnt] == '['):
                opensqr += 1
            elif (vendorlst[i][cnt] == ']'):
                opensqr -= 1
            if opensqr == 0:
                # the outer [...] is closed: vendorsStr is the array body
                vendorsStr = vendorlst[i][1:cnt]
                opencurly = 0
                for x in range(len(vendorsStr)):
                    if vendorsStr[x] == ',':
                        continue
                    if (vendorsStr[x] == '{'):
                        opencurly += 1
                    elif (vendorsStr[x] == '}'):
                        opencurly -= 1
                    if opencurly == 0:
                        vendor = vendorsStr[startobj:x+1]
                        if (vendor not in fp_vendors) and vendor != "":
                            fp_vendors.append(vendor)
                        startobj = x+2 #continue to next {
                        continue
                break
for item in fp_vendors:
    # print(item+"\n")
    itemstr = re.split(r'"minimum_pickup_time":[0-9]+,"name":"', item)[1]
    itemstr = itemstr.split('",')[0]
    print(itemstr+"\n")
print(len(fp_vendors))
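
The manual bracket counting above can probably be replaced by Python's own JSON decoder. The following is a minimal sketch, assuming the text right after the "items": marker is valid JSON and that each vendor object has a "name" key (as the regular expression above suggests). The helper name extract_items and the sample string are hypothetical and only imitate the embedded payload.

```python
import json

def extract_items(text, marker='"items":'):
    """Return all JSON arrays that directly follow `marker` in `text`, flattened."""
    decoder = json.JSONDecoder()
    items = []
    pos = text.find(marker)
    while pos != -1:
        start = pos + len(marker)
        # raw_decode does not skip leading whitespace, so do it by hand
        while start < len(text) and text[start].isspace():
            start += 1
        try:
            # parse exactly one JSON value beginning at `start`;
            # `end` is the index just past that value
            value, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value, end = None, start  # not JSON here; skip this occurrence
        if isinstance(value, list):
            items.extend(value)
        pos = text.find(marker, end)
    return items

# Hypothetical sample imitating the data embedded in the page's <script>
sample = 'window.__DATA__={"discoMeta":{"traces":[]},"items":[{"name":"A [1]"},{"name":"B"}]};'
names = [item["name"] for item in extract_items(sample)]
print(names)
```

Because the decoder understands string literals, brackets or braces inside a restaurant name do not throw off the parsing, which plain character counting cannot guarantee.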

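However, this only returns a small portion of the restaurants, roughly 50. How can I set up my code to "get" more restaurant items from Foodpanda? How can I simulate "scrolling down" so that more items are loaded and I can collect more restaurants?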

Using your browser's developer tools, you can easily monitor all requests. For your particular case, I found this API call:

https://disco.deliveryhero.io/listing/api/v1/pandora/vendors?latitude=22.33523782&longitude=114.18239102&language_id=1&include=characteristics&dynamic_pricing=0&configuration=Variant1&country=hk&customer_id=&customer_hash=&budgets=&cuisine=&sort=&food_characteristic=&use_free_delivery_label=false&opening_type=pickup&vertical=restaurants&limit=48&offset=48&customer_type=regular
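
The two parameters that matter for paging are limit (the page size) and offset (how many items to skip). As a minimal sketch, the same URL can be assembled from a dictionary so that these values are easy to change. The helper name build_vendor_url is hypothetical, the coordinates are the ones from the question's URL, and the sketch assumes the endpoint keeps accepting exactly the query parameters seen in the browser.

```python
from urllib.parse import urlencode

BASE_URL = "https://disco.deliveryhero.io/listing/api/v1/pandora/vendors"

def build_vendor_url(offset, limit=48, latitude="22.33523782", longitude="114.18249102"):
    """Build the listing URL for one page (parameter names copied from the observed call)."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "language_id": 1,
        "include": "characteristics",
        "dynamic_pricing": 0,
        "configuration": "Variant1",
        "country": "hk",
        "customer_id": "",
        "customer_hash": "",
        "budgets": "",
        "cuisine": "",
        "sort": "",
        "food_characteristic": "",
        "use_free_delivery_label": "false",
        "opening_type": "pickup",
        "vertical": "restaurants",
        "limit": limit,      # items per page
        "offset": offset,    # items to skip before this page
        "customer_type": "regular",
    }
    return BASE_URL + "?" + urlencode(params)

second_page = build_vendor_url(offset=48)
print(second_page)
```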

Here is a complete solution to your problem:

import json
import requests

items_list = []
url = "https://disco.deliveryhero.io/listing/api/v1/pandora/vendors?latitude=22.33523782&longitude=114.18249102&language_id=1&include=characteristics&dynamic_pricing=0&configuration=Variant1&country=hk&customer_id=&customer_hash=&budgets=&cuisine=&sort=&food_characteristic=&use_free_delivery_label=false&opening_type=pickup&vertical=restaurants&limit=48&offset={}&customer_type=regular"
for i in range(5):
    resp = requests.get(
        url.format(i * 48),
        headers={
            "x-disco-client-id": "web",
        },
    )
    if resp.status_code == 200:
        items_list += json.loads(resp.text)["data"]["items"]
    print(f"Finished page: {i}")
print(items_list)
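
The loop above stops after a fixed five pages. To collect every restaurant, one option is to keep requesting pages until one comes back short or empty. The sketch below assumes the API returns fewer than limit items on its last page, which the observed call does not confirm. fetch_page is a hypothetical callable; it is replaced here by a fake one so the example runs without network access.

```python
def fetch_all(fetch_page, limit=48, max_pages=100):
    """Collect items page by page until a short or empty page is returned.

    fetch_page(offset, limit) must return the list of items for that page.
    """
    items = []
    for page in range(max_pages):  # hard cap guards against an endless loop
        batch = fetch_page(page * limit, limit)
        items.extend(batch)
        if len(batch) < limit:  # short page: assumed to be the last one
            break
    return items

# Stand-in for the real API: 130 fake vendors served in slices
fake_vendors = [{"name": f"Restaurant {n}"} for n in range(130)]

def fake_fetch(offset, limit):
    return fake_vendors[offset:offset + limit]

all_items = fetch_all(fake_fetch)
print(len(all_items))
```

With the real endpoint, fetch_page would wrap the requests.get call shown above, pass the offset into the URL, and return the "items" list from the response's "data" object.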
