My for loop doesn't iterate over the URL list, it only processes the first entry



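I'm a beginner, and after banging my head against the wall I'm looking for any help. I want to scrape a list of URLs, but my for loop only returns the first item in the list.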

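I have a list of URLs, a function that scrapes JSON data into a dictionary, and code that converts the dictionary to a DataFrame and exports it to CSV. Everything works except the for loop, so only the first URL in the list gets scraped: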

url_list_str = ['https://www.foodpanda.ph/restaurant/vh2d/sicilian-roast-legaspi-village',
                'https://www.foodpanda.ph/restaurant/ns76/tokyo-milk-cheese-factory-greenbelt-5',
                'https://www.foodpanda.ph/restaurant/hksd/paul-greenbelt-5']
for url in url_list_str:
    url = url_list_str[0]
    response = req.get(url, headers = headers)
    pause(5)
    html = BeautifulSoup(response.content, 'html.parser')
    data = foodpanda_data(html)
    restaurant_name = data['Name']
    df = pd.DataFrame([data])

foodpanda_data() is a function defined above the for loop; it scrapes the JSON and converts it into a dictionary. Here is a preview, since it is long:

def foodpanda_data(html):
    script_tag = html.find("script", {"data-testid": "restaurant-seo-schema"})
    json_text = script_tag.string
    json_dict = json.loads(json_text)

    extracted_data = {}
    keys_to_extract = ["Name", "streetAddress", "addressLocality", "postalCode", "latitude", "longitude", "url", "ratingValue", "ratingCount", "bestRating", "worstRating", "servesCuisine", "priceRange"]
    for key in keys_to_extract:
        if key.lower() == 'name':
            extracted_data[key] = json_dict.get('name', '') #... etc.
    return extracted_data

I also tried writing the for loop as:

for u in range(len(url_list_str)):
    url = url_list_str[u]

But that doesn't work either. There must be something obvious here that I'm not seeing. Thank you!

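This happens because on every iteration you select the first URL from the list here (url = url_list_str[0]). Remove that line so the loop uses each URL in turn: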

url_list_str = ['https://www.foodpanda.ph/restaurant/vh2d/sicilian-roast-legaspi-village',
                'https://www.foodpanda.ph/restaurant/ns76/tokyo-milk-cheese-factory-greenbelt-5',
                'https://www.foodpanda.ph/restaurant/hksd/paul-greenbelt-5']
for url in url_list_str:
    response = req.get(url, headers = headers)
    pause(5)
    html = BeautifulSoup(response.content, 'html.parser')
    data = foodpanda_data(html)
    restaurant_name = data['Name']
    df = pd.DataFrame([data])
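
To see the effect without any network access, here is a minimal sketch that uses placeholder strings instead of real URLs (the names `urls`, `buggy`, and `fixed` are made up for illustration). Reassigning the loop variable to the first element on every pass makes each iteration process the same item:

```python
urls = ["first", "second", "third"]  # placeholder items, not real URLs

buggy = []
for url in urls:
    url = urls[0]        # overwrites the loop variable on every pass
    buggy.append(url)

fixed = []
for url in urls:
    fixed.append(url)    # uses the value the loop provides

print(buggy)  # ['first', 'first', 'first']
print(fixed)  # ['first', 'second', 'third']
```

The loop itself runs three times in both cases; only the reassignment hides the other items.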

I guess you are trying to do something like this:

import json
import time
import requests
import pandas as pd
from bs4 import BeautifulSoup

# Widen pandas' display limits so the printed frame is not truncated
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

def foodpanda_data(html):
    # The restaurant metadata is embedded as JSON in this script tag
    script_tag = html.find("script", {"data-testid": "restaurant-seo-schema"})
    json_text = script_tag.string
    json_dict = json.loads(json_text)
    extracted_data = {
        "name": json_dict['name'],
        "streetAddress": json_dict['address']['streetAddress'],
        "addressLocality": json_dict['address']['addressLocality'],
        "postalCode": json_dict['address']['postalCode'],
        "latitude": json_dict['geo']['latitude'],
        "longitude": json_dict['geo']['longitude'],
        "url": json_dict['url'],
        "ratingValue": json_dict['aggregateRating']['ratingValue'],
        "ratingCount": json_dict['aggregateRating']['ratingCount'],
        "bestRating": json_dict['aggregateRating']['bestRating'],
        "worstRating": json_dict['aggregateRating']['worstRating'],
        "servesCuisine": json_dict['servesCuisine'],
        "priceRange": json_dict['priceRange']
    }
    return extracted_data

url_list_str = ['https://www.foodpanda.ph/restaurant/vh2d/sicilian-roast-legaspi-village',
                'https://www.foodpanda.ph/restaurant/ns76/tokyo-milk-cheese-factory-greenbelt-5',
                'https://www.foodpanda.ph/restaurant/hksd/paul-greenbelt-5']
all_data = []  # one dict per restaurant
for url in url_list_str:
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36"
    }
    response = requests.get(url, headers=headers)
    html = BeautifulSoup(response.content, 'html.parser')
    data = foodpanda_data(html)
    all_data.append(data)
    time.sleep(1)  # short pause between requests

# Build the DataFrame once, after the loop, so every row is kept
df = pd.DataFrame(all_data)
print(df.head())

Output:

name                                      streetAddress addressLocality postalCode   latitude   longitude                                                url  ratingValue  ratingCount  bestRating  worstRating                           servesCuisine priceRange
0         Sicilian Roast - Legaspi Village  100 Don Carlos Palanca corner Dela Rosa Street...     Makati City       1229  14.556083  121.019540  https://www.foodpanda.ph/restaurant/vh2d/sicil...          4.4           29           5            1                 [Italian, Pizza, Pasta]         ₱₱
1  Tokyo Milk Cheese Factory - Greenbelt 5  2nd Floor Greenbelt 5 Legazpi Street Legazpi V...     Makati City       1229  14.553329  121.022054  https://www.foodpanda.ph/restaurant/ns76/tokyo...          5.0           58           5            1    [Desserts, Fast Food, Snacks, Cakes]        ₱₱₱
2                       PAUL - Greenbelt 5  Ground Floor Greenbelt 5 Legazpi Street Barang...     Makati City       1223  14.552704  121.020531  https://www.foodpanda.ph/restaurant/hksd/paul-...          4.7           12           5            1  [Sandwiches, American, Western, Bread]         ₱₱
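
One caveat: the function above indexes nested keys directly, so it would raise a KeyError if a page's JSON lacked, say, `aggregateRating`. Whether foodpanda pages ever omit these fields is an assumption on my part, but if they do, a more forgiving variant could look roughly like this. The helper name `extract_fields` and the sample JSON are hypothetical, made up only for illustration:

```python
import json

def extract_fields(json_dict):
    """Flatten the nested schema dict, using None for anything missing."""
    # Fall back to an empty dict when a nested section is absent or null
    address = json_dict.get("address") or {}
    geo = json_dict.get("geo") or {}
    rating = json_dict.get("aggregateRating") or {}
    return {
        "name": json_dict.get("name"),
        "streetAddress": address.get("streetAddress"),
        "addressLocality": address.get("addressLocality"),
        "postalCode": address.get("postalCode"),
        "latitude": geo.get("latitude"),
        "longitude": geo.get("longitude"),
        "url": json_dict.get("url"),
        "ratingValue": rating.get("ratingValue"),
        "ratingCount": rating.get("ratingCount"),
        "bestRating": rating.get("bestRating"),
        "worstRating": rating.get("worstRating"),
        "servesCuisine": json_dict.get("servesCuisine"),
        "priceRange": json_dict.get("priceRange"),
    }

# Made-up sample with no rating or geo section (not real foodpanda data)
sample = json.loads('{"name": "Example Cafe", "address": {"streetAddress": "1 Example Street"}}')
row = extract_fields(sample)
print(row["name"], row["ratingValue"])  # Example Cafe None
```

Missing values then show up as empty cells in the DataFrame instead of stopping the loop partway through the list.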
