I want to access some data from
https://www.calorie-charts.info/food/all/banana
I tried a Python requests Session, but it takes 2 minutes to get a response.
import requests
s = requests.Session()
headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
url = 'https://www.calorie-charts.info/food/all/banana'
s.headers.update(headers)
r = s.get(url)
print(r.text)
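I also tried to find an API link for an XHR request, but couldn't find one in the DevTools Network tab.
How can I speed up the process, or find the link for the XHR request?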
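If the rows are already present in the HTML the server sends, the page makes no separate XHR call for them, which would explain why nothing shows up in the Network tab. One way to check is to look for table rows directly in the response body. Below is a minimal stdlib-only sketch of that check; `TableRows` and `parse_rows` are hypothetical names, and the sample HTML is invented for illustration, not copied from the site. It ignores colspan/rowspan and merges rows from every table on the page, so it is a quick diagnostic rather than a full table parser.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>.

    Simplified on purpose: no colspan/rowspan handling, and rows from
    all tables on the page end up in one flat list.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows: list of lists of cell strings
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # collapse internal whitespace/newlines into single spaces
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_rows(html):
    """Return the table rows found in an HTML string."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


sample = """
<table>
  <tr><th>Name</th><th>Calories (kcal)</th></tr>
  <tr><td><a href="#">banana</a></td><td>84.5</td></tr>
</table>
"""
print(parse_rows(sample))
# → [['Name', 'Calories (kcal)'], ['banana', '84.5']]
```

On a real response you would call `parse_rows(r.text)`: if rows with food names come back, the data is server-rendered and there is likely no XHR endpoint to look for; an empty list would instead hint that the table is filled in later by JavaScript.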
Here is one way of getting that data (update: now all 34 pages):
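import requests
from io import StringIO
import pandas as pd
from tqdm import tqdm
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
s = requests.Session()
s.headers.update(headers)
frames = []
# the banana listing is split across /page/1 ... /page/34
for x in tqdm(range(1, 35)):
    url = f'https://www.calorie-charts.info/food/all/banana/page/{x}'
    r = s.get(url)
    # read_html returns one DataFrame per <table>; the food list is the first one.
    # StringIO avoids the pandas 2.1+ deprecation of passing literal HTML strings.
    frames.append(pd.read_html(StringIO(r.text))[0])
# concatenate once, instead of growing a DataFrame inside the loop
big_df = pd.concat(frames, axis=0, ignore_index=True)
print(big_df)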
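The 34 page requests above are independent of each other, so if each one is slow, issuing several at once may cut the total wall-clock time. A minimal sketch with `concurrent.futures.ThreadPoolExecutor`, assuming the server tolerates a handful of parallel connections; `page_urls` and `fetch_pages` are hypothetical helpers, and the fetch function is passed in so the sketch can run without network access.

```python
from concurrent.futures import ThreadPoolExecutor

BASE_URL = "https://www.calorie-charts.info/food/all/banana/page/{}"


def page_urls(first, last):
    """URLs for pages first..last inclusive, in the /page/{x} format used above."""
    return [BASE_URL.format(n) for n in range(first, last + 1)]


def fetch_pages(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL using a pool of threads.

    executor.map yields results in the same order as `urls`, so the pages
    can be concatenated afterwards just like in the sequential loop.
    An exception raised inside fetch() is re-raised here.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch, urls))


# Real use (assumption: `headers` defined as above; plain requests.get is used
# because requests.Session is not documented as thread-safe):
#   htmls = fetch_pages(page_urls(1, 34),
#                       lambda u: requests.get(u, headers=headers, timeout=30).text)
#   big_df = pd.concat([pd.read_html(StringIO(h))[0] for h in htmls], ignore_index=True)

# Offline demonstration with a stand-in fetch function:
print(fetch_pages(page_urls(1, 3), lambda u: u.rsplit("/", 1)[-1]))
# → ['1', '2', '3']
```

Keeping `max_workers` small is a deliberate choice: it speeds things up without hammering a site that already responds slowly.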
Result:
Name Calories(kcal) Amount Energy (kj) Proteins (g) Carbohydrates (g) Fat (g) Fiber (g) Unnamed: 8
0 banana 84.5 medium banana (90 g) 354 1 20.0 22.0 2.0 Detail
1 banana bio 105.0 126 g 440 1 27.0 38.0 3.0 Detail
2 banana bar 147.6 40 g 618 6 28.0 4.0 NaN Detail
3 banana curd 224.7 100 g 941 18 30.0 3.0 1.0 Detail
4 ripe banana 199.6 200 g 836 2 40.0 1.0 4.0 Detail
... ... ... ... ... ... ... ... ... ...
1675 Oatmeal lumps of cheese 3 bananas, 1 cheese, 2... 20.5 piece (10 g) 86 78.0 3.0 73.0 36.0 Detail
1676 pancakes (tangerine, apple, banana, 1/2 curd, ... 54.9 1 pancake (71 g) 230 5.0 6.0 1.0 99.0 Detail
1677 Fit cake with nuts, avocados and bananas, almo... 264.4 100 g 1 107 7.0 30.0 13.0 6.0 Detail
1678 100% Whey Protein vanilla, banana, strawberry,... 113.4 portion (30 g) 475 23.0 2.0 1.0 42.0 Detail
1679 Excellent 24% Protein Bar pineapple with cocon... 372.8 85 g 1 561 20.0 36.0 16.0 2.0 Detail
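In the printed result, a few Energy (kj) values appear as `1 107` and `1 561`, which suggests the site writes thousands with a space and those cells came back as text rather than numbers. A minimal cleanup sketch, assuming that whitespace inside a number is only ever a thousands separator; `to_number` is a hypothetical helper, not part of pandas.

```python
import re


def to_number(value):
    """Turn a cell such as '1 107' or '84.5' into a float.

    Assumption: any whitespace inside the number (including the
    non-breaking space U+00A0) is a thousands separator.
    Returns None for empty or non-numeric cells such as 'Detail'.
    """
    text = re.sub(r"\s+", "", str(value))
    try:
        return float(text)
    except ValueError:
        return None


# With the DataFrame from the answer (not executed here):
#   big_df["Energy (kj)"] = pd.to_numeric(big_df["Energy (kj)"].map(to_number))
print([to_number(v) for v in ["354", "1 107", "1\u00a0561", "Detail"]])
# → [354.0, 1107.0, 1561.0, None]
```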
Requests documentation: https://requests.readthedocs.io/en/latest/
Also, the relevant pandas documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html