I am trying to scrape a web page that uses pagination.
Code:
from requests_html import _URL, HTMLSession
from bs4 import BeautifulSoup

for page in range(1,6):
    s = HTMLSession()
    url = 'https://www.lazada.com.ph/catalog/?q=laptop'
    url += '&page={}'
    r=s.get(url.format(page))
    print(url)
Output:
https://www.lazada.com.ph/catalog/?q=laptop&page={}
https://www.lazada.com.ph/catalog/?q=laptop&page={}
https://www.lazada.com.ph/catalog/?q=laptop&page={}
https://www.lazada.com.ph/catalog/?q=laptop&page={}
https://www.lazada.com.ph/catalog/?q=laptop&page={}
Expected:
https://www.lazada.com.ph/catalog/?q=laptop&page={1}
https://www.lazada.com.ph/catalog/?q=laptop&page={2}
https://www.lazada.com.ph/catalog/?q=laptop&page={3}
https://www.lazada.com.ph/catalog/?q=laptop&page={4}
https://www.lazada.com.ph/catalog/?q=laptop&page={5}
I am still a beginner learning Python. Please help me get the result I expect. Thanks in advance.
If your Python version supports f-strings (Python 3.6 or later)…
from requests_html import _URL, HTMLSession
from bs4 import BeautifulSoup

for page in range(1,6):
    s = HTMLSession()
    # The f-string inserts the current page number directly into the URL.
    url = f'https://www.lazada.com.ph/catalog/?q=laptop&page={page}'
    r = s.get(url)
    print(url)
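As a variation on the same f-string idea, the page URLs can be built up front and checked before any request is sent. This is only a sketch: `BASE_URL` and `page_urls` are hypothetical names introduced here for illustration, not part of `requests_html`.

```python
# Hypothetical constant: the search URL from the question, without the page part.
BASE_URL = 'https://www.lazada.com.ph/catalog/?q=laptop'


def page_urls(first, last):
    """Return the catalog URLs for pages first..last, inclusive."""
    return [f'{BASE_URL}&page={page}' for page in range(first, last + 1)]


# Print the URLs so they can be inspected before fetching anything.
for url in page_urls(1, 5):
    print(url)
```

Each URL can then be passed to `s.get(url)`, and a single `HTMLSession` created once before the loop should be enough for all five requests.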
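url is never modified: url.format(page) returns a new string, which is passed to s.get(), while url itself still holds the template with {}. Assign the result back to url. Try: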
from requests_html import _URL, HTMLSession
from bs4 import BeautifulSoup

for page in range(1,6):
    s = HTMLSession()
    url = 'https://www.lazada.com.ph/catalog/?q=laptop'
    url += '&page={}'
    # Store the formatted string so that print(url) shows the page number.
    url = url.format(page)
    r = s.get(url)
    print(url)
Or better:
from requests_html import _URL, HTMLSession
from bs4 import BeautifulSoup

for page in range(1,6):
    s = HTMLSession()
    # Build and format the URL in a single step.
    url = 'https://www.lazada.com.ph/catalog/?q=laptop&page={}'.format(page)
    r = s.get(url)
    print(url)
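The reason the original loop kept printing {} can be checked without any network access. The sketch below uses two illustrative names, `template` and `filled`, that do not appear in the question; it only shows that `str.format` returns a new string and leaves the original untouched.

```python
# Python strings are immutable: format() cannot change the template in place.
template = 'https://www.lazada.com.ph/catalog/?q=laptop&page={}'

# format() returns a new string with the placeholder filled in.
filled = template.format(3)

print(template)  # the template still ends with the literal {}
print(filled)    # the new string ends with page=3
```

This is why the request in the question was sent to the right page while print(url) still showed the unformatted template.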