我正在尝试抓取一个网页。执行此代码时,输出running1
,而不输出running2
。为什么会这样呢?
代码:
from time import gmtime, strftime
import requests
from bs4 import BeautifulSoup
import smtplib
from email.mime.text import MIMEText
print("running1")
url = "https://www.johnlewis.com/nordictrack-commercial-14-9-elliptical-cross-trainer/p5639979"
response = requests.get(url)
print("running2")
soup = BeautifulSoup(response.text, 'lxml')
print("running3")
要从服务器获得正确的响应,请尝试指定User-Agent
HTTP头:
import requests
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:101.0) Gecko/20100101 Firefox/101.0"
}
url = "https://www.johnlewis.com/nordictrack-commercial-14-9-elliptical-cross-trainer/p5639979"
response = requests.get(url, headers=headers)
print(response.text)
打印:
<!DOCTYPE html><html lang="en"><head>
...