Python request.get用一个单词响应,我尝试了不同的标题



大家好,我正试图抓取这个网站,但由于某种原因,它没有被抓取。如果有人能帮我解决这个问题,我真的很感激。我曾尝试使用不同的用户代理,但由于某种原因,它不起作用。对于页面内容,它打印b'',对于汤,它的空

提前感谢这是我的代码:

import requests
from bs4 import BeautifulSoup
 
url = "https://www.carrefourjordan.com/mafjor/en/c/deals?currentPage=1&filter=&nextPageOffset=0&pageSize=60&sortBy=relevance"
headers = {'User-Agent':'test'}
page = requests.get(url,headers=headers)
print(page.content)
soup = BeautifulSoup(page.content, "html.parser")
print(soup)
** These the 3 different headers I used **

```headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36'}
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36', "Upgrade-Insecure-Requests": "1","DNT": "1","Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8","Accept-Language": "en-US,en;q=0.5","Accept-Encoding": "gzip, deflate"}```

您需要首先获得正确的cookie。所以你需要使用会话

import requests
from bs4 import BeautifulSoup
headers={'User-Agent': 'Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.2; Trident/5.1)'}
 
url = "https://www.carrefourjordan.com/mafjor/en/c/deals?currentPage=1&filter=&nextPageOffset=0&pageSize=60&sortBy=relevance"
with requests.session() as s:
    s.headers.update(headers)
    # get the cookies first
    s.get("https://www.carrefourjordan.com")
    
    page = s.get(url)
    soup = BeautifulSoup(page.text, "html.parser")
    print(soup)

最新更新