Website always hangs when using the Python requests library



I'm trying to get the HTML from this URL with the Python requests library: https://www.adidas.com/api/products/EF2302/availability?sitePath=us

However, every time I run the code, it hangs when making the GET request:

import requests

header = BASE_REQUEST_HEADER  # headers copied from Chrome's Network tab (defined elsewhere)
url = 'https://www.adidas.com/api/products/EF2302/availability?sitePath=us'
r = requests.get(url, headers=header)  # hangs here

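I checked the Network tab in Chrome and copied all the headers it used, including the User-Agent, so that isn't the problem. I can also load the page in Chrome with JavaScript and cookies disabled.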

This code works for other websites. I just can't get a response from any Adidas site at all (including https://www.adidas.com/us).

Any suggestions would be greatly appreciated.

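This site doesn't like the default User-Agent field that requests supplies. Change it to a Firefox or Chrome one (I picked Firefox in the example) and you can read the data successfully: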

import json
import requests

# Browser-like User-Agent; the default "python-requests/x.y.z" one is what the server appears to ignore
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'}
url = 'https://www.adidas.com/api/products/EF2302/availability?sitePath=us'
r = requests.get(url, headers=headers, timeout=10)  # raise instead of hanging forever
r.raise_for_status()
json_data = r.json()  # same as json.loads(r.text)
print(json.dumps(json_data, indent=4))

Prints:

{
    "id": "EF2302",
    "availability_status": "PREORDER",
    "variation_list": [
        {
            "sku": "EF2302_530",
            "availability": 15,
            "availability_status": "PREORDER",
            "size": "4",
            "instock_date": "2018-08-16T00:00:00.000Z"
        },
        {
            "sku": "EF2302_550",
            "availability": 15,
            "availability_status": "PREORDER",
            "size": "5",
            "instock_date": "2018-08-16T00:00:00.000Z"
        },
        {
            "sku": "EF2302_570",
            "availability": 15,
            "availability_status": "PREORDER",
            "size": "6",
            "instock_date": "2018-08-16T00:00:00.000Z"
        },
        {
            "sku": "EF2302_590",
            "availability": 15,
            "availability_status": "PREORDER",
            "size": "7",
            "instock_date": "2018-08-16T00:00:00.000Z"
        },
        {
            "sku": "EF2302_610",
            "availability": 15,
            "availability_status": "PREORDER",
            "size": "8",
            "instock_date": "2018-08-16T00:00:00.000Z"
        },
        {
            "sku": "EF2302_630",
            "availability": 15,
            "availability_status": "PREORDER",
            "size": "9",
            "instock_date": "2018-08-16T00:00:00.000Z"
        },
        {
            "sku": "EF2302_650",
            "availability": 15,
            "availability_status": "PREORDER",
            "size": "10",
            "instock_date": "2018-08-16T00:00:00.000Z"
        },
        {
            "sku": "EF2302_670",
            "availability": 15,
            "availability_status": "PREORDER",
            "size": "11",
            "instock_date": "2018-08-16T00:00:00.000Z"
        },
        {
            "sku": "EF2302_690",
            "availability": 15,
            "availability_status": "PREORDER",
            "size": "12",
            "instock_date": "2018-08-16T00:00:00.000Z"
        },
        {
            "sku": "EF2302_710",
            "availability": 15,
            "availability_status": "PREORDER",
            "size": "13",
            "instock_date": "2018-08-16T00:00:00.000Z"
        }
    ]
}
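If you only need per-size stock from this payload, a small helper can flatten it. This is a minimal sketch assuming the response always has the shape printed above (a top-level "variation_list" whose entries carry "size", "availability" and "availability_status"); the function name sizes_by_status is hypothetical, and the sample is a trimmed copy of the output above.

```python
def sizes_by_status(data):
    """Map each size to (availability, availability_status).

    Hypothetical helper; assumes the payload shape printed above.
    Missing "variation_list" yields an empty dict.
    """
    return {
        item["size"]: (item["availability"], item["availability_status"])
        for item in data.get("variation_list", [])
    }


# Trimmed copy of the first two entries of the output above
sample = {
    "id": "EF2302",
    "availability_status": "PREORDER",
    "variation_list": [
        {"sku": "EF2302_530", "availability": 15, "availability_status": "PREORDER", "size": "4"},
        {"sku": "EF2302_550", "availability": 15, "availability_status": "PREORDER", "size": "5"},
    ],
}
print(sizes_by_status(sample))  # {'4': (15, 'PREORDER'), '5': (15, 'PREORDER')}
```

With a live response you would call sizes_by_status(r.json()).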

One difference is the User-Agent field: requests sets it to python-requests/2.18.4.
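You can check the default string yourself with requests.utils.default_user_agent(), and override it once on a Session so every request made through it carries a browser-like value. This is a sketch, assuming the Firefox string from the answer above is acceptable to the server; Session.prepare_request only builds the request locally, so it lets you inspect the outgoing headers without sending anything.

```python
import requests

# The library's built-in default, e.g. "python-requests/2.18.4"
print(requests.utils.default_user_agent())

# Browser-like UA taken from the answer above
BROWSER_UA = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0"

session = requests.Session()
session.headers.update({"User-Agent": BROWSER_UA})  # applies to every request on this session

# Build (but do not send) a request to see which User-Agent would go on the wire
prepared = session.prepare_request(
    requests.Request("GET", "https://www.adidas.com/api/products/EF2302/availability?sitePath=us")
)
print(prepared.headers["User-Agent"])
```

Sending is then just session.send(prepared, timeout=10) or session.get(url, timeout=10).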

Adidas is probably just dropping these HTTP requests to stop people from abusing their systems.

(By the way, the same fix also works for www.adidas.com.)

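I reproduced the problem and looked at the traffic in the Wireshark packet sniffer. The HTTP request seems to go out fine and is acknowledged at the TCP level, but no HTTP reply ever comes back.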
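That symptom explains the "hang": requests does not time out unless you pass timeout, so a server that accepts the connection but never answers blocks requests.get indefinitely. The sketch below reproduces the situation locally with a socket server that accepts connections and never writes anything; start_silent_server and fetch are hypothetical helpers, and the (connect, read) timeout values are arbitrary choices, not recommendations from the thread.

```python
import socket
import threading

import requests


def start_silent_server():
    """Accept TCP connections on 127.0.0.1 but never send an HTTP response.

    Mimics the Wireshark observation: handshake and ACKs, no reply.
    Returns the OS-assigned port.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen()
    held = []  # keep accepted sockets open so the client keeps waiting

    def accept_forever():
        while True:
            conn, _ = srv.accept()
            held.append(conn)

    threading.Thread(target=accept_forever, daemon=True).start()
    return srv.getsockname()[1]


def fetch(url, timeout=(3.05, 5)):
    """GET url with a (connect, read) timeout; report the outcome instead of hanging."""
    try:
        r = requests.get(url, timeout=timeout)
    except requests.exceptions.ConnectTimeout:
        return "connect timeout"
    except requests.exceptions.ReadTimeout:
        return "read timeout"
    return f"HTTP {r.status_code}"


port = start_silent_server()
print(fetch(f"http://127.0.0.1:{port}/", timeout=(1, 1)))  # read timeout
```

Against the real site, a "read timeout" result (rather than "connect timeout") matches what Wireshark showed: the connection is established, but the server never answers.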
