How to accept cookies with requests



I am trying to scrape a web page with Python, but to scrape it I need to accept the cookies on the page.

The code I tried is:

import json
import requests

URL = "https://www.howoge.de/wohnungen-gewerbe/wohnungssuche.html"
# Load the cookies exported from the browser (a JSON list of cookie objects).
with open('cookies') as f:
    j = json.load(f)
session = requests.Session()
for cookie in j:
    session.cookies.set(cookie['name'], cookie['value'])
r = session.get(URL)

This does not raise any error, but the cookies are still not accepted.

Here are my cookies:

[
    {
        "domain": ".howoge.de",
        "expirationDate": 1694266885,
        "hostOnly": false,
        "httpOnly": false,
        "name": "__cmpcpcu10543",
        "path": "/",
        "sameSite": "no_restriction",
        "secure": true,
        "session": false,
        "storeId": null,
        "value": "__51_54__"
    },
    {
        "domain": ".howoge.de",
        "expirationDate": 1694266885,
        "hostOnly": false,
        "httpOnly": false,
        "name": "__cmpconsent10543",
        "path": "/",
        "sameSite": "no_restriction",
        "secure": true,
        "session": false,
        "storeId": null,
        "value": "BPfEJk4PfEJk4AfHIBDEDXAAAAAAAA"
    },
    {
        "domain": "www.howoge.de",
        "expirationDate": 1696858880,
        "hostOnly": true,
        "httpOnly": false,
        "name": "__cmpcc",
        "path": "/",
        "sameSite": "no_restriction",
        "secure": true,
        "session": false,
        "storeId": null,
        "value": "1"
    },
    {
        "domain": ".howoge.de",
        "expirationDate": 1694266885,
        "hostOnly": false,
        "httpOnly": false,
        "name": "__cmpcvcu10543",
        "path": "/",
        "sameSite": "no_restriction",
        "secure": true,
        "session": false,
        "storeId": null,
        "value": "__s974_U__"
    },
    {
        "domain": "www.howoge.de",
        "hostOnly": true,
        "httpOnly": false,
        "name": "PHPSESSID",
        "path": "/",
        "sameSite": null,
        "secure": false,
        "session": true,
        "storeId": null,
        "value": "8pnd5h5up4v4rjh498if7hedac"
    }
]

What would be the best way to solve this?

You don't need cookies or anything else.

Try this:

import requests

api_url = "https://www.howoge.de/?type=999&tx_howsite_json_list[action]=immoList"
# Form fields posted to the listing endpoint; page 1 with 12 results.
request_payload = {
    "tx_howsite_json_list[page]": "1",
    "tx_howsite_json_list[limit]": "12",
    "tx_howsite_json_list[lang]": "",
    "tx_howsite_json_list[rent]": "",
    "tx_howsite_json_list[area]": "",
    "tx_howsite_json_list[rooms]": "egal",
    "tx_howsite_json_list[wbs]": "all-offers",
}
# The endpoint answers with JSON; the listings are under "immoobjects".
response = requests.post(api_url, data=request_payload).json()
for item in response["immoobjects"]:
    print(f'{item["title"]} - {item["rent"]}')

Output:

Rüdickenstraße 23, 13053 Berlin - 1174.31
Rüdickenstraße 23, 13053 Berlin - 1174.31
Rotkamp 4, 13053 Berlin - 1428.25
Rotkamp 6, 13053 Berlin - 617.41
Rotkamp 6, 13053 Berlin - 1147.71
Rotkamp 6, 13053 Berlin - 1147.71
Rotkamp 6, 13053 Berlin - 565.12
Frankfurter Allee 218, 10365 Berlin - 513.85
Frankfurter Allee 218, 10365 Berlin - 501.6
Frankfurter Allee 218, 10365 Berlin - 513.85
Frankfurter Allee 218, 10365 Berlin - 717
Frankfurter Allee 218, 10365 Berlin - 890.6
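
For completeness, should some other page really need the exported cookies, here is a minimal sketch of loading them into a session while keeping each cookie's domain and path. The helper name `session_from_cookies` is made up for illustration, the sample list is trimmed from the cookies in the question, and expiry dates are deliberately not copied. It only populates the cookie jar and makes no request.

```python
import requests


def session_from_cookies(cookie_list):
    """Build a requests.Session whose jar holds the given browser-exported cookies.

    Hypothetical helper: it keeps name, value, domain, path and the secure flag,
    and ignores the other exported fields (expirationDate, sameSite, ...).
    """
    session = requests.Session()
    for c in cookie_list:
        session.cookies.set(
            c["name"],
            c["value"],
            domain=c.get("domain", ""),
            path=c.get("path", "/"),
            secure=c.get("secure", False),
        )
    return session


# Two entries trimmed from the exported cookie list in the question.
sample_cookies = [
    {"name": "__cmpcc", "value": "1", "domain": "www.howoge.de", "path": "/", "secure": True},
    {"name": "__cmpcvcu10543", "value": "__s974_U__", "domain": ".howoge.de", "path": "/", "secure": True},
]

session = session_from_cookies(sample_cookies)
for cookie in session.cookies:
    print(cookie.name, cookie.domain, cookie.path)
```

With the full file, `session_from_cookies(json.load(f))` would replace the loop from the question.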

You need neither cookies nor headers. Try this to get a clean DataFrame of the listings:

import pandas as pd
import requests

data = {
    'tx_howsite_json_list[page]': '1',
    'tx_howsite_json_list[limit]': '12',
    'tx_howsite_json_list[lang]': '',
    'tx_howsite_json_list[rent]': '',
    'tx_howsite_json_list[area]': '',
    'tx_howsite_json_list[rooms]': 'egal',
    'tx_howsite_json_list[wbs]': 'all-offers',
}
response = requests.post('https://www.howoge.de/?type=999&tx_howsite_json_list[action]=immoList', data=data)
# Each entry of "immoobjects" becomes one row of the DataFrame.
df = pd.DataFrame(response.json()["immoobjects"])
df.head()
uid     title   image   district    rent    area    rooms   wbs     features    coordinates     icon    link    favorite    notice
0   19335   Rüdickenstraße 23, 13053 Berlin     /fileadmin/promos/downloadedImages/266355f2369...   Alt-Hohenschönhausen    1174.31     77  3   nein    [Balkon/Loggia, Fußbodenheizung, Zentralheizun...   {'lat': '52.5600754', 'lng': '13.5089916'}  icon-Figures-haus_full  /wohnungen-gewerbe/wohnungssuche/detail/1771-1...   False   Schöne 3-Zimmer-Wohnung
1   19336   Rüdickenstraße 23, 13053 Berlin     /fileadmin/promos/downloadedImages/266355f2369...   Alt-Hohenschönhausen    1174.31     77  3   nein    [Balkon/Loggia, Fußbodenheizung, Zentralheizun...   {'lat': '52.5600754', 'lng': '13.5089916'}  icon-Figures-haus_full  /wohnungen-gewerbe/wohnungssuche/detail/1771-1...   False
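
Since the `coordinates` column holds a nested dict, it may be handy to flatten it and keep only a few columns. A rough sketch on a single hand-written record follows; the values are copied from the first row above, but the field types (numbers rather than strings) are an assumption about what the endpoint returns.

```python
import pandas as pd

# One record shaped like an entry of "immoobjects", trimmed to a few fields.
# Types are assumed; the real response may deliver some of these as strings.
records = [
    {
        "uid": 19335,
        "title": "Rüdickenstraße 23, 13053 Berlin",
        "district": "Alt-Hohenschönhausen",
        "rent": 1174.31,
        "area": 77,
        "rooms": 3,
        "coordinates": {"lat": "52.5600754", "lng": "13.5089916"},
    }
]

# json_normalize expands the nested dict into "coordinates.lat" / "coordinates.lng".
df = pd.json_normalize(records)

# The coordinates arrive as strings in the sample, so convert them to floats.
coord_cols = ["coordinates.lat", "coordinates.lng"]
df[coord_cols] = df[coord_cols].astype(float)

summary = df[["title", "district", "rent", "area", "rooms"] + coord_cols]
print(summary)
```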

If you want the listings from the following pages, change the value of 'tx_howsite_json_list[page]': '1' in data.
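
Stepping through the pages could look roughly like the sketch below. The helper names `build_payload` and `fetch_listings` are invented here, and stopping at the first empty `immoobjects` list is an assumption about how the endpoint signals the last page, so a `max_pages` cap is kept as a safety net.

```python
import requests

API_URL = "https://www.howoge.de/?type=999&tx_howsite_json_list[action]=immoList"


def build_payload(page, limit=12):
    """Return the form fields for one page of results (illustrative helper)."""
    return {
        "tx_howsite_json_list[page]": str(page),
        "tx_howsite_json_list[limit]": str(limit),
        "tx_howsite_json_list[lang]": "",
        "tx_howsite_json_list[rent]": "",
        "tx_howsite_json_list[area]": "",
        "tx_howsite_json_list[rooms]": "egal",
        "tx_howsite_json_list[wbs]": "all-offers",
    }


def fetch_listings(max_pages=5):
    """Collect listings page by page, visiting at most max_pages pages."""
    listings = []
    for page in range(1, max_pages + 1):
        response = requests.post(API_URL, data=build_payload(page), timeout=10)
        response.raise_for_status()
        items = response.json().get("immoobjects", [])
        if not items:
            # Assumption: an empty page means there are no further results.
            break
        listings.extend(items)
    return listings
```

Calling `fetch_listings(max_pages=3)` would then return one combined list that can be passed to `pd.DataFrame` as before.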
