Web scraping with requests: selecting a filter on a website

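I use the following code to get the first 20 PDFs from the AMF website (https://bdif.amf-france.org). I tried to be more specific and download only "Déclaration des dirigeants", but I don't know how to do it. How can I integrate this filter into the URL? Something like https://bdif.amf-france.org/back/api/v1/informations?from=0&size=2?typesInformation=DD. Can anyone help?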

import requests
from shutil import copyfileobj

endpoint = "https://bdif.amf-france.org/back/api/v1/informations?from=0&size=20"
base_api_url = "https://bdif.amf-france.org/back/api/v1/documents"
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:97.0) Gecko/20100101 Firefox/97.0",
}

with requests.Session() as s:
    # Query the search API for the first 20 entries
    response = s.get(endpoint, headers=headers).json()
    # For each hit, build the download URL of its first document and keep its file name
    file_sources = [
        [
            f"{base_api_url}/{item['_source']['documents'][0]['path']}",  # Document
            item["_source"]["documents"][0]["nomFichier"],  # File name
        ]
        for item in response["hits"]["hits"]
    ]
    # Stream each PDF straight to disk
    for url, name in file_sources:
        with s.get(url, stream=True) as pdf, open(name, "wb") as output:
            copyfileobj(pdf.raw, output)
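The list comprehension in the code relies on a specific JSON shape: `response["hits"]["hits"][i]["_source"]["documents"][0]` with `path` and `nomFichier` keys. Below is a minimal sketch that moves that extraction into a hypothetical helper, `extract_file_sources`, so it can be checked offline against a hand-made dict. The sample values are invented, not real API output, and skipping hits without documents is an assumption (the original code would raise `IndexError` or `KeyError` instead).

```python
from typing import List, Tuple


def extract_file_sources(response: dict, base_api_url: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (download_url, file_name) for each hit's first document.

    Assumes the response shape the question's code relies on. Hits whose
    "documents" list is missing or empty are skipped instead of raising.
    """
    sources = []
    for item in response["hits"]["hits"]:
        documents = item.get("_source", {}).get("documents") or []
        if not documents:
            continue  # nothing to download for this hit
        first = documents[0]
        sources.append((f"{base_api_url}/{first['path']}", first["nomFichier"]))
    return sources


if __name__ == "__main__":
    # Invented sample mimicking only the keys the original code reads
    sample = {"hits": {"hits": [
        {"_source": {"documents": [{"path": "abc123", "nomFichier": "example.pdf"}]}},
        {"_source": {"documents": []}},
    ]}}
    print(extract_file_sources(sample, "https://bdif.amf-france.org/back/api/v1/documents"))
    # [('https://bdif.amf-france.org/back/api/v1/documents/abc123', 'example.pdf')]
```

In the original script, `file_sources = extract_file_sources(response, base_api_url)` would replace the comprehension, and the download loop stays the same.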

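Set the typesInformation parameter to DD in the URL. A URL has only one "?", which starts the query string; further parameters are separated with "&", so the filter is appended as &typesInformation=DD rather than with a second "?":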

endpoint = 'https://bdif.amf-france.org/back/api/v1/informations?from=0&size=20&typesInformation=DD'
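To avoid mixing up "?" and "&" when writing the query string by hand, it can be built from a dict. The sketch below uses the standard library's `urllib.parse.urlencode` so it runs without network access; `build_endpoint` is a hypothetical helper, and it assumes the API accepts exactly the parameters shown in the answer (`from`, `size`, `typesInformation`).

```python
from urllib.parse import urlencode

# Search endpoint from the question
BASE_URL = "https://bdif.amf-france.org/back/api/v1/informations"


def build_endpoint(start: int = 0, size: int = 20, types_information: str = "DD") -> str:
    """Hypothetical helper: join the base URL and the query parameters.

    urlencode places "&" between the key=value pairs, so the result
    contains exactly one "?".
    """
    query = urlencode({"from": start, "size": size, "typesInformation": types_information})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    print(build_endpoint())
    # https://bdif.amf-france.org/back/api/v1/informations?from=0&size=20&typesInformation=DD
```

With requests itself, the same encoding happens when the dict is passed as `params`, e.g. `s.get(BASE_URL, params={"from": 0, "size": 20, "typesInformation": "DD"}, headers=headers)`.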
