How to download many PDF files with Python using the World Bank API



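I am trying to use Python to download many PDF files (a few hundred) from the World Bank archives website. The API URL can be customized with any chosen terms (for example, country or sector: education, health, etc.).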

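I tried the code below with the URL shown, to download the files specific to Vietnam's education sector. The URL returns the operational documents along with all the PDF links for the specified terms. However, the files are not downloaded.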

import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
url = "http://search.worldbank.org/api/v2/wds?format=json&countcode=VN&majdocty_exact=Publications&teratopic_exact=Education&srt=docdt&order=desc"
#Folder to download the files
folder_location = r'J:New Volume (B)pdfs'
response = requests.get(url)
soup= BeautifulSoup(response.text, "html.parser")     
for link in soup.select("a[href$='.pdf']"):
    #Name the pdf files 
    filename = os.path.join(folder_location,link['href'].split('/')[-1])
    with open(filename, 'wb') as f:
        f.write(requests.get(urljoin(url,link['href'])).content)

The code runs without any errors, but no files are downloaded. Any help would be greatly appreciated. Thanks.

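Use response.json(); bs4 is not needed. The URL asks for format=json, so the response is JSON rather than HTML, and the a[href$='.pdf'] selector has no anchor tags to match.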

import os
import requests

url = "http://search.worldbank.org/api/v2/wds?format=json&countcode=VN&majdocty_exact=Publications&teratopic_exact=Education&srt=docdt&order=desc"
# Folder to download the files into
folder_location = r'J:New Volume (B)pdfs'
# The API returns JSON, so parse it directly instead of scraping HTML
response = requests.get(url).json()
# 'documents' maps document IDs to records; iterate over the records
for doc in response['documents'].values():
    pdf_url = doc.get('pdfurl')
    if pdf_url:
        # Name each file after the last segment of its URL
        filename = os.path.join(folder_location, pdf_url.split('/')[-1])
        with open(filename, 'wb') as f:
            f.write(requests.get(pdf_url).content)
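
Since the goal is a few hundred files, a single request will probably not return all of them. Below is a rough sketch of one way to page through the results and stream each PDF to disk. The helper names (extract_pdf_urls, filename_from_url, download_all) and the max_pages limit are made up for illustration. The paging parameters rows and os (offset) and the total field in the response are assumptions about this API that should be checked against its documentation before relying on them.

```python
import os
from urllib.parse import urlsplit


def extract_pdf_urls(payload):
    """Return every non-empty 'pdfurl' found under payload['documents']."""
    urls = []
    for doc in payload.get("documents", {}).values():
        # Skip entries that are not document records or have no PDF link.
        if isinstance(doc, dict) and doc.get("pdfurl"):
            urls.append(doc["pdfurl"])
    return urls


def filename_from_url(pdf_url):
    """Use the last path segment of the URL as the local file name."""
    return os.path.basename(urlsplit(pdf_url).path)


def download_all(base_url, folder, rows=50, max_pages=20):
    """Page through the search results and save each PDF into `folder`."""
    import requests  # imported here so the helpers above work without it

    os.makedirs(folder, exist_ok=True)
    for page in range(max_pages):
        offset = page * rows
        # Assumption: 'rows' sets the page size and 'os' the offset.
        params = {"rows": rows, "os": offset}
        payload = requests.get(base_url, params=params, timeout=30).json()

        for pdf_url in extract_pdf_urls(payload):
            target = os.path.join(folder, filename_from_url(pdf_url))
            # Stream the body so large PDFs are not held in memory.
            with requests.get(pdf_url, stream=True, timeout=60) as resp:
                resp.raise_for_status()
                with open(target, "wb") as f:
                    for chunk in resp.iter_content(chunk_size=65536):
                        f.write(chunk)

        # Assumption: 'total' is the overall hit count; stop once covered.
        if offset + rows >= int(payload.get("total", 0)):
            break
```

With the variables from the code above, this would be called as download_all(url, folder_location). If total is missing from the response, the loop stops after the first page rather than guessing.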
