I am trying to download many PDF files (several hundred) from the World Bank archive site using Python. The API URL can be customized with any chosen terms (e.g., a country, or a sector such as Education or Health).
I tried the code below with the URL shown, to download the files specific to Vietnam's education sector. The URL returns the operational documents along with all the PDF links matching the specified terms. However, no files get downloaded.
import os
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

url = "http://search.worldbank.org/api/v2/wds?format=json&countcode=VN&majdocty_exact=Publications&teratopic_exact=Education&srt=docdt&order=desc"

# Folder to download the files into
folder_location = r'J:\New Volume (B)\pdfs'

response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

for link in soup.select("a[href$='.pdf']"):
    # Name the pdf files
    filename = os.path.join(folder_location, link['href'].split('/')[-1])
    with open(filename, 'wb') as f:
        f.write(requests.get(urljoin(url, link['href'])).content)
The code runs without any errors, but no files are downloaded either. Any help would be appreciated. Thanks.
Use response.json(); bs4 is not needed. The API returns JSON, not HTML, so there are no &lt;a&gt; tags for BeautifulSoup to find.
import os
import requests

url = "http://search.worldbank.org/api/v2/wds?format=json&countcode=VN&majdocty_exact=Publications&teratopic_exact=Education&srt=docdt&order=desc"

# Folder to download the files into
folder_location = r'J:\New Volume (B)\pdfs'
os.makedirs(folder_location, exist_ok=True)

response = requests.get(url).json()

# 'documents' maps document ids to metadata dicts
for doc in response['documents'].values():
    if not isinstance(doc, dict):  # skip non-document entries such as 'facets'
        continue
    pdf_url = doc.get('pdfurl')
    if pdf_url:
        # Name the pdf file after the last segment of its URL
        filename = os.path.join(folder_location, pdf_url.split('/')[-1])
        with open(filename, 'wb') as f:
            f.write(requests.get(pdf_url).content)
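One more point for the "several hundred files" case: a single request only returns the first page of results. The request can be paginated; a minimal sketch, assuming the API accepts `rows` (page size) and `os` (offset) query parameters (an assumption worth checking against the API docs) — the helper `paged_urls` is hypothetical, not part of the API:

```python
# Hypothetical helper: build one API URL per page of results.
# `rows`/`os` as page-size/offset parameters are an assumption about
# the World Bank search API, not verified here.
def paged_urls(base_url, total, page_size=100):
    """Yield paginated API URLs covering `total` documents."""
    for offset in range(0, total, page_size):
        yield f"{base_url}&rows={page_size}&os={offset}"

base = ("http://search.worldbank.org/api/v2/wds?format=json&countcode=VN"
        "&majdocty_exact=Publications&teratopic_exact=Education")
urls = list(paged_urls(base, total=250, page_size=100))
# Three pages cover 250 documents at 100 per page.
```

Each URL would then be fetched and its `documents` entries downloaded as in the loop above.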