Web scraping with BeautifulSoup and Python: unable to extract text



I am trying to scrape a website, but I cannot extract the description of each item. Here is my code:

from bs4 import BeautifulSoup
import requests

url = "http://engine.ddtc.co.id/putusan-pengadilan-pajak"
response = requests.get(url)
data = response.text
soup = BeautifulSoup(data, 'html.parser')

# Each search result sits in its own div
puts = soup.find_all("div", {"class": "p3-search-item"})
for put in puts:
    title = put.find("div", {"class": "p3-title"}).text
    cat = put.find("div", {"class": "p3-category"}).text
    date = put.find("div", {"class": "search-result-item-meta"}).text
    link = put.find("a").get("href")

    # Follow the item's link to fetch its detail page
    put_response = requests.get(link)
    put_data = put_response.text
    put_soup = BeautifulSoup(put_data, "html.parser")
    put_description = put_soup.find("div", {"id": "modal-contents-pp"}).text
    print("Judul Putusan:", title, "\nKategori:", cat, "\nTanggal:", date, "\nLink:", link, "\nDescription:", put_description)

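The description comes back as whitespace and only a few words. The full description is visible when I open each item's link in the browser. Any help would be much appreciated.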

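I think you need to change how put_description is looked up: select the div with class "p3-desc" instead of the one with id "modal-contents-pp":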

from bs4 import BeautifulSoup
import requests

url = "http://engine.ddtc.co.id/putusan-pengadilan-pajak"
response = requests.get(url)
data = response.text
soup = BeautifulSoup(data, 'html.parser')

# Each search result sits in its own div
puts = soup.find_all("div", {"class": "p3-search-item"})
for put in puts:
    title = put.find("div", {"class": "p3-title"}).text
    cat = put.find("div", {"class": "p3-category"}).text
    date = put.find("div", {"class": "search-result-item-meta"}).text
    link = put.find("a").get("href")

    # Follow the item's link to fetch its detail page
    put_response = requests.get(link)
    put_data = put_response.text
    put_soup = BeautifulSoup(put_data, "html.parser")
    # Changed line: read the description from the "p3-desc" div
    put_description = put_soup.find("div", {"class": "p3-desc"}).text
    print("Judul Putusan:", title, "\nKategori:", cat, "\nTanggal:", date, "\nLink:", link, "\nDescription:", put_description)
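
If it helps, here is a small sketch of the same lookup that you can try offline before pointing it at the site. It also handles two things that may come up: find() returns None when no matching div exists, and .text keeps all the original line breaks and indentation. The function name extract_description and the sample_html markup are made up for illustration. Only the "p3-desc" class and the "modal-contents-pp" id come from the code above, and the real page structure may differ.

```python
from bs4 import BeautifulSoup


def extract_description(html, css_class="p3-desc"):
    """Return the text of the first div with the given class, or None.

    Hypothetical helper for illustration; runs of whitespace are
    collapsed to single spaces.
    """
    soup = BeautifulSoup(html, "html.parser")
    node = soup.find("div", {"class": css_class})
    if node is None:
        # No matching div: let the caller decide what to do
        return None
    # Join text fragments with a space, then collapse whitespace runs
    return " ".join(node.get_text(separator=" ").split())


# Assumed sample markup, NOT copied from the real site
sample_html = """
<html><body>
  <div id="modal-contents-pp">   </div>
  <div class="p3-desc">
      Putusan   nomor
      <b>123</b>
  </div>
</body></html>
"""

print(extract_description(sample_html))
```

Inside the loop you could then call extract_description(put_response.text) and skip or log the items where it returns None, rather than letting .text raise an AttributeError.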
