BeautifulSoup with Python to get articleBody



The idea is to extract the content of articleBody, but my code isn't working. What am I missing to retrieve the article text?

from bs4 import BeautifulSoup
import requests
link = 'https://www.clarin.com/sociedad/coronavirus-estudio-dice-acciones-sencillas-podrian-efectivas-cuarentenas_0_ZQM2_GZZn.html'
response = requests.get(link)
soup = BeautifulSoup(response.content, "html.parser")
label = soup.find("application/ld+json", text="articleBody:")
label

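You can search for the data using type="application/ld+json". In your code, the first positional argument of find() is treated as a tag name, so soup.find("application/ld+json", ...) looks for a tag with that name and returns None. The value is actually the type attribute of a <script> tag.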
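To make the difference concrete, here is a minimal sketch that runs both lookups against a small inline page. The HTML string is a hypothetical stand-in for the real article, so nothing is fetched over the network.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the article page, so no network access is needed.
html = """
<html><head>
<script type="application/ld+json">{"articleBody": "Example text."}</script>
</head><body></body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# First positional argument = tag name; no tag is called "application/ld+json".
by_name = soup.find("application/ld+json")

# Matching on the type attribute of a <script> tag finds the element.
by_attr = soup.find("script", type="application/ld+json")

print(by_name)         # None
print(by_attr.string)  # {"articleBody": "Example text."}
```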

The data you are looking at is in JSON format, so you can convert it to a Python dictionary with the json module:

import json
import requests
from bs4 import BeautifulSoup
link = 'https://www.clarin.com/sociedad/coronavirus-estudio-dice-acciones-sencillas-podrian-efectivas-cuarentenas_0_ZQM2_GZZn.html'
soup = BeautifulSoup(requests.get(link).content, "html.parser")
json_data = json.loads(soup.find(type="application/ld+json").string)
print(type(json_data))
print(json_data['description'])

Output:

<class 'dict'>
Un equipo de investigadores de la Universidad de Viena, .....
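The snippet above prints the description field, while the question asks for articleBody. Below is a rough sketch of a lookup helper, assuming the page's JSON-LD actually carries an "articleBody" key (the output above does not confirm that). The function name find_article_body and the sample data are hypothetical. It also tolerates JSON-LD that is a list of items or is wrapped in an "@graph" list, which some sites use.

```python
import json


def find_article_body(data):
    """Return the first "articleBody" string found in parsed JSON-LD, or None."""
    if isinstance(data, dict):
        body = data.get("articleBody")
        if isinstance(body, str):
            return body
        # Some pages wrap their items in an "@graph" list.
        return find_article_body(data.get("@graph"))
    if isinstance(data, list):
        for item in data:
            body = find_article_body(item)
            if body is not None:
                return body
    return None


# Hypothetical sample; the real page's JSON-LD may be shaped differently.
sample = json.loads(
    '[{"@type": "WebSite"},'
    ' {"@type": "NewsArticle", "articleBody": "Example text."}]'
)
print(find_article_body(sample))  # Example text.
```

With the code above you would call find_article_body(json_data). A result of None means that script tag does not expose the key, in which case reading the paragraphs from the HTML is the fallback.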

Alternatively, you can use a CSS selector to find all <p> tags that are direct children of the element with class body-nota:

soup = BeautifulSoup(requests.get(link).content, "html.parser")
for tag in soup.select(".body-nota > p"):
    print(tag.text)
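If you want the article as a single string instead of printing each paragraph, something like the following sketch could work. The inline HTML is hypothetical and only mimics the structure the selector expects. It also shows that the > combinator skips paragraphs that are nested deeper.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure the selector expects.
html = """
<div class="body-nota">
  <p>First paragraph.</p>
  <div><p>Nested paragraph, not a direct child.</p></div>
  <p>Second paragraph.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# ".body-nota > p" matches only <p> tags that are direct children.
paragraphs = [tag.get_text(strip=True) for tag in soup.select(".body-nota > p")]

# Join the paragraphs with blank lines to get one block of text.
article_text = "\n\n".join(paragraphs)
print(article_text)
```

If the real page nests its paragraphs inside extra wrappers, the descendant form ".body-nota p" (without >) would match those as well.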
