Python: Getting all text data with BeautifulSoup's findAll



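I am trying to download all the titles from the website https://ec.europa.eu/eurostat/news/news-releases, but they all share the same class, so when I filter with find I only get the first one. The findAll method should apparently return every element with that class, after which I should be able to pick them out one by one, but I always get an error when I use findAll, so I am clearly doing something wrong. This is my code so far: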

site3 = 'https://ec.europa.eu/eurostat/news/news-releases'
harware3 = {'User-Agent': 'Mozilla/5.0'}
request3 = Request(site3,headers=harware3)
page3 = urlopen(request3)
soup3 = BeautifulSoup(page3, 'html.parser')
informes = soup3.findAll('div',{"class": "product-title"}).text
for 1 in informes:
    print(1['href'])

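Your code has a few errors. The main one is that findAll returns a ResultSet, and a ResultSet has no .text attribute; only the individual elements inside it do.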

Also, 1 is not a valid variable name in Python; change it to, for example, i:

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

site3 = 'https://ec.europa.eu/eurostat/news/news-releases'
harware3 = {'User-Agent': 'Mozilla/5.0'}
request3 = Request(site3,headers=harware3)
page3 = urlopen(request3)
soup3 = BeautifulSoup(page3, 'html.parser')
informes = soup3.findAll('div',{"class": "product-title"})  # <-- remove `.text`
for i in informes:       # <-- change 1 to i
    print(i.a['href'])   # <-- add `.a`

Prints:

https://ec.europa.eu/eurostat/documents/2995521/11156763/2-31072020-AP-EN.pdf/c033a89c-da21-8888-d9a1-3bc1d0ce1a6f
https://ec.europa.eu/eurostat/documents/2995521/11156775/2-31072020-BP-EN.pdf/cbe7522c-ebfa-ef08-be60-b1c9d1bd385b
https://ec.europa.eu/eurostat/documents/2995521/11156668/3-30072020-AP-EN.pdf/1b69a5ae-35d2-0460-f76f-12ce7f6c34be
https://ec.europa.eu/eurostat/documents/2995521/11146677/2-28072020-AP-EN.pdf/41ab3dee-a9dd-6827-46c9-595050ea3d31
https://ec.europa.eu/eurostat/documents/2995521/11129607/2-22072020-AP-EN.pdf/ab6cd4ff-ec57-d984-e85a-41a351df1ffd
https://ec.europa.eu/eurostat/documents/2995521/11129672/2-22072020-BP-EN.pdf/5ccced57-ee23-ad08-75d4-0fa81dc3fe0e
https://ec.europa.eu/eurostat/documents/2995521/11107828/2-17072020-AP-EN.pdf/9b5bd6a9-3002-7a65-197a-e046f030c600
https://ec.europa.eu/eurostat/documents/2995521/11107901/4-17072020-BP-EN.pdf/b1e7c5ac-4058-f193-abcf-8fda0f29f28c
https://ec.europa.eu/eurostat/documents/2995521/10300267/6-16072020-AP-EN.pdf/84f468f1-b632-e761-ebc1-420e26ad5cf2
https://ec.europa.eu/eurostat/documents/2995521/11096023/4-14072020-AP-EN.pdf/dc899de9-ce8d-c114-ab06-2bfa1f4d5dd7
https://ec.europa.eu/eurostat/documents/2995521/11081093/3-10072020-AP-EN.pdf/d2f799bf-4412-05cc-a357-7b49b93615f1
https://ec.europa.eu/eurostat/documents/2995521/11070754/3-08072020-BP-EN.pdf/6797c084-1792-880f-0039-5bbbca736da1
https://ec.europa.eu/eurostat/documents/2995521/11074516/2-08072020-AP-EN.pdf/8f420fb9-6d3b-7a9e-68d7-e6a7b197c20b
https://ec.europa.eu/eurostat/documents/2995521/10300303/2-06072020-AP-EN.pdf/89b30efd-93b1-20be-a618-d7c19a15fa09
https://ec.europa.eu/eurostat/documents/2995521/11061414/4-06072020-BP-EN.pdf/422a2c72-8de7-2bb6-2cd6-1829711add76
https://ec.europa.eu/eurostat/documents/2995521/10300291/2-03072020-BP-EN.pdf/bb77725c-3c34-617e-283f-2b509cf17b4a
https://ec.europa.eu/eurostat/documents/2995521/10300279/2-03072020-AP-EN.pdf/2edaf9a9-b5e5-db10-f6a9-5b05615e79f0
https://ec.europa.eu/eurostat/documents/2995521/11054062/3-02072020-AP-EN.pdf/ce573d1a-04a5-6762-5b56-cb322cbdc5ac
https://ec.europa.eu/eurostat/documents/2995521/11061642/4-02072020-BP-EN.pdf/bb578182-5df9-c0cf-8b84-f4b852518914
https://ec.europa.eu/eurostat/documents/2995521/10294972/2-30062020-AP-EN.pdf/4d9c6e1d-b92c-431d-384a-2ab18f6eeaa6
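
The point about ResultSet can be checked without touching the network. The following is a minimal sketch that parses a small, made-up HTML snippet (the markup and titles are assumptions for illustration, not the real Eurostat page). It uses find_all, the current name of the method; findAll is the older alias for the same call. It shows that the result behaves like a list with no .text of its own, while each element in it is a Tag that does have .text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure discussed above.
html = """
<div class="product-title"><a href="/doc/a.pdf">First title</a></div>
<div class="product-title"><a href="/doc/b.pdf">Second title</a></div>
"""

soup = BeautifulSoup(html, "html.parser")
results = soup.find_all("div", {"class": "product-title"})

# A ResultSet behaves like a list; asking it for .text raises AttributeError.
try:
    results.text
    resultset_has_text = True
except AttributeError:
    resultset_has_text = False

# Each element is a Tag, and a Tag does have .text.
titles = [div.text for div in results]

print(resultset_has_text)
print(titles)
```

So the fix is always to loop (or use a comprehension) over the result of find_all and read .text from each element.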

Edit: to get the link text instead:

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

site3 = 'https://ec.europa.eu/eurostat/news/news-releases'
harware3 = {'User-Agent': 'Mozilla/5.0'}
request3 = Request(site3,headers=harware3)
page3 = urlopen(request3)
soup3 = BeautifulSoup(page3, 'html.parser')
informes = soup3.findAll('div',{"class": "product-title"})
for i in informes:
    print(i.a.text)      # <-- `.a.text` is the link text rather than the href

Prints:

Euro area annual inflation up to 0.4%
GDP down by 12.1% in the euro area and by 11.9% in the EU
Euro area unemployment at 7.8%
Sharpest drop of household real consumption per capita in both euro area and EU
Government debt up to 86.3% of GDP in euro area
Seasonally adjusted government deficit rose sharply to 2.2% of GDP in the euro area
Annual inflation up to 0.3% in the euro area
Production in construction up by 27.9% in euro area and 21.2% in EU
Euro area international trade in goods surplus €9.4 bn
Industrial production up by 12.4% in euro area and 11.4% in EU
EU population in 2020: almost 448 million
Absences from work at record high
House prices up by 5.0% in the euro area
EU current account surplus €59.9 bn
Volume of retail trade up by 17.8% in euro area
Business profit share recorded sharpest drop to 37.9% while business investment is slightly down to 25.5% in the euro area
Household saving rate all time high at 16.9% in the euro area while household investment rate down to 8.7%
Euro area unemployment at 7.4%
Industrial producer prices down by 0.6% in euro area
Euro area annual inflation up to 0.3%
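
If both the title and the link are needed, they can be collected in a single pass. The sketch below again runs on made-up HTML (an assumption, so that it works offline), and extract_releases is a hypothetical helper name, not part of BeautifulSoup. It skips any div without an <a> child, because i.a is None in that case and i.a['href'] would raise a TypeError.

```python
from bs4 import BeautifulSoup


def extract_releases(html):
    """Return (title, href) pairs for each div.product-title that contains a link."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for div in soup.find_all("div", class_="product-title"):
        link = div.a  # None when the div has no <a> descendant
        if link is None or not link.has_attr("href"):
            continue
        # get_text(strip=True) trims surrounding whitespace from the title.
        pairs.append((link.get_text(strip=True), link["href"]))
    return pairs


# Hypothetical sample; the second div deliberately has no link.
sample = """
<div class="product-title"><a href="/doc/a.pdf"> First title </a></div>
<div class="product-title">No link here</div>
<div class="product-title"><a href="/doc/b.pdf">Second title</a></div>
"""

releases = extract_releases(sample)
for title, href in releases:
    print(title, href)
```

For the real page, the same function could be given the response body read from urlopen(request3), assuming the page still uses the product-title class.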
