Web scraping articles: AttributeError: 'str' object has no attribute 'text'

I am trying to extract the title, link, and date of every article on atlasobscura.com.

At first I wrote the code using requests + parsel + XPath, and it ran without errors.

This time I rewrote the code with BeautifulSoup, and now I get `AttributeError: 'str' object has no attribute 'text'`.

```
Traceback (most recent call last):
  File "E:\Python\Atlasobscura.com_2.py", line 21, in <module>
    href = 'https://www.atlasobscura.com' + str(s.find('div').find('a')['href'].text)
AttributeError: 'str' object has no attribute 'text'
```

If anyone can solve this problem, please help me.

Thanks!

Here is the code:

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

title = []
link = []
date = []

for x in range(1, 662):
    print(f'=====> Scraping from page {x}')
    url = f'https://www.atlasobscura.com/articles?page={x}'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}
    r = requests.get(url, headers=headers)
    r.encoding = r.apparent_encoding
    soup = BeautifulSoup(r.text, 'lxml')

    articles = soup.find_all('div', class_='col-md-4 col-sm-6 col-xs-12')
    for s in articles:
        story = s.find('div', class_='content-card-text').find('h3').find('span').text
        title.append(story)
        href = 'https://www.atlasobscura.com' + str(s.find('div').find('a')['href'].text)
        link.append(href)
        m_d_y = s.find('div', class_='detail-sm article-card-detail article-card-date').text.strip()
        date.append(m_d_y)
        print(story, href, m_d_y)

atlasobscura = pd.DataFrame({
    'Title': title,
    'Link': link,
    'Date': date
})
atlasobscura.to_excel('Atlasobscura.com.xlsx', index=False)
```

According to the traceback, the problem is in this expression:

```python
str(s.find('div').find('a')['href'].text)
```

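`s` is an element (a BeautifulSoup `Tag`). Calling `find()` on it returns another element, so `s.find('div').find('a')` is also an element. Indexing that element with `['href']` returns the value of its `href` attribute, so `s.find('div').find('a')['href']` is an ordinary string. A string has no `.text` attribute, so `.text` cannot be applied to it.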
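The types involved can be checked offline with a minimal sketch like the one below. It assumes `bs4` is installed and uses Python's built-in `html.parser` so that `lxml` is not required; the HTML snippet and its `href` value are made up for illustration and are not taken from the real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, just enough to reproduce the error without any network access.
html = '<div><a href="/articles/example">Example article</a></div>'
soup = BeautifulSoup(html, 'html.parser')

a = soup.find('div').find('a')   # a Tag (element) object
href = a['href']                 # attribute lookup returns a plain str

print(type(a).__name__)     # the element's type: Tag
print(type(href).__name__)  # the attribute value's type: str
print(a.text)               # .text works on the Tag and returns its text content

try:
    href.text               # a str has no .text attribute
except AttributeError as e:
    print(e)                # the same message as in the traceback
```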
That is exactly what the error message says. So you should simply remove `.text` and write:

```python
href = 'https://www.atlasobscura.com' + str(s.find('div').find('a')['href'])
```

`.text` is meant for element (`Tag`) objects, where it returns the element's text content.
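To try the corrected line in the context of the question's inner loop without requesting all 661 pages, here is a minimal offline sketch. It assumes `bs4` is installed; `SAMPLE_HTML` is hypothetical markup that only imitates the class names used in the question's code (the live page may be structured differently), and `parse_cards` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

BASE = 'https://www.atlasobscura.com'

# Hypothetical card markup that mimics the class names from the question;
# the real page's structure may differ.
SAMPLE_HTML = '''
<div class="col-md-4 col-sm-6 col-xs-12">
  <div>
    <a href="/articles/sample-article">
      <div class="content-card-text"><h3><span>Sample Title</span></h3></div>
      <div class="detail-sm article-card-detail article-card-date"> January 1, 2021 </div>
    </a>
  </div>
</div>
'''


def parse_cards(html):
    """Return (title, link, date) tuples for every article card in html."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for s in soup.find_all('div', class_='col-md-4 col-sm-6 col-xs-12'):
        story = s.find('div', class_='content-card-text').find('h3').find('span').text
        # The fix: ['href'] already returns a str, so neither .text nor str() is needed.
        href = BASE + s.find('div').find('a')['href']
        m_d_y = s.find('div', class_='detail-sm article-card-detail article-card-date').text.strip()
        rows.append((story, href, m_d_y))
    return rows


print(parse_cards(SAMPLE_HTML))
```

The `str(...)` wrapper kept in the answer's line is harmless but redundant, because `['href']` already yields a string.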
