How to extract text from elements selected by class

I am trying to extract data from a website using BeautifulSoup.

The HTML on the site looks like this:

<div content-43 class="item-name">This is the text I want to grab</div>

I am currently using:

item_store = soup.find_all("div",{"class":"item-name"})

However, this returns the whole line of HTML, including the div tags, rather than just the text I want.

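You have to call .get_text() to extract the text rather than the element itself. Note that find_all() returns a ResultSet, so you have to iterate over it and call the method on each element.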
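To illustrate why the iteration is needed, here is a minimal sketch. It assumes a simplified two-div snippet (not the real page) and the built-in "html.parser"; the variable names direct_call_failed and texts are only for illustration.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the page in the question (assumed markup).
html = '<div class="item-name">first</div><div class="item-name">second</div>'
soup = BeautifulSoup(html, "html.parser")

item_store = soup.find_all("div", {"class": "item-name"})

# find_all() returns a ResultSet (a list of Tag objects), not a single Tag,
# so calling get_text() on the ResultSet itself raises AttributeError.
try:
    item_store.get_text()
    direct_call_failed = False
except AttributeError:
    direct_call_failed = True

# Calling get_text() on each Tag inside the ResultSet works.
texts = [item.get_text() for item in item_store]
print(direct_call_failed, texts)
```

The same applies to the list returned by select().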

Using find() for a single element:

soup.find("div",{"class":"item-name"}).get_text()

Using find_all() and iterating over the ResultSet:

[e.get_text() for e in soup.find_all("div",{"class":"item-name"})]

Using select() with CSS selectors, again iterating over the ResultSet:

[e.get_text() for e in soup.select('div.item-name')]

Example

from bs4 import BeautifulSoup
html = '''
<div content-43 class="item-name">This is the text I grab with find() and also with find_all()</div>
<div content-43 class="item-name">This is the text I want to grab with find_all() </div>
'''
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning
print(soup.find("div",{"class":"item-name"}).get_text())
print([e.get_text() for e in soup.find_all("div",{"class":"item-name"})])

Output

This is the text I grab with find() and also with find_all()

and

['This is the text I grab with find() and also with find_all()',
'This is the text I want to grab with find_all() ']
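The second string in that output keeps its trailing space, because get_text() returns the text exactly as it appears in the markup. If that is not wanted, one option is the strip=True argument of get_text(). A small sketch, assuming a padded snippet made up for this example:

```python
from bs4 import BeautifulSoup

# Made-up snippet with leading and trailing whitespace inside the div.
html = '<div class="item-name">  some text with padding </div>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.find("div", {"class": "item-name"})

raw = tag.get_text()              # whitespace is preserved
clean = tag.get_text(strip=True)  # surrounding whitespace is removed

print(repr(raw))
print(repr(clean))
```

Calling .strip() on the returned string should give the same result for a div that contains only text.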

You should use the .get_text() method or the .text attribute.
You can print them like this:

for item in item_store:
    print(item.text)
    # print(item.get_text())
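As a quick check, the sketch below compares the two spellings on the div from the question. As far as I know, .text is simply a property that calls get_text() with its default arguments, so both should return the same string. It also shows one thing to watch for with find(): it returns None when nothing matches, so a guard is needed before calling get_text(). The class name "no-such-class" is hypothetical and only used to force a miss.

```python
from bs4 import BeautifulSoup

html = '<div content-43 class="item-name">This is the text I want to grab</div>'
soup = BeautifulSoup(html, "html.parser")

item_store = soup.find_all("div", {"class": "item-name"})

# .text and .get_text() give the same string for every matched element.
same = all(item.text == item.get_text() for item in item_store)

# find() returns None when there is no match, so check before calling get_text().
missing = soup.find("div", {"class": "no-such-class"})  # hypothetical class name
missing_text = missing.get_text() if missing is not None else None

print(same, missing_text)
```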
