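I have two spans with text in them. They have no class or id, and I want to scrape their text with bs4, but I don't know how. Selecting by the small tag doesn't help, because the HTML is full of small tags. Can someone give me an example?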
<td valign="bottom" class="bottom-cell">
    <div class="space rel">
        <p class="lheight16">
            <small class="breadcrumb x-normal">
                <span><i data-icon="location-filled"></i>Iasi</span>
            </small>
            <small class="breadcrumb x-normal">
                <span><i data-icon="clock"></i>Ieri 16:13</span>
            </small>
        </p>
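Try this. The :nth-of-type(1) selector matches an element that is the first child of its type within its parent, so here it picks the first span inside each small under .lheight16.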
# data is the parsed BeautifulSoup object
for i in data.select('.lheight16 small span:nth-of-type(1)'):
    print(i.text)
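For reference, here is a minimal self-contained sketch of the selector above, run against the fragment from the question. It assumes the built-in `html.parser` backend (so no extra parser needs to be installed), and the `texts` list is a name introduced here purely for illustration.

```python
from bs4 import BeautifulSoup

# Fragment copied from the question (only the <p> part is needed here).
html = '''
<p class="lheight16">
  <small class="breadcrumb x-normal">
    <span><i data-icon="location-filled"></i>Iasi</span>
  </small>
  <small class="breadcrumb x-normal">
    <span><i data-icon="clock"></i>Ieri 16:13</span>
  </small>
</p>
'''

# Assumption: html.parser is used instead of lxml to avoid an extra dependency.
data = BeautifulSoup(html, 'html.parser')

# Each span is the first span inside its own <small>, so both are matched.
texts = [span.text for span in data.select('.lheight16 small span:nth-of-type(1)')]
print(texts)
```

Because the selector is scoped to `.lheight16`, small tags elsewhere on the page should not be picked up, as long as they sit outside an element with that class.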
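There are several ways to do this, but most of them reach the spans through their parent elements. Since your question does not show the expected output (something you should add), check these two options.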
Option a:
for span in soup.select('td.bottom-cell span'):
    print(span.get_text())
Option b:
print(soup.select_one('td.bottom-cell').get_text(' - ', strip=True))
from bs4 import BeautifulSoup

html = '''
<td valign="bottom" class="bottom-cell">
    <div class="space rel">
        <p class="lheight16">
            <small class="breadcrumb x-normal">
                <span><i data-icon="location-filled"></i>Iasi</span>
            </small>
            <small class="breadcrumb x-normal">
                <span><i data-icon="clock"></i>Ieri 16:13</span>
            </small>
        </p>
    </div>
</td>
'''

soup = BeautifulSoup(html, 'lxml')

# Option a: print the text of every span inside the cell, one per line
for span in soup.select('td.bottom-cell span'):
    print(span.get_text())

# Option b: join all text in the cell into a single string
print(soup.select_one('td.bottom-cell').get_text(' - ', strip=True))
Output:
a:
Iasi
Ieri 16:13
b:
Iasi - Ieri 16:13
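If the two values need to be told apart (location versus time), one possible variation is to key each span by the `data-icon` attribute of the `i` tag inside it. This is only a sketch and not part of the answers above: the `fields` dictionary is a hypothetical name, and it assumes every span of interest contains an `i` tag with a `data-icon` attribute, as in the sample HTML. It uses the built-in `html.parser` backend.

```python
from bs4 import BeautifulSoup

html = '''
<td valign="bottom" class="bottom-cell">
    <div class="space rel">
        <p class="lheight16">
            <small class="breadcrumb x-normal">
                <span><i data-icon="location-filled"></i>Iasi</span>
            </small>
            <small class="breadcrumb x-normal">
                <span><i data-icon="clock"></i>Ieri 16:13</span>
            </small>
        </p>
    </div>
</td>
'''

# Assumption: html.parser is used here; the answers above use lxml.
soup = BeautifulSoup(html, 'html.parser')

# Map each icon name to the text of the span that contains the icon.
fields = {}
for icon in soup.select('td.bottom-cell span > i[data-icon]'):
    fields[icon['data-icon']] = icon.parent.get_text(strip=True)

print(fields)
```

With the sample HTML, `fields['clock']` holds the time string and `fields['location-filled']` holds the city, so the order of the spans on the page no longer matters.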