How to get the text from the last span with class 'star fill'?



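I'm trying to scrape a website with BeautifulSoup, and I'm having trouble getting the ratings from the reviews. They are stored in a table as span tags, and the rating is the last span with the class 'star fill'.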

seatcomfort = Ratings.select_one('tr:has(td:first-child:-soup-contains("Seat Comfort")) td.review-rating-stars.stars, span.star fill')
valueformoney = Ratings.select_one('tr:has(td:first-child:-soup-contains("Value For Money")) td.review-rating-stars.stars, span.star fill')
inflightentertainment = Ratings.select_one('tr:has(td:first-child:-soup-contains("Inflight Entertainment")) td.review-rating-stars.stars, span.star fill')
print(seatcomfort)
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>
print(valueformoney)
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>
print(inflightentertainment)
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span></td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>

I want Seat Comfort to give 1, Value For Money to give 2, and Inflight Entertainment to give 3.

The question needs some improvement (formatting, the initial HTML or the URL), so this should only point you in the right direction.

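Select the span elements that have both classes star and fill, and take the len() of the resulting ResultSet. Note that 'star fill' is two classes, so in a CSS selector they are chained as .star.fill; with a space in between, 'span.star fill' would look for a <fill> element inside the span and match nothing.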

len(soup.select('.review-rating-stars span.star.fill'))
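As a quick check, here is a minimal sketch that applies the same count to the three rating cells printed in the question. The cells are copied from the question's output and wrapped in a small table for illustration; the names html and counts are assumptions, not part of the original site. Counting filled stars per cell should give the 1, 2, 3 the question asks for.

```python
from bs4 import BeautifulSoup

# The three rating cells printed in the question (Seat Comfort,
# Value For Money, Inflight Entertainment), wrapped in a minimal table.
html = '''
<table class="review-ratings">
<tr><td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td></tr>
<tr><td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td></tr>
<tr><td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span></td></tr>
</table>
'''
soup = BeautifulSoup(html, 'html.parser')

# For each rating cell, count the spans that carry BOTH classes "star" and "fill".
counts = [len(td.select('span.star.fill'))
          for td in soup.select('td.review-rating-stars')]
print(counts)  # [1, 2, 3]
```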

Or extract the text of the last filled element:

soup.select('.review-rating-stars span.star.fill')[-1].text
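Indexing with [-1] raises IndexError if a row has no filled star at all. A minimal sketch of a more defensive variant, assuming the markup from the question; last_filled_value is a hypothetical helper name, not a BeautifulSoup API. It also shows on one cell that the space-separated selector from the question finds nothing.

```python
from bs4 import BeautifulSoup

# One rating cell with two filled stars (same markup as in the question).
cell = BeautifulSoup(
    '<td class="review-rating-stars stars">'
    '<span class="star fill">1</span><span class="star fill">2</span>'
    '<span class="star">3</span><span class="star">4</span><span class="star">5</span>'
    '</td>',
    'html.parser',
)

# The space is a descendant combinator: this looks for a <fill> tag
# inside span.star, so it matches nothing.
print(len(cell.select('span.star fill')))  # 0

# Chaining the classes matches spans that have both "star" and "fill".
print(len(cell.select('span.star.fill')))  # 2


def last_filled_value(td):
    """Hypothetical helper: text of the last filled star as an int,
    or None when no star in the cell is filled."""
    filled = td.select('span.star.fill')
    return int(filled[-1].get_text(strip=True)) if filled else None


print(last_filled_value(cell))  # 2
```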

To store the data in a structured way, use a dict:
{e.td.text:len(e.select('.star.fill')) for e in soup.select('table.review-ratings tr')}

from bs4 import BeautifulSoup
html = '''
<table class="review-ratings">
<tbody><tr>
<td class="review-rating-header food-beverages">Food &amp; Beverages</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span>                                              </td>
</tr>
<tr>
<td class="review-rating-header inflight-entertainment">Inflight Entertainment</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span>                                              </td>
</tr>
<tr>
<td class="review-rating-header seat-comfort">Seat Comfort</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span>                                              </td>
</tr>
<tr>
<td class="review-rating-header staff-service">Staff Service</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span>                                              </td>
</tr>
<tr>
<td class="review-rating-header value-for-money">Value for Money</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span>                                              </td>
</tr></tbody></table>
'''
soup = BeautifulSoup(html, 'html.parser')
{e.td.text:len(e.select('.star.fill')) for e in soup.select('table.review-ratings tr')}

{'Food & Beverages': 3,
'Inflight Entertainment': 3,
'Seat Comfort': 3,
'Staff Service': 3,
'Value for Money': 3}
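If you prefer to keep the label-based lookup from the question, the same idea works once the selector targets the row and then counts span.star.fill inside it. A minimal sketch, assuming a trimmed copy of the table above with different star counts per row, and a soupsieve version that supports :-soup-contains (the question already uses it); rating_for is a hypothetical helper. Note that :-soup-contains is case-sensitive, so the label must match the page text exactly ("Value for Money", not "Value For Money").

```python
from bs4 import BeautifulSoup

# Trimmed ratings table with different star counts per row (assumed data).
html = '''
<table class="review-ratings"><tbody>
<tr><td class="review-rating-header seat-comfort">Seat Comfort</td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td></tr>
<tr><td class="review-rating-header value-for-money">Value for Money</td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td></tr>
<tr><td class="review-rating-header inflight-entertainment">Inflight Entertainment</td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span></td></tr>
</tbody></table>
'''
soup = BeautifulSoup(html, 'html.parser')


def rating_for(soup, label):
    """Hypothetical helper: number of filled stars in the row whose first
    cell contains `label`, or None if no such row exists."""
    row = soup.select_one(f'tr:has(td:first-child:-soup-contains("{label}"))')
    if row is None:
        return None
    # Count only spans that have both classes "star" and "fill".
    return len(row.select('span.star.fill'))


print(rating_for(soup, 'Seat Comfort'))            # 1
print(rating_for(soup, 'Value for Money'))         # 2
print(rating_for(soup, 'Inflight Entertainment'))  # 3
```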
