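I'm trying to scrape a website with BeautifulSoup, and I'm having trouble getting the ratings out of the reviews. They are stored in a table as span tags, and the last filled star is the last span with the class 'star fill'.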
seat_comfort = Ratings.select_one('tr:has(td:first-child:-soup-contains("Seat Comfort")) td.review-rating-stars.stars, span.star fill')
value_for_money = Ratings.select_one('tr:has(td:first-child:-soup-contains("Value For Money")) td.review-rating-stars.stars, span.star fill')
inflight_entertainment = Ratings.select_one('tr:has(td:first-child:-soup-contains("Inflight Entertainment")) td.review-rating-stars.stars, span.star fill')
print(seat_comfort)
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>
print(value_for_money)
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>
print(inflight_entertainment)
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span></td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>
I'd like to get 1 for Seat Comfort, 2 for Value For Money, and 3 for Inflight Entertainment.
The question needs some improvement (formatting, the initial HTML, or a URL), so this can only point in a direction.
Select the elements that carry both the star and fill classes, and take the len() of the resulting ResultSet:
len(soup.select('.review-rating-stars span.star.fill'))
Or extract the text of the last matching element:
soup.select('.review-rating-stars span.star.fill')[-1].text
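One likely reason the selector in the question returns the whole td: in CSS a space is the descendant combinator, so 'span.star fill' looks for a <fill> element inside span.star, and the comma turns the expression into a selector list whose first match is the td. Chaining the classes as 'span.star.fill' requires both on the same element. The sketch below applies that to a single row picked by its label. The sample markup and the helper name rating_for are assumptions made for illustration, and :-soup-contains needs a reasonably recent soupsieve.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the rows shown in the question.
html = '''
<table class="review-ratings">
<tr><td class="review-rating-header">Seat Comfort</td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td></tr>
<tr><td class="review-rating-header">Value For Money</td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td></tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')


def rating_for(soup, label):
    """Count the filled stars in the row whose first cell contains `label`.

    Hypothetical helper: returns None when no row matches. The label is
    interpolated into the selector, so it must not contain double quotes.
    """
    row = soup.select_one(f'tr:has(td:first-child:-soup-contains("{label}"))')
    if row is None:
        return None
    # '.star.fill' (no space) matches spans that have both classes.
    return len(row.select('span.star.fill'))


seat_comfort = rating_for(soup, 'Seat Comfort')
value_for_money = rating_for(soup, 'Value For Money')
print(seat_comfort, value_for_money)
```

Counting the filled spans and reading the text of the last one give the same number as long as the stars are labelled 1 to 5 in order; counting avoids an IndexError when a row has no filled star.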
Using a dict comprehension:
{e.td.text:len(e.select('.star.fill')) for e in soup.select('table.review-ratings tr')}
from bs4 import BeautifulSoup
html = '''
<table class="review-ratings">
<tbody><tr>
<td class="review-rating-header food-beverages">Food & Beverages</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span> </td>
</tr>
<tr>
<td class="review-rating-header inflight-entertainment">Inflight Entertainment</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span> </td>
</tr>
<tr>
<td class="review-rating-header seat-comfort">Seat Comfort</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span> </td>
</tr>
<tr>
<td class="review-rating-header staff-service">Staff Service</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span> </td>
</tr>
<tr>
<td class="review-rating-header value-for-money">Value for Money</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span> </td>
</tr></tbody></table>
'''
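# Name the parser explicitly to avoid the "no parser was explicitly specified" warning.
soup = BeautifulSoup(html, 'html.parser')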
{e.td.text:len(e.select('.star.fill')) for e in soup.select('table.review-ratings tr')}
{'Food & Beverages': 3,
'Inflight Entertainment': 3,
'Seat Comfort': 3,
'Staff Service': 3,
'Value for Money': 3}
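Since every row in the example above has three filled stars, a variant with mixed ratings may make the counting easier to verify. This is only a sketch: the markup is hypothetical, including the "Recommended" row, which stands in for any table row that has no star cell. Unlike the one-line comprehension, it skips such rows instead of recording them as 0.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three star rows with different ratings plus one
# row without stars (assumed here purely for illustration).
html = '''
<table class="review-ratings">
<tr><td class="review-rating-header">Seat Comfort</td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td></tr>
<tr><td class="review-rating-header">Value for Money</td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td></tr>
<tr><td class="review-rating-header">Inflight Entertainment</td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span></td></tr>
<tr><td class="review-rating-header">Recommended</td>
<td class="review-value">no</td></tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')

ratings = {}
for row in soup.select('table.review-ratings tr'):
    header = row.select_one('td.review-rating-header')
    stars = row.select_one('td.review-rating-stars')
    if header is None or stars is None:
        continue  # not a star-rating row, so leave it out
    # Number of filled stars equals the rating.
    ratings[header.get_text(strip=True)] = len(stars.select('span.star.fill'))

print(ratings)
```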