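- Python version: 3.8
- BS4 library
I have the following HTML, which shows two of the 20+ reviews I am scraping. To save space I have not included the rest, but you can imagine these blocks repeating.
I need to retrieve "sml-rank-stars sml-str40 star" (shown on the second line here) from each review.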
<div class="review-rank">
<span class="sml-rank-stars sml-str40 star"></span>
<span class="score">
<span class="item">
口味:3.5
</span>
<span class="item">
环境:4.0
</span>
<span class="item">
服务:3.5
</span>
<span class="item">人均:200元</span>
</span>
</div>
<div class="review-rank">
<span class="sml-rank-stars sml-str35 star"></span>
<span class="score">
<span class="item">
口味:3.0
</span>
<span class="item">
环境:4.5
</span>
<span class="item">
服务:3.0
</span>
</span>
</div>
This is what I have tried so far:
for review in review_items.find_all('div', class_='main-review'):
    review_rank = review.find('div', class_='review-rank')
    star_rank = []
    for review in review_rank.find_all('span')[:1]:
        star_rank.append(review.get('class'))
print(star_rank)
I get this output:
[['sml-rank-stars', 'sml-str5', 'star']]
I can then use this code to get just the number:
star_rank[0][1][7:]
Output:
'5'
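The problem is that I only get one of the reviews; I need this line for every review, stored in a list.
My desired output is something like this, or anything else I can iterate over to get each review's star count: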
[['sml-rank-stars', 'sml-str40', 'star'],
['sml-rank-stars', 'sml-str35', 'star'],
['sml-rank-stars', 'sml-str50', 'star'],
['sml-rank-stars', 'sml-str40', 'star'],
['sml-rank-stars', 'sml-str40', 'star'],
['sml-rank-stars', 'sml-str50', 'star'],
['sml-rank-stars', 'sml-str50', 'star'],
['sml-rank-stars', 'sml-str45', 'star'],
['sml-rank-stars', 'sml-str10', 'star'],
['sml-rank-stars', 'sml-str35', 'star'],
['sml-rank-stars', 'sml-str45', 'star'],
['sml-rank-stars', 'sml-str40', 'star'],
['sml-rank-stars', 'sml-str45', 'star'],
['sml-rank-stars', 'sml-str10', 'star'],
['sml-rank-stars', 'sml-str5', 'star']]
I have figured out how to print the result like this with the following code, but I need to save it to a list or something else I can iterate over.
for review in review_items.find_all('div', class_='main-review'):
    review_rank = review.find('div', class_='review-rank')
    for review in review_rank.find_all('span')[:1]:
        print(review.get('class'))
Output:
['sml-rank-stars', 'sml-str40', 'star']
['sml-rank-stars', 'sml-str35', 'star']
['sml-rank-stars', 'sml-str50', 'star']
['sml-rank-stars', 'sml-str40', 'star']
['sml-rank-stars', 'sml-str40', 'star']
['sml-rank-stars', 'sml-str50', 'star']
['sml-rank-stars', 'sml-str50', 'star']
['sml-rank-stars', 'sml-str45', 'star']
['sml-rank-stars', 'sml-str10', 'star']
['sml-rank-stars', 'sml-str35', 'star']
['sml-rank-stars', 'sml-str45', 'star']
['sml-rank-stars', 'sml-str40', 'star']
['sml-rank-stars', 'sml-str45', 'star']
['sml-rank-stars', 'sml-str10', 'star']
['sml-rank-stars', 'sml-str5', 'star']
To iterate over every .review-rank element, select them all. To extract the rating from each one, use a list comprehension:
star_rank = []
for r in soup.select('.review-rank'):
    star_rank.append([s.replace('sml-str','') for s in r.span['class'] if 'sml-str' in s][0])
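For reference, here is a self-contained sketch of the same idea, run against a trimmed copy of the two review blocks from the question. The `html` string and the empty-list guard are additions for illustration only; the guard assumes some blocks might lack a star class, which the question does not confirm.

```python
from bs4 import BeautifulSoup

# The two review blocks from the question, trimmed to the parts that matter here.
html = """
<div class="review-rank">
  <span class="sml-rank-stars sml-str40 star"></span>
  <span class="score"><span class="item">口味:3.5</span></span>
</div>
<div class="review-rank">
  <span class="sml-rank-stars sml-str35 star"></span>
  <span class="score"><span class="item">口味:3.0</span></span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

star_rank = []
for r in soup.select(".review-rank"):
    # r.span is the first <span> inside the block; its "class" attribute
    # is returned as a list of tokens.
    codes = [c.replace("sml-str", "") for c in r.span["class"] if c.startswith("sml-str")]
    if codes:  # assumption: skip any block that carries no star class
        star_rank.append(codes[0])

print(star_rank)  # ['40', '35']
```

Creating `star_rank` once, before the loop, is what lets it accumulate a value per review instead of being reset on every pass.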
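Or, following your example (I do not know the overall structure above review_items, or whether each review holds one or several .review-rank blocks):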
star_rank = []
for review in review_items.find_all('div', class_='main-review'):
    # use a separate name for the inner loop so it does not shadow `review`
    for rank in review.find_all('div', class_='review-rank'):
        star_rank.append([s.replace('sml-str','') for s in rank.span['class'] if 'sml-str' in s][0])
Output:
['40', '35']
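If you want the values as integers rather than strings, a regular expression is one way to pull out the digits without relying on the token's position in the class list or on a fixed slice such as `[7:]`. This is only a sketch: `star_code` and `STAR_CLASS` are hypothetical names, and the pattern assumes the token always looks like `sml-str` followed by digits. How a code such as 5 or 40 maps to an actual star count depends on the site, so it is left uninterpreted here.

```python
import re

# Hypothetical pattern; assumes tokens of the form "sml-str<digits>".
STAR_CLASS = re.compile(r"^sml-str(\d+)$")

def star_code(class_tokens):
    """Return the digits of the first 'sml-str<digits>' token as an int, or None."""
    for token in class_tokens:
        match = STAR_CLASS.match(token)
        if match:
            return int(match.group(1))
    return None  # no star token found in this class list

# Sample class lists copied from the question's output.
rows = [
    ['sml-rank-stars', 'sml-str40', 'star'],
    ['sml-rank-stars', 'sml-str35', 'star'],
    ['sml-rank-stars', 'sml-str5', 'star'],
]
codes = [star_code(row) for row in rows]
print(codes)  # [40, 35, 5]
```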