Beautiful Soup: extract two different tags and join them into plain-text output



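I've been playing with Beautiful Soup to learn it. I've picked up a few things so far, but I'm having trouble putting my use case together. How can I print the movie list and the movie score together, as text only? Thanks for any help and information. I'm really enjoying Python and some of its applications, such as web scraping.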

import requests
from bs4 import BeautifulSoup
result = requests.get("https://www.rottentomatoes.com/browse/opening")
print("Checking Website")
print(result.status_code)
print("Gathering Website data and preparing it for presentation")
src = result.content
soup = BeautifulSoup(src, 'lxml')
movielist = soup.find_all("div",attrs={"class":"media-list__title"})
moviescore = soup.find_all("span",attrs={"class":"tMeterScore"})
for movielist in soup.find_all("div", attrs={"class": "media-list__title"}):
    print(movielist.text)

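The key here is zip(): you have two lists. Before zipping them, though, you need to get the text value from each element and strip the surrounding whitespace.

Here is a slight modification of your code: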

import requests
from bs4 import BeautifulSoup

result = requests.get("https://www.rottentomatoes.com/browse/opening")
print("Checking Website")
print(result.status_code)
print("Gathering Website data and preparing it for presentation")
soup = BeautifulSoup(result.content, 'lxml')
# get each movie title and remove any whitespace characters
movies = [
    title.getText(strip=True) for title in
    soup.find_all("div", attrs={"class": "media-list__title"})
]
# get each movie score, remove any whitespace chars, and replace '- -'
# with a custom message -> No score yet. :(
movie_scores = [
    score.getText(strip=True).replace("- -", "No score yet. :(") for score
    in soup.select(".media-list__meter-container")  # introducing css selectors :)
]
for movie_data in zip(movies, movie_scores):  # zipping the two lists
    title, score = movie_data  # unpack the tuple: (MOVIE_TITLE, MOVIE_SCORE)
    print(f"{title}: {score}")

Output:

Checking Website
200
Gathering Website data and preparing it for presentation
The Courier: 79%
The Heiress: No score yet. :(
The Stay: No score yet. :(
City of Lies: 50%
Happily: 70%
Doors: No score yet. :(
Last Call: No score yet. :(
Enforcement: 100%
Phobias: No score yet. :(
Dark State: No score yet. :(
Food Club: 83%
Wojnarowicz: 100%
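
One thing worth keeping in mind with this approach: zip() pairs items purely by position and stops at the shorter list, so it only lines up correctly when both lists have the same length. The sketch below uses made-up sample data (not scraped from the site) to show that behaviour, with itertools.zip_longest as one possible way to keep every title:

```python
from itertools import zip_longest

# Made-up sample data: the scores list is one item short on purpose.
titles = ["The Courier", "The Heiress", "City of Lies"]
scores = ["79%", "50%"]

# zip() silently drops "City of Lies" because scores ran out first.
paired = list(zip(titles, scores))

# zip_longest() keeps every title and fills the gap with a placeholder.
padded = list(zip_longest(titles, scores, fillvalue="No score yet. :("))

for title, score in padded:
    print(f"{title}: {score}")
```

Padding only fixes the length. It cannot tell which title was actually missing its score, so if the page ever omits a score element entirely, looking up the score inside each movie's own container is likely the safer option.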

You can also use a dict comprehension with an if-else expression to check whether a rating exists.

import requests
from bs4 import BeautifulSoup
result = requests.get("https://www.rottentomatoes.com/browse/opening")
print("Checking Website")
print(result.status_code)
print("Gathering Website data and preparing it for presentation")
soup = BeautifulSoup(result.content, 'lxml')
movie_scores = {
    i.select_one('.media-list__title').text: (
        i.select_one('.tMeterScore').text
        if i.select_one('.tMeterScore') else 'No score'
    )
    for i in soup.select('.media-list__item')
}
print(movie_scores)
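
Since the question asks for plain-text output, here is a rough offline sketch of the same per-item idea that prints one "title: score" line per movie. The HTML string is hypothetical: it only imitates the class names used above, and the real page structure may differ. It uses the built-in html.parser so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only imitates the class names used above;
# the real page structure may differ.
SAMPLE_HTML = """
<div class="media-list__item">
  <div class="media-list__title">The Courier</div>
  <span class="tMeterScore">79%</span>
</div>
<div class="media-list__item">
  <div class="media-list__title">The Heiress</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

movie_scores = {}
for item in soup.select(".media-list__item"):
    title = item.select_one(".media-list__title").get_text(strip=True)
    # select_one() returns None when this movie has no score element
    score_tag = item.select_one(".tMeterScore")
    if score_tag is not None:
        movie_scores[title] = score_tag.get_text(strip=True)
    else:
        movie_scores[title] = "No score"

for title, score in movie_scores.items():
    print(f"{title}: {score}")
```

Because each title and score are read from the same container, a missing score cannot shift the pairing the way it can with two separate lists.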
