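I've been playing with Beautiful Soup to learn it. So far I've picked up a few things, but I'm struggling to put my use case together. How do I print the movie list and the movie score text paired together? Thanks for any help and information. I'm really enjoying Python and some of its applications, like web scraping.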
import requests
from bs4 import BeautifulSoup
result = requests.get("https://www.rottentomatoes.com/browse/opening")
print("Checking Website")
print(result.status_code)
print("Gathering Website data and preparing it for presentation")
src = result.content
soup = BeautifulSoup(src, 'lxml')
movielist = soup.find_all("div",attrs={"class":"media-list__title"})
moviescore = soup.find_all("span",attrs={"class":"tMeterScore"})
for movielist in soup.find_all("div",attrs={"class":"media-list__title"}):
    print(movielist.text)
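The key here is zip: you have two lists. Before zipping them, though, you need to get the text value from each element and strip the whitespace from it.
Here's a slight modification of your code: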
import requests
from bs4 import BeautifulSoup
result = requests.get("https://www.rottentomatoes.com/browse/opening")
print("Checking Website")
print(result.status_code)
print("Gathering Website data and preparing it for presentation")
soup = BeautifulSoup(result.content, 'lxml')
# get each movie title and remove any whitespace characters
movies = [
    title.getText(strip=True) for title in
    soup.find_all("div", attrs={"class": "media-list__title"})
]
# get each movie score, remove any whitespace chars, and replace '- -'
# with a custom message -> No score yet. :(
movie_scores = [
    score.getText(strip=True).replace("- -", "No score yet. :(") for score
    in soup.select(".media-list__meter-container")  # introducing css selectors :)
]
for movie_data in zip(movies, movie_scores):  # zipping the two lists
    title, score = movie_data  # each item is a tuple: (MOVIE_TITLE, MOVIE_SCORE)
    print(f"{title}: {score}")
Output:
Checking Website
200
Gathering Website data and preparing it for presentation
The Courier: 79%
The Heiress: No score yet. :(
The Stay: No score yet. :(
City of Lies: 50%
Happily: 70%
Doors: No score yet. :(
Last Call: No score yet. :(
Enforcement: 100%
Phobias: No score yet. :(
Dark State: No score yet. :(
Food Club: 83%
Wojnarowicz: 100%
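If you want to try the strip-then-zip idea without hitting the website, here is a minimal sketch on plain strings. The raw_titles and raw_scores values are made up for illustration and only stand in for the text of scraped elements. It also shows that zip pairs items by position and stops at the shorter list, so the two lists need to line up one-to-one.

```python
# Hypothetical raw strings, standing in for the text of scraped elements.
raw_titles = ["\n  The Courier ", " The Heiress\n", "  City of Lies  "]
raw_scores = [" 79% ", "\n- -\n", " 50%"]

# Strip surrounding whitespace first, then swap the '- -' placeholder.
titles = [t.strip() for t in raw_titles]
scores = [s.strip().replace("- -", "No score yet") for s in raw_scores]

# zip pairs items by position: first with first, second with second, ...
pairs = list(zip(titles, scores))
for title, score in pairs:
    print(f"{title}: {score}")

# If one list is shorter, zip silently stops at the shorter one.
truncated = list(zip(titles, scores[:2]))
print(len(truncated))
```

Because the pairing is purely positional, a movie with no score element at all would shift every later score up by one, which is why it helps to select a container that exists for every movie.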
You can also use a dictionary comprehension with an if-else to check whether a rating exists:
import requests
from bs4 import BeautifulSoup
result = requests.get("https://www.rottentomatoes.com/browse/opening")
print("Checking Website")
print(result.status_code)
print("Gathering Website data and preparing it for presentation")
soup = BeautifulSoup(result.content, 'lxml')
movie_scores = {
    i.select_one('.media-list__title').text: (
        i.select_one('.tMeterScore').text
        if i.select_one('.tMeterScore') else 'No score'
    )
    for i in soup.select('.media-list__item')
}
print(movie_scores)
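To see how the if-else fits inside the comprehension, here is a small sketch on plain data. The items list is hypothetical: each dict stands in for one parsed '.media-list__item' element, and a score of None plays the role of select_one() returning None when nothing matches.

```python
# Hypothetical records standing in for parsed '.media-list__item' elements.
# A score of None mimics select_one() finding no matching element.
items = [
    {"title": "The Courier", "score": "79%"},
    {"title": "The Heiress", "score": None},
    {"title": "City of Lies", "score": "50%"},
]

# The conditional expression applies only to the value, not to the key:
# {key: (value_if_present if condition else fallback) for ...}
movie_scores = {
    item["title"]: (item["score"] if item["score"] is not None else "No score")
    for item in items
}
print(movie_scores)
```

Since the title and score are both read from the same item, a missing score cannot shift the pairing the way it could with two separately built lists.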