Unable to arrange and print certain fields from a webpage in a customized manner



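I've created a script to parse the movie name, all cast, Produced by, and Casting By fields from this webpage. I can parse each of these fields from the page. However, when all four items are involved, I can't arrange and print them in the customized way I want. So far, the script I've written prints the items exactly as I want when I include only the movie name and cast. I'd like to also include the Produced by and Casting By fields that you see in this image.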

What I've tried so far:

import requests
from bs4 import BeautifulSoup

link = 'https://www.imdb.com/title/tt0068646/fullcredits?ref_=tt_cl_sm#cast'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
    r = s.get(link)
    soup = BeautifulSoup(r.text, "lxml")
    movie_name = soup.select_one("h3[itemprop='name'] > a").get_text(strip=True)
    for item in soup.select("h4#cast + table.cast_list tr:has(:not(.castlist_label))"):
        performer = item.select_one("td:not(.primary_photo) > a[href^='/name/']").get_text(strip=True)
        character = ' '.join(item.select_one("td.character").text.split())
        print(movie_name, performer, character)

Output I'm getting (movie name and cast):

The Godfather Marlon Brando Don Vito Corleone
The Godfather Al Pacino Michael Corleone
The Godfather James Caan Sonny Corleone
The Godfather Richard S. Castellano Clemenza (as Richard Castellano)
The Godfather Robert Duvall Tom Hagen
The Godfather Sterling Hayden Capt. McCluskey
The Godfather John Marley Jack Woltz
and so on----------------------

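I'd like to add the following results at the bottom of the output above (taken from the two fields, Produced by and Casting By, that you see in the image):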

The Godfather Gray Frederickson associate producer
The Godfather Al Ruddy producer (as Albert S. Ruddy) (produced by)
The Godfather Robert Evans studio executive (uncredited)
The Godfather Louis DiGiaimo (casting)
The Godfather Andrea Eastman (casting)
The Godfather Fred Roos (casting)

How can I make the script print the fields the way shown above?

import requests
from bs4 import BeautifulSoup

link = 'https://www.imdb.com/title/tt0068646/fullcredits?ref_=tt_cl_sm#cast'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
    r = s.get(link)
    soup = BeautifulSoup(r.text, "lxml")
    movie_name = soup.select_one("h3[itemprop='name'] > a").get_text(strip=True)

    # Cast rows: same loop as in the question.
    for item in soup.select("h4#cast + table.cast_list tr:has(:not(.castlist_label))"):
        performer = item.select_one("td:not(.primary_photo) > a[href^='/name/']").get_text(strip=True)
        character = ' '.join(item.select_one("td.character").text.split())
        print(movie_name, performer, character)

    # Rows of the table that directly follows the "Produced by" heading.
    # Note: ':contains' is a soupsieve extension; newer soupsieve releases
    # prefer the spelling ':-soup-contains'.
    for row in soup.select('h4:contains("Produced by") + table tr'):
        name = row.select_one('.name').get_text(strip=True)
        credit = row.select_one('.credit').get_text(strip=True)
        print(movie_name, name, credit)

    # Same pattern for the table that follows the "Casting By" heading.
    for row in soup.select('h4:contains("Casting By") + table tr'):
        name = row.select_one('.name').get_text(strip=True)
        credit = row.select_one('.credit').get_text(strip=True)
        print(movie_name, name, credit)

Prints:

...
Krstný Otec Matthew Vlahakis Clemenza's Son (uncredited)
Krstný Otec Conrad Yama Fruit Vendor (uncredited)
Krstný Otec Gray Frederickson associate producer
Krstný Otec Al Ruddy producer (as Albert S. Ruddy) (produced by)
Krstný Otec Robert Evans studio executive (uncredited)
Krstný Otec Louis DiGiaimo (casting)
Krstný Otec Andrea Eastman (casting)
Krstný Otec Fred Roos (casting)

Note: Krstný Otec means "The Godfather" in Slovak (because of my country's IP, I get the Slovak version of the HTML).
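If the localized title is unwanted, one thing to try is sending an Accept-Language header with the request. The following is only a minimal sketch: make_session is a hypothetical helper name, and whether IMDb actually honors this header for the credits page is an assumption that has not been verified here.

```python
import requests


def make_session(language="en-US,en;q=0.9"):
    """Build a requests.Session that asks the server for English content.

    Hypothetical helper: the server is free to ignore Accept-Language,
    so the title may still come back localized.
    """
    s = requests.Session()
    s.headers["User-Agent"] = (
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
    )
    s.headers["Accept-Language"] = language
    return s


# Usage (performs a network call, so it is left commented out):
# with make_session() as s:
#     r = s.get("https://www.imdb.com/title/tt0068646/fullcredits?ref_=tt_cl_sm#cast")
```

The session returned by this helper can stand in for the plain requests.Session() in the script above; the rest of the parsing code stays the same.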

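The two loops for Produced by and Casting By differ only in the heading text, so they could be folded into one helper. The sketch below is an illustration, not the answer's exact method: credits_under is a hypothetical name, the HTML is a small hand-written stand-in for the real page (its structure is assumed), and it swaps the :contains selector for find_all plus find_next_sibling so it does not depend on a particular soupsieve version. Unlike the "+" combinator, find_next_sibling does not require the table to be the immediately adjacent element.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the credits page; the real markup is assumed
# to have the same h4-followed-by-table shape.
SAMPLE_HTML = """
<h3 itemprop="name"><a>The Godfather</a></h3>
<h4>Produced by</h4>
<table>
  <tr><td class="name"><a>Gray Frederickson</a></td><td class="credit">associate producer</td></tr>
  <tr><td class="name"><a>Al Ruddy</a></td><td class="credit">producer</td></tr>
</table>
<h4>Casting By</h4>
<table>
  <tr><td class="name"><a>Fred Roos</a></td><td class="credit">(casting)</td></tr>
</table>
"""


def credits_under(soup, heading):
    """Return (name, credit) pairs from the table following each matching <h4>."""
    rows = []
    for h4 in soup.find_all("h4"):
        if heading not in h4.get_text():
            continue
        table = h4.find_next_sibling("table")
        if table is None:
            continue
        for tr in table.find_all("tr"):
            name = tr.select_one(".name")
            credit = tr.select_one(".credit")
            if name is None or credit is None:
                continue  # skip rows that lack either cell
            rows.append((name.get_text(strip=True), credit.get_text(strip=True)))
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
movie_name = soup.select_one("h3[itemprop='name'] > a").get_text(strip=True)
for heading in ("Produced by", "Casting By"):
    for name, credit in credits_under(soup, heading):
        print(movie_name, name, credit)
```

Adding another section of the credits page would then only mean adding its heading text to the tuple, provided that section follows the same heading-plus-table layout.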