I'm trying to parse the names (and PhD degrees) from a university website, and I'm having a hard time doing it.


from bs4 import BeautifulSoup #imports beautifulSoup package
from urllib.request import urlopen  # Python 3; in Python 2 this module was urllib2
url = 'https://www.marshall.usc.edu/faculty/phd' #sets url to a variable
page = urlopen(url)
soup = BeautifulSoup(page.read(), "lxml") #sets the contents of the page to the variable soup
#names = soup.find_all('tr', {'class': 'odd views-row-first'})
names = soup.find_all('td', {'class': 'views-field views-field-field-faculty-name-last-value active'}) #sets the name 'cell' and tags
#namesU = names.replaceAll("<[^>]*>","")
#names.strip('<td class="views-field views-field-field-faculty-name-last-value active">') 
#names2 = names.sub('<td class="views-field views-field-field-faculty-name-last-value active">', '')
print(names)

You can solve this by using the "text" attribute on each 'td' element that find_all returns.

So, take the results you get from find_all, iterate over them, grab the "text" part of each one, and put it into a names list.
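Spelled out as an explicit loop, that idea might look like the sketch below. It does not fetch the real page: `sample_html` is a made-up stand-in with the same class string, and `html.parser` is used here only so the snippet runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real faculty page.
sample_html = """
<table>
  <tr><td class="views-field views-field-field-faculty-name-last-value active">
    <a href="#">Amato, Andrea</a>
  </td></tr>
  <tr><td class="views-field views-field-field-faculty-name-last-value active">
    Banerjee, Trambak
  </td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find_all returns Tag objects, which is why printing them shows the markup.
cells = soup.find_all('td', {'class': 'views-field views-field-field-faculty-name-last-value active'})

names = []
for cell in cells:
    # .text drops the tags; strip() removes the surrounding whitespace and newlines.
    names.append(cell.text.strip())

print(names)  # ['Amato, Andrea', 'Banerjee, Trambak']
```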

Here is a list comprehension that does this:

names = [i.text.strip() for i in soup.find_all('td', {'class': 'views-field views-field-field-faculty-name-last-value active'})]

After running this, the output is as follows:

['Amato, Andrea', 'Banerjee, Trambak', 'Basu, Pallavi', 'Chang, Wayne', 'Chung, Sung Hun', 'Comings, Alison', 'Cui, Hailong', 'DeGroot, Tyler', 'Dutton, Chaumanix', 'Fu, Luella', 'Golrezaei, Negin', 'Grandy, Jake', 'Han, Rong Qing', 'Han, Ju Rie (Alyssa)', 'Harmon, Derek', 'Hong, Jihoon', 'Jia, He', 'Joshi, Priyanka', 'Kays, Allison', 'Kfir, Alon', 'Kim, Jeunghyun', 'Kim, Pureum', 'Kim, Yookyoung', 'Kim , Jennifer', 'Krikorian, Mariam', 'Lang, Tina', 'Lee, Jennifer', 'Lee, Suk won', 'Lee, Yoonju', 'Li, Guang', 'Li, Yuan', 'Ling, Yun', 'Magkotsios, Georgios', 'Min, Bora', 'Newman, David', 'Oh, Seung Hwan', 'Ozkan, Erhun', 'Paulson, Courtney', 'Pei, Lei', 'Pyun, Sung June', 'Raj, Medha', 'Raveendhran, Roshni', 'Rich, Beverly', 'Ritter, Stacey', 'Sahoo, Satish', 'Skripnik, Roman', 'Smallets, Stephanie', 'Song, Shiwon', 'Stamenov, Ventsislav', 'Subler, Megan', 'Talijan, Vuk', 'Uhalde, Arianna', 'Valsesia, Francesca', 'Wan, Yuan', 'Wang, Jue', 'Wang, Weinan', 'Wang, Xuan', 'Wang, Yongzhi (Alex)', 'Wang, Yingfei (Fiona)', 'Wong, Vivian', 'Xia, Jingjing', 'Xing, Zhe (Adele)', 'Xu, Zibin', 'Yang, Louis', 'Yao, Yao', 'Yi, Irene', 'Yordanov, Kristian', 'Yu, Xiaoqian', 'Zhang, Heng', 'Zhang, Yanwei (Wayne)', 'Zhang, Yingguang', 'Zhang, Mengxia']
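One caveat worth keeping in mind: passing the full multi-class string to find_all matches only cells whose class attribute is exactly that string. If some cells on the page lack the trailing "active" class (an assumption, I have not checked the live markup), a CSS selector on the one distinctive class may be more forgiving. A minimal sketch on hypothetical sample markup, using `select` and `get_text(strip=True)`:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second cell is missing the "active" class.
sample_html = """
<table>
  <tr class="odd"><td class="views-field views-field-field-faculty-name-last-value active"> Chang, Wayne </td></tr>
  <tr class="even"><td class="views-field views-field-field-faculty-name-last-value"> Fu, Luella </td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Exact-string match on the class attribute: only the first cell qualifies.
exact_cells = soup.find_all('td', {'class': 'views-field views-field-field-faculty-name-last-value active'})
exact_names = [td.get_text(strip=True) for td in exact_cells]

# CSS selector on a single class: matches any td that carries that class.
css_cells = soup.select('td.views-field-field-faculty-name-last-value')
css_names = [td.get_text(strip=True) for td in css_cells]

print(exact_names)  # ['Chang, Wayne']
print(css_names)    # ['Chang, Wayne', 'Fu, Luella']
```

If every cell on the real page carries the same three classes, the two approaches return the same list and the original find_all call is fine as it is.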
