from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
import pandas as pd
sca_url = "https://steakcookoffs.com/cookoffs?EventViewMode=1&EventListViewMode=1"
client = uReq(sca_url)
page_html = client.read()
page_soup = soup(page_html, features='lxml')
sca_reg_links_tags = page_soup.select(".inner a")
print(sca_reg_links_tags)
How can I get the registration links? I also tried sca_reg_links_tags = page_soup.find('div', {"class": "inner"}), but it returns the same thing.
Try:
soup.find_all("a", string="Register")
Also, see the bs4 documentation:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
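For illustration, here is a minimal, self-contained sketch of the string filter above. The HTML fragment is made up to stand in for the real page, so the tag structure and hrefs are assumptions:

```python
from bs4 import BeautifulSoup

# Made-up fragment standing in for the real cookoff listing page.
html = """
<div class="inner">
  <a href="/event-1/Registration">Register</a>
  <a href="/event-1">Details</a>
  <a href="/event-2/Registration">Register</a>
</div>
"""
page = BeautifulSoup(html, "html.parser")

# string="Register" matches only <a> tags whose text is exactly "Register".
register_links = page.find_all("a", string="Register")
print([a["href"] for a in register_links])
# → ['/event-1/Registration', '/event-2/Registration']
```

Note that string= requires an exact text match; if the real anchors contain extra whitespace or nested tags, a regex or a function filter may be needed instead.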
Try something like:
sca_reg_links_tags = page_soup.find_all('a', {'title': 'View event details'})
lst = []
for link in sca_reg_links_tags:
    lst.append(link['href'] + '/Registration')
lst[:5]
['https://steakcookoffs.com/event-4572070/Registration',
'https://steakcookoffs.com/event-4572070/Registration',
'https://steakcookoffs.com/event-4692176/Registration',
'https://steakcookoffs.com/event-4692176/Registration',
'https://steakcookoffs.com/event-4901583/Registration']
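Notice the output contains duplicates (each event card apparently has two matching anchors). Since pandas is already imported in the question, a small follow-up sketch could deduplicate and save the links; the column name and output filename here are made up:

```python
import pandas as pd

# Sample of the collected URLs (reusing the output above, duplicates included).
lst = [
    "https://steakcookoffs.com/event-4572070/Registration",
    "https://steakcookoffs.com/event-4572070/Registration",
    "https://steakcookoffs.com/event-4692176/Registration",
]

# Build a one-column DataFrame and drop the repeated rows.
links = pd.DataFrame({"registration_url": lst}).drop_duplicates()

# Write the unique links to CSV (hypothetical filename).
links.to_csv("sca_registration_links.csv", index=False)
print(len(links))  # → 2 unique links
```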
Happy cooking!