Web scraping with Python's Beautiful Soup



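Hi everyone, I'm trying to scrape data from a website, but I don't know how to get the href links. I can get the text with jobs = jobs.text, but how do I do the same for the href links? Here is the code (you don't need to read all of it; List4 is the relevant part):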

import requests
from bs4 import BeautifulSoup

Jobs_Name_List = list()
Jobs_Description = list()
Job_Company = list()
jobs_link = list()
url = "https://www.seek.co.nz/jobs?onsite_campaign=TATSOI_TGJB_Aware&onsite_content=TATSOI_TGJB_Aware_CANZ_AW_OS_Ban_Half01A&onsite_medium=Display&onsite_source=SEEK&tracking=SEK-SNZ-BAN-TATSOI_TGJB_Aware-30419"
R = requests.get(url)
Soup = BeautifulSoup(R.content, "html5lib")

# Job titles
List = Soup.find_all("h3", attrs={"class": "yvsb870 _1qw3t4i0 _1qw3t4ih _1d0g9qk4 _1qw3t4ip _1qw3t4i1x"})
for jobs in List:
    jobs = jobs.text
    if jobs not in Jobs_Name_List:
        Jobs_Name_List.append(jobs)
print(Jobs_Name_List)
print("-" * 80)

# Company names
List2 = Soup.find_all("span", attrs={"class": "yvsb870 _14uh9944u _1qw3t4i0 _1qw3t4i1x _1qw3t4i2 _1d0g9qk4 _1qw3t4ie"})
for companies in List2:
    companies = companies.text
    if companies not in Job_Company:
        Job_Company.append(companies)
print(Job_Company)
print("-" * 80)

# Job descriptions
List3 = Soup.find_all("span", attrs={"class": "yvsb870 _14uh9944u _1qw3t4i0 _1qw3t4i1y _1qw3t4i1 _1d0g9qk4 _1qw3t4i8"})
for descriptions in List3:
    descriptions = descriptions.text
    if descriptions not in Jobs_Description:
        Jobs_Description.append(descriptions)
print(Jobs_Description)

# My attempt at the links; this is the part I need help with
List4 = Soup.find_all("a", attrs={"href"})

This is the HTML whose href links I need to add to the jobs_link list.

This appends the href value of every anchor tag whose text is "Receptionist/Administrator" to List5.

List4 = Soup.find_all("a")
List5 = []
for a in List4:
    if 'href' in a.attrs and a.text == "Receptionist/Administrator":
        link = a.get('href')
        List5.append(link)
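
The same idea can be written more compactly by letting find_all skip anchors that have no href at all. The following is only a minimal sketch: the HTML snippet and its paths are made up for illustration (they are not the real seek.co.nz markup), and it uses the built-in "html.parser" so that html5lib is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the paths are invented.
SAMPLE_HTML = """
<a href="/job/111?type=promoted">Receptionist/Administrator</a>
<a href="/job/222?type=standard"> Receptionist/Administrator </a>
<a href="/job/333?type=standard">Warehouse Assistant</a>
<a>Receptionist/Administrator</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# href=True keeps only <a> tags that actually carry an href attribute.
# get_text(strip=True) ignores surrounding whitespace in the link text.
matching_links = [
    a["href"]
    for a in soup.find_all("a", href=True)
    if a.get_text(strip=True) == "Receptionist/Administrator"
]
print(matching_links)
```

Because the filter happens inside find_all, the explicit 'href' in a.attrs check is no longer needed, and a["href"] cannot raise a KeyError.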

This grabs only the links whose href value contains type=promoted.


List4 = Soup.find_all("a")
List5 = []
for a in List4:
    if 'href' in a.attrs:
        if "type=promoted" in a.attrs['href']:
            link = a.get('href')
            List5.append(link)
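
A substring check like the one above would also match an href such as ?subtype=promoted. If that matters, the query string can be parsed instead. This is a hedged sketch using only the standard library: promoted_links is a hypothetical helper name, the sample hrefs are invented, and it assumes the hrefs on the page are relative to https://www.seek.co.nz.

```python
from urllib.parse import parse_qs, urljoin, urlsplit

# Assumption: relative hrefs on the page resolve against this base URL.
BASE_URL = "https://www.seek.co.nz"


def promoted_links(hrefs, base_url=BASE_URL):
    """Return de-duplicated absolute URLs for hrefs with type=promoted."""
    result = []
    for href in hrefs:
        # parse_qs maps each query parameter name to a list of its values.
        query = parse_qs(urlsplit(href).query)
        if "promoted" in query.get("type", []):
            absolute = urljoin(base_url, href)
            if absolute not in result:
                result.append(absolute)
    return result


# Hypothetical href values, e.g. as collected from a.get('href').
sample_hrefs = [
    "/job/111?type=promoted",
    "/job/222?type=standard",
    "/job/111?type=promoted",   # duplicate, kept only once
    "/jobs?subtype=promoted",   # a substring check would match this one
]
print(promoted_links(sample_hrefs))
```

The returned URLs are absolute, so they can be appended to jobs_link and requested directly.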
