How do I extract all 'href' values from div elements with a given class?



I am using Beautiful Soup for web scraping in Python. I can't extract the "href" values and save them to a list.

[<div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/358/40440">React JS Developer</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/391/40439">React.js Developers</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/358/40438">Node JS Developer</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/358/40437">React Native Developer</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/391/40436">Python Full Stack Developers</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/469/40435">SEO SPECIALIST</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/469/40434">VR Designer</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/469/40433">Automation Tester</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/403/40432">Junior Frontend Developer (ReactJS)</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40431">Software Test Engineer</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/249/40430">Senior UI Developer (React )</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/249/40429">UI/UX Senior architect</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40428">Front End Developer</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/249/40427">Senior ADF &amp; Power BI Expert</a></div>, <div 
class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40426">Software Engineer/ Senior Software Engineer - Odoo</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40425">Team Lead - Php</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40424">Lead - Automation Testing</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40423">Software Engineer/ Senior Software Engineer - PHP</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40422">Software Engineer/Senior Software Engineer - Moodle</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40421">Senior Software Engineer - React Native</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40420">Team Lead-React</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/159/40419">iOS Developers</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/159/40418">Flutter Developers</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/159/40417">React Native Developers</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/159/40416">Full Stack Developers</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/159/40415">Angular Developers</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/159/40414">React JS Developers</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/159/40413">Node JS 
Developers</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/159/40412">Junior Business Development Executive</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40411">Team Lead - Node</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40410">Team Lead- Ruby on Rails</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/304/40409">Facility in Charge</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40408">Software Engineer/ Senior Software Engineer - Vue</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40407">Lead- UI/UX Designer</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40406">Software Engineer/ Senior Software Engineer - React 
Js</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40405">Software Engineer/ Senior Software Engineer - Node </a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40404">Team Lead - GoLang</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40403">Associate - Business Development.</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40402">Team Lead - React Native</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40401">Lead - Business Analyst</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40400">Software Engineer/Sr.Software Engineer- IOS</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40399">Assistant Manager – Business Development </a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40398">Marketing Executive</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40397">Associate Software Engineer-Freshers</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40396">Software Engineer/Senior Software Engineer-Ruby on Rails</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40395">Senior Software Engineer/ Team Lead - Magento</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40394">Software Engineer/Senior Software Engineer - Java</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/290/40393">SENIOR FULL STACK 
ENIGNEER - NODEJS/REACTJS</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/290/40392">Front End Developer – ReactJS</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/209/40391">Software Engineer/Senior Software Engineer - .Net</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/262/40390">Administration Executive</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40389">HR Manager</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40388">MAGENTO DEVELOPER</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40387">PHP DEVELOPER(1-3 YEAR EXP)</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40386">SERVER ADMINISTRATOR (1- 4 YEAR ) WITH LINUX EXPERIENCE</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40385">Server Admin- Fresher</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40384">TESTING ENGINEER - TEAM LEAD (4+ Years)</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40383">TESTING ENGINEER (2 
- 5 YEARS EXP)</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40382">PHP DEVELOPER (3+ YEAR)</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40381">LARAVEL DEVELOPER (URGENT REQUIREMENT)</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40380">REACT JS DEVELOPER (URGENT REQUIREMENT)</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40379">UI DEVELOPER (0-2 YEAR)</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40378">UI Developer(Minimum 1 year)</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40377">UI DEVELOPER (URGENT REQUIREMENT)</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40376">UI/UX DESIGNER</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/334/40375">Full Stack Developer-Java, Angular, Spring boot, Microservices</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40374">GRAPHIC DESIGNER - TEAM LEAD</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/396/40373">Digital Marketing - Head</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/396/40372">Social Media Marketing Executive</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40371">Content Writer (Urgent requirement)</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/396/40370">Senior React.Js Developer </a></div>, <div class="col-xs-6 col-md-4 
mt5"><a href="https://example.com/companies/company/company-jobs/258/40369">Content Writer - Team Lead (Urgent requirement)</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40368">PERFORMANCE MARKETING (DIGITAL MARKETING)</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40367">Social Media Executive (1 - 3+ Years</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40366">SEO ANALYST (0-1 Year)</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40365">SEO ANALYST</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40364">REACT NATIVE DEVELOPER(URGENT REQUIREMENT)</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40363">Magento Developers(Min 2 year exp)</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/258/40362">ANDROID DEVELOPER (URGENT REQUIREMENT)</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/161/40361">Talent Acquisition Specialist</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/161/40360">Cloud Specialist</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/161/40359">Software Engineer</a></div>, <div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/com

I tried the code below, but it didn't work.

res = requests.get(website_url, verify=False)
soup = BeautifulSoup(res.text, 'lxml')
Links = soup.find_all("div", {'class': 'col-xs-6 col-md-4 mt5'},)
url = [tag.get('href') for tag in Links]

You need to use find to locate the a tag inside each div tag, and then read the value of its href attribute:

from bs4 import BeautifulSoup
html = '<html><body><div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/358/40440">React JS Developer</a></div></body></html>'
soup = BeautifulSoup(html, 'lxml')
Links = soup.find_all("div", {'class': 'col-xs-6 col-md-4 mt5'},)
urls = [tag.find('a')['href'] for tag in Links]
>>> Links
[<div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/358/40440">React JS Developer</a></div>]
>>> urls
['https://example.com/companies/company/company-jobs/358/40440']
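
To make the difference from the attempt in the question concrete, here is a minimal sketch. It assumes a small inline HTML string rather than the real page, uses the built-in "html.parser" so that lxml is not required, and adds a hypothetical second div with no link to show the edge case. Calling .get('href') on the div returns None because the attribute belongs to the nested a tag; a CSS selector via select reaches the a tags directly and skips divs that have none.

```python
from bs4 import BeautifulSoup

# Inline sample; the second div (no link) is a made-up edge case for illustration.
html = (
    '<div class="col-xs-6 col-md-4 mt5">'
    '<a href="https://example.com/companies/company/company-jobs/358/40440">React JS Developer</a>'
    '</div>'
    '<div class="col-xs-6 col-md-4 mt5">No link here</div>'
)
soup = BeautifulSoup(html, "html.parser")
divs = soup.find_all("div", {"class": "col-xs-6 col-md-4 mt5"})

# The div itself carries no href attribute, so .get('href') yields None.
div_hrefs = [div.get("href") for div in divs]
print(div_hrefs)  # [None, None]

# Select only <a> tags that have an href and sit inside a matching div.
urls = [a["href"] for a in soup.select("div.col-xs-6.col-md-4.mt5 a[href]")]
print(urls)  # ['https://example.com/companies/company/company-jobs/358/40440']
```

Note that tag.find('a')['href'] would raise a TypeError on a div without a link, since find returns None there; the selector form avoids that case.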

You can simply use BeautifulSoup and look for every a tag that has an href:

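from bs4 import BeautifulSoup
import requests

# website_url is the URL of the page to scrape, as in the question
res = requests.get(website_url)
soup = BeautifulSoup(res.content, features="lxml")
urls = []
# href=True matches only <a> tags that actually have an href attribute
for a in soup.find_all('a', href=True):
    urls.append(a['href'])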
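
Collecting every a tag also picks up navigation and footer links. One possible way to narrow the result, sketched below under stated assumptions, is to resolve each href against the page URL and keep only those containing the "/company-jobs/" path segment seen in the question's output. The base_url value, the "/about" link, and the relative href are hypothetical and exist only to illustrate the filtering.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical page URL, used only to resolve relative links.
base_url = "https://example.com/companies/"

# Inline sample: one unrelated link, one relative job link, one absolute job link.
html = """
<a href="/about">About</a>
<div class="col-xs-6 col-md-4 mt5"><a href="company/company-jobs/358/40440">React JS Developer</a></div>
<div class="col-xs-6 col-md-4 mt5"><a href="https://example.com/companies/company/company-jobs/391/40439">React.js Developers</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Resolve every href to an absolute URL.
all_links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]

# Keep only links whose path looks like a job posting.
job_links = [url for url in all_links if "/company-jobs/" in url]
print(job_links)
```

Filtering on the URL pattern is a design choice that works even if the div classes change; filtering by the enclosing div, as in the other answer, is the alternative when the URL pattern is not reliable.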
