Problem with code that extracts data from a website



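I have this website, and I want to use Python to extract all the company names, such as West Wood Events and Mitchell Event Planning.

But I'm stuck on soup.find, because it gives me []. When I inspect the page, I see, for example: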

<div class="LinesEllipsis  vendor-name--55315 primaryBold--a3d1e body1--24afd">Mitchell Event Planning<wbr></div>

So I write:

week = soup.find(class_='LinesEllipsis  vendor-name--55315 primaryBold--a3d1e body1--24afd')
print(week)

and I get 0 results.

Am I missing something? I'm pretty new to this.

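This string is not a single class but several classes separated by spaces.

In some modules you have to use the original string with all its spaces, but in BS it seems you have to use the classes separated by a single space. BeautifulSoup splits the class attribute into a list of classes, and a multi-class search string is compared against those classes joined by single spaces, so the doubled space in the copied string never matches.

If I use a single space between LinesEllipsis and vendor-name--55315, the code works for me.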

week = soup.find_all(class_='LinesEllipsis vendor-name--55315 primaryBold--a3d1e body1--24afd')

It also works if I use a CSS selector with a dot before each class in the string:

week = soup.select('.LinesEllipsis.vendor-name--55315.primaryBold--a3d1e.body1--24afd')
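To see the difference without hitting the live site, here is a small offline sketch. The HTML snippet is a hand-written stand-in built from the element shown in the question (it is not fetched from the page), and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the page markup; note the double space after "LinesEllipsis".
html = """
<div class="LinesEllipsis  vendor-name--55315 primaryBold--a3d1e body1--24afd">Mitchell Event Planning</div>
<div class="LinesEllipsis  vendor-name--55315 primaryBold--a3d1e body1--24afd">West Wood Events</div>
"""
soup = BeautifulSoup(html, "html.parser")

# String copied from the inspector, double space included: no match.
double_space = soup.find_all(
    class_="LinesEllipsis  vendor-name--55315 primaryBold--a3d1e body1--24afd")

# Same classes separated by single spaces: matches both elements.
single_space = soup.find_all(
    class_="LinesEllipsis vendor-name--55315 primaryBold--a3d1e body1--24afd")

# CSS selector with one dot per class: spacing and class order do not matter.
by_selector = soup.select(
    ".LinesEllipsis.vendor-name--55315.primaryBold--a3d1e.body1--24afd")

print(len(double_space))                    # 0
print([d.text for d in single_space])
print([d.text for d in by_selector])
```

The multi-class string form also depends on the order of the classes, so the selector form is usually the less fragile of the two.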

Minimal working code

import requests
from bs4 import BeautifulSoup as BS

url = 'https://www.theknot.com/marketplace/wedding-planners-acworth-ga?page=2'
r = requests.get(url)
soup = BS(r.text, 'html.parser')

# alternative: CSS selector with one dot per class
#week = soup.select('.LinesEllipsis.vendor-name--55315.primaryBold--a3d1e.body1--24afd')
week = soup.find_all(class_='LinesEllipsis vendor-name--55315 primaryBold--a3d1e body1--24afd')

for item in week:
    print(item.text)

Result:

The Charming Details
Enraptured Events
pearl and sky events - planning, design and florals
Unique Occasions ByTNicole, Inc
Platinum Eventions
RED COMPANY ATLANTA
Pop + Fizz: Event Planning and Design
Patricia Elizabeth, certified wedding planners/producer
Rienza Events
Pollyanna Richter Weddings
Calabash Events, Inc.
Weddings by Carmona LLC
Emily Jordan Events
Perfectly Taylored Events
Lindsey Wise Designs
Elegant Weddings and Affairs
Party PLANit
Wedded Bliss
Above the Fray
Willow Jaymes Events
Coco Red Events
Liz D. Events, LLC
Leslie Cox Events
YSE LLC
Marmaros Productions
PerfectionsID, LLC
All Things Love
West Wood Events
Everlasting Elegance
Prestigious Occasions
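
A possible variation, offered only as a sketch: suffixes like --55315 look auto-generated, so they may change when the site is rebuilt (that is an assumption, not something confirmed for this site). Matching a single class by its prefix with a regular expression avoids depending on the full class string. The helper name vendor_names and the sample HTML below are hypothetical.

```python
import re
from bs4 import BeautifulSoup

def vendor_names(html):
    """Return the text of every element that has a class starting with 'vendor-name--'.

    Hypothetical helper; assumes the 'vendor-name--' prefix stays stable.
    """
    soup = BeautifulSoup(html, "html.parser")
    # A compiled regex passed to class_ is tested against each class individually.
    tags = soup.find_all(class_=re.compile(r"^vendor-name--"))
    return [tag.get_text(strip=True) for tag in tags]

# Hand-written sample markup; the second suffix is made up to simulate a rebuild.
sample = """
<div class="LinesEllipsis  vendor-name--55315 primaryBold--a3d1e">West Wood Events</div>
<div class="LinesEllipsis  vendor-name--99999 primaryBold--a3d1e">Everlasting Elegance</div>
<div class="LinesEllipsis  other--12345">Not a vendor</div>
"""
print(vendor_names(sample))
```

With the live page, the same function could be called as vendor_names(r.text) using the response from the code above.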
