Beautiful Soup: how to create a list from .text



I am using Beautiful Soup to scrape some information from this URL, but I am finding it rather confusing.

Code:

import re
import requests
from bs4 import BeautifulSoup

# url and header are defined elsewhere (not shown in the question)
page = requests.get(url, headers=header)
soup = BeautifulSoup(page.content, 'html.parser')

section = soup.find_all("article", {"class" : re.compile('results-card residential-card residential-card--compressed-view*')})
for advert in section:
    print("{}\n\n".format(advert))
    # print("{}\n\n".format(advert.text)) # Not the desired output, but very close

Output:

HTML snippet for one advert:

<article aria-label="13 Wellington Road, Auburn" class="results-card residential-card residential-card--compressed-view sc-cHSUfg dzuxEF" data-testid="ResidentialCard"><div class="branding branding--small " style="background-color:#00011b"><img alt="McGrath - Parramatta" class="branding__image" src="data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIxNzAiIGhlaWdodD0iMzIiPjwvc3ZnPgo="/></div><div aria-hidden="true" class="residential-card__image-wrapper"><div class="residential-card__image"><a class="details-link " href="/property-house-nsw-auburn-132520446"><div class="carousel carousel--unmounted residential-card__images property-card-hero property-card-hero--small" data-testid="Carousel"><div class="property-image" data-testid="PropertyImage"><img alt="13 Wellington Road, Auburn, NSW 2144" class="property-image__img " src="data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSI4MDAiIGhlaWdodD0iNjAwIj48L3N2Zz4K"/></div><button aria-label="previous image" class="carousel__left" data-carousel-previous="true" data-testid="Carousel__previous"></button><button aria-label="next image" class="carousel__right" data-carousel-next="true" data-testid="Carousel__next"></button></div></a></div></div><div class="residential-card__banner-strip" role="presentation"></div><div class="residential-card__content-wrapper" role="presentation"><div class="residential-card__content" role="presentation"><div><div class="residential-card__price rui-truncate" role="presentation"><span class="property-price ">$1,300,000</span></div><div><h2 class="residential-card__address-heading"><a class="details-link residential-card__details-link" href="/property-house-nsw-auburn-132520446"><span class="">13 Wellington Road, Auburn</span></a></h2></div></div><div class="piped-content"><div class="piped-content__outer"><div class="piped-content__inner"><div class="primary-features residential-card__primary"><ul 
class="general-features rui-clearfix " role="presentation"><li aria-label="7 bedrooms" class="general-features__feature" role="text"><span class="general-features__icon general-features__beds"> <!-- -->7</span></li><li aria-label="3 bathrooms" class="general-features__feature" role="text"><span class="general-features__icon general-features__baths"> <!-- -->3</span></li><li aria-label="3 parking spaces" class="general-features__feature" role="text"><span class="general-features__icon general-features__cars"> <!-- -->3</span></li></ul><div aria-label="490 m² land size" class="property-size rui-clearfix" role="text"><span aria-hidden="true" class="property-size__icon property-size__land"> <!-- -->490</span><span aria-hidden="true"> <!-- -->m²</span></div></div></div><div class="piped-content__inner"><span aria-label="House property type" class="residential-card__property-type" role="text">House</span></div></div></div></div><div class="residential-card__buttons" role="presentation"><button aria-label="Save property" class="listing-bookmark listing-bookmark--search-results" title="Save property"><div class="save_icon "><span class="save_icon__hollow-star"></span><span class="save_icon__filled-star"></span></div></button></div></div></article>

Current output:

If I print advert.text, I get the following:

$1,300,00013 Wellington Road, Auburn 7 3 3 490 m²House

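However, this is hard to parse later because most adverts are not structured the same way, so I would prefer a list that I can process further.

Full output of the for loop: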

$1,300,00013 Wellington Road, Auburn 7 3 3 490 m²House
For Sale $985,00045 Raglan Road, Auburn 4 2 2HouseOpen Sat 25 JanOpen Sat 25 Jan 10:00am
For Sale20 Kirkham Road, Auburn 4 2 2House
$1,120,00099 Park Road, Auburn 4 2 2House
auction12 Dudley Street, Auburn 5 2 2 708 m²HouseOpen Sat 25 JanOpen Sat 25 Jan 2:00pmAuction Sat 15 Feb
EOI For Sale or LeaseAddress available on request, Auburn 10 6 28 1,561 m²House
Contact Agent50 Chiswick Road, Auburn 5 3House
1,150,000 - 1,200,0009 Norval Street, Auburn 3 1 645 m²House
DA approved for 32 luxury Apartments40 Park Road, Auburn 3 1House
Added 23 hours agoAUCTION 15TH FEBRUARY SATURDAY @ 11.30 AM ONSITE120 Park Road, Auburn 4 2 3HouseOpen Sat 25 JanOpen Sat 25 Jan 11:00amAuction Sat 15 Feb
Added 22 hours agoAUCTION 15TH FEBRUARY SATURDAY @ 12.30 PM ONSITE54 Mary Street, Auburn 3 2 1HouseOpen Sat 25 JanOpen Sat 25 Jan 12:00pmAuction Sat 15 Feb
Under offer1.23 Million138 Chisholm Rd, Auburn 5 3 4 927 m²House
Price Guide: $980,000 to $1,025,000173 Auburn Road, Auburn 4 1 1 436 m²House
Price Guide: $670,000 to $690,00042 Belgium Street, Auburn 3 1 1 366 m²House
$1,200,00017 Beaumont Street, Auburn 6 3 2 607 m²HouseOpen Sat 25 JanOpen Sat 25 Jan 12:00pm
Contact Agent61 Gordon Road, Auburn 5 3 2 512 m²HouseOpen Sat 25 JanOpen Sat 25 Jan 11:00am
Under offerOne left, be quick before all sold72 Wellington Road, Auburn 5 3 1Duplex/Semi-detached
$1,500,0002 North Street, Auburn 8 3 3HouseOpen Sat 25 JanOpen Sat 25 Jan 11:00am
$569,0003/18 Harrow Road, Auburn 2 2 1 216 m²House
$1,975,00019 St Johns Road, Auburn 5 2 1 1,277 m²House
$1,650,00036 Antwerp Street, Auburn 7 5 4 762 m²House
Contact Agent22 Gibbs street, Auburn 5 3 2 450 m²House

Ideal output:

["$1,300,000", "13 Wellington Road, Auburn", "7", "3", "3", "490 m²House"]

Question:

How can I turn advert.text into a list like the ideal output?

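Using advert.findAll(text=True), you can create a list of all the text nodes inside advert (newer Beautiful Soup versions spell this find_all(string=True)).

Code: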

for advert in section:
    print("{}\n\n".format(advert.findAll(text=True)))

Yields:

['$1,300,000', '13 Wellington Road, Auburn', ' ', ' ', '7', ' ', ' ', '3', ' ', ' ', '3', '\xa0', ' ', '490', ' ', ' ', 'm²', 'House']
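The list above still contains whitespace-only entries and the contents of the HTML comments. As a possible refinement, Beautiful Soup's stripped_strings generator skips both. The sketch below runs against a trimmed-down stand-in for one advert (the html string is a simplification made up for illustration, not the real page), so treat it as a starting point rather than a drop-in fix.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for one advert, modeled on the snippet in the question.
html = """
<article class="results-card residential-card residential-card--compressed-view">
  <span class="property-price">$1,300,000</span>
  <a class="details-link"><span>13 Wellington Road, Auburn</span></a>
  <ul>
    <li><span class="general-features__beds"> <!-- -->7</span></li>
    <li><span class="general-features__baths"> <!-- -->3</span></li>
    <li><span class="general-features__cars"> <!-- -->3</span></li>
  </ul>
  <div class="property-size"><span> <!-- -->490</span><span> <!-- -->m²</span></div>
  <span class="residential-card__property-type">House</span>
</article>
"""

soup = BeautifulSoup(html, "html.parser")
advert = soup.find("article")

# stripped_strings yields each text node with surrounding whitespace removed,
# and leaves out whitespace-only nodes and HTML comments.
fields = list(advert.stripped_strings)
print(fields)
```

In the original loop this would amount to replacing advert.findAll(text=True) with list(advert.stripped_strings). If a single delimited string is preferred over a list, advert.get_text("|", strip=True) is a related option.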

Here is another solution.

from simplified_scrapy.request import req
from simplified_scrapy.simplified_doc import SimplifiedDoc
uri = 'https://www.realestate.com.au/buy/property-house-in-auburn,+nsw+2144/list-1?source=refinement'
html = req.get(uri)
doc = SimplifiedDoc(html)
articles = doc.getElementsByReg('class="results-card residential-card residential-card--compressed-view.*"')
for article in articles:
    div = article.getElementByClass('residential-card__content-wrapper').div
    section = [span.text for span in div.spans]
    print(section)

Result:

['$1,300,000', '13 Wellington Road, Auburn', '7', '3', '3', '490', 'm²', 'House']
['For Sale $985,000', '45 Raglan Road, Auburn', '4', '2', '2', 'House', 'Open Sat 25 JanOpen Sat 25 Jan 10:00am']
['For Sale', '20 Kirkham Road, Auburn', '4', '2', '2', 'House']
['$1,120,000', '99 Park Road, Auburn', '4', '2', '2', 'House']
['auction', '12 Dudley Street, Auburn', '5', '2', '2', '708', 'm²', 'House', 'Open Sat 25 JanOpen Sat 25 Jan 2:00pm', 'Auction Sat 15 Feb']
['EOI For Sale or Lease', 'Address available on request, Auburn', '10', '6', '28', '1,561', 'm²', 'House']
['Contact Agent', '50 Chiswick Road, Auburn', '5', '3', 'House']
['1,150,000 - 1,200,000', '9 Norval Street, Auburn', '3', '1', '645', 'm²', 'House']
['DA approved for 32 luxury Apartments', '40 Park Road, Auburn', '3', '1', 'House']
['AUCTION 15TH FEBRUARY SATURDAY @ 11.30 AM ONSITE', '120 Park Road, Auburn', '4', '2', '3', 'House', 'Open Sat 25 JanOpen Sat 25 Jan 11:00am', 'Auction Sat 15 Feb']
['AUCTION 15TH FEBRUARY SATURDAY @ 12.30 PM ONSITE', '54 Mary Street, Auburn', '3', '2', '1', 'House', 'Open Sat 25 JanOpen Sat 25 Jan 12:00pm', 'Auction Sat 15 Feb']
['1.23 Million', '138 Chisholm Rd, Auburn', '5', '3', '4', '927', 'm²', 'House']
['Price Guide: $980,000 to $1,025,000', '173 Auburn Road, Auburn', '4', '1', '1', '436', 'm²', 'House']
['Price Guide: $670,000 to $690,000', '42 Belgium Street, Auburn', '3', '1', '1', '366', 'm²', 'House']
['$1,200,000', '17 Beaumont Street, Auburn', '6', '3', '2', '607', 'm²', 'House', 'Open Sat 25 JanOpen Sat 25 Jan 12:00pm']
['Contact Agent', '61 Gordon Road, Auburn', '5', '3', '2', '512', 'm²', 'House', 'Open Sat 25 JanOpen Sat 25 Jan 11:00am']
['One left, be quick before all sold', '72 Wellington Road, Auburn', '5', '3', '1', 'Duplex/Semi-detached']
['$1,500,000', '2 North Street, Auburn', '8', '3', '3', 'House', 'Open Sat 25 JanOpen Sat 25 Jan 11:00am']
['$569,000', '3/18 Harrow Road, Auburn', '2', '2', '1', '216', 'm²', 'House']
['$1,975,000', '19 St Johns Road, Auburn', '5', '2', '1', '1,277', 'm²', 'House']
['$1,650,000', '36 Antwerp Street, Auburn', '7', '5', '4', '762', 'm²', 'House']
['Contact Agent', '22 Gibbs street, Auburn', '5', '3', '2', '450', 'm²', 'House']
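In these lists the land size and its unit arrive as two separate entries ('490', 'm²'), whereas the ideal output in the question keeps them together. One way to rejoin them is sketched below in plain Python; merge_units is a hypothetical helper name introduced here, and the sketch assumes the unit always directly follows its value, as it does in the rows above.

```python
def merge_units(fields, unit="m²"):
    """Join a value with the unit entry that follows it: '490', 'm²' -> '490 m²'."""
    merged = []
    for item in fields:
        if item == unit and merged:
            # Attach the unit to the previous entry instead of keeping it separate.
            merged[-1] = f"{merged[-1]} {unit}"
        else:
            merged.append(item)
    return merged


row = ['$1,300,000', '13 Wellington Road, Auburn', '7', '3', '3', '490', 'm²', 'House']
no_size = ['For Sale', '20 Kirkham Road, Auburn', '4', '2', '2', 'House']

print(merge_units(row))
print(merge_units(no_size))
```

Rows without a land size pass through unchanged, so the helper can be applied to every advert in the loop.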
