BeautifulSoup不返回,这为find_all()和/或find()都提供了错误



我正在尝试zillow已售出房屋的价格,下面是我的尝试:

import requests
from bs4 import BeautifulSoup 
headers = {
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9',
'cache-control': 'max-age=0',
'cookie': '_ga=GA1.2.1673152744.1651941872; zjs_user_id=null; zg_anonymous_id=%2268ca4597-57e2-4569-8b51-b4d390baabfd%22; zjs_anonymous_id=%22313d303c-4a98-4878-a4c0-4ba758bf85cb%22; _gcl_au=1.1.2045578279.1651941873; _pxvid=f8a14e39-ce24-11ec-b5d5-614e7a4a506c; _fbp=fb.1.1651941873087.720309583; __pdst=44605a1f38874dd296884caee932b53e; _cs_c=0; _pin_unauth=dWlkPU16UXpOREZqTURjdE4yTXhNUzAwT0dGbUxUazVZbVV0TURFME16UXhZekJsWldVeA; _gac_UA-21174015-56=1.1652736155.CjwKCAjw7IeUBhBbEiwADhiEMWQrrMU1bgTf5w5pLscs6beK6W4Z5ZMnpvg1tCQL2O4ELmoaldWpsRoC-pIQAvD_BwE; _gcl_aw=GCL.1652736156.CjwKCAjw7IeUBhBbEiwADhiEMWQrrMU1bgTf5w5pLscs6beK6W4Z5ZMnpvg1tCQL2O4ELmoaldWpsRoC-pIQAvD_BwE; zguid=24|%24313d303c-4a98-4878-a4c0-4ba758bf85cb; zgsession=1|49bf6394-c128-46ed-b60c-f44845eee0a4; pxcts=f2ff6767-e881-11ec-8d10-7654455a784e; DoubleClickSession=true; G_ENABLED_IDPS=google; g_state={"i_p":1659752164993,"i_l":4}; KruxPixel=true; KruxAddition=true; _gid=GA1.2.156427582.1658955938; _gat=1; _pxff_bsco=1; _pxff_tm=1; _hp2_id.1215457233=%7B%22userId%22%3A%227482855519340279%22%2C%22pageviewId%22%3A%224371827357917612%22%2C%22sessionId%22%3A%224330573612186749%22%2C%22identity%22%3A%22313d303c4a984878a4c04ba758bf85cb%22%2C%22trackerVersion%22%3A%224.0%22%2C%22identityField%22%3Anull%2C%22isIdentified%22%3A1%7D; _cs_id=9a93c00e-66bc-a34d-9080-f8cd9c853b2e.1651941873.9.1658955940.1658955940.1.1686105873464; _hp2_ses_props.1215457233=%7B%22ts%22%3A1658955939824%2C%22d%22%3A%22www.zillow.com%22%2C%22h%22%3A%22%2F%22%7D; _clck=1k688ws|1|f3i|0; _cs_s=1.5.0.1658957740562; JSESSIONID=37F330B360B367C5078D1CE3FF752613; utag_main=v_id:01809f68962500ab9295fcec46b00507a00e607200ac8$_sn:22$_se:1$_ss:1$_st:1658957739912$dc_visit:19$ses_id:1658955939912%3Bexp-session$_pn:1%3Bexp-session$dcsyncran:1%3Bexp-session$tdsyncran:1%3Bexp-session$dc_event:1%3Bexp-session$dc_region:us-west-2%3Bexp-session$ttd_uuid:723f3b30-f651-4ab7-912e-0dafadc410cc%3Bexp-session; _px3=d6eb87ce838da8d0f897420181a1a09b93bf7e2e5d5a9d5dfabd24c4f692ea71:dWq5/77J71LgkUsG5d/LHg0weAN9ckAV4QNy49z45HTlSZSC5vXbFwcCCQ1Zyrd5Oq4NlWqNJ7ENbBbVCZBybA==:1000:5YQlounZImHX7tzdjZtU5eX+eO9iMxj+3TIPGnXBEN1ruG7BTio/CkmZPae4Ao4nTLxdH2fy6ib3Wk8vaRC8idutWcjmk/Jq2GFdXMM/XfDPy/NduJwHvPxvMFSKpAfjTP3ft8ov/3Q51GG95xdb76C1nRlJMzk07PXorzo7fenKdi54T+i39o6jivsYkqC8oNfILDufpVp0Ysc13Q8kbw==; _uetsid=de540c900def11ed8ba92b524accb8aa; _uetvid=3adaf020bfc911eb8f701323ea2a2b57; AWSALB=WLxAcs6r29liHnv/H9IcuoLOTwb14rPMHRpRfmgrNVjAG19aFtvLCBgTMRPBWtN2HTGmmm4UQn3HkwtALP77x+MNqugTlablwPHxlXf6F8vLPvY+IT+YRVRNYGPj; AWSALBCORS=WLxAcs6r29liHnv/H9IcuoLOTwb14rPMHRpRfmgrNVjAG19aFtvLCBgTMRPBWtN2HTGmmm4UQn3HkwtALP77x+MNqugTlablwPHxlXf6F8vLPvY+IT+YRVRNYGPj; search=6|1661547971145%7Crect%3D37.857199931363425%252C-122.28295413378906%252C37.693293130474025%252C-122.58370486621094%26rid%3D20330%26disp%3Dmap%26mdm%3Dauto%26p%3D1%26z%3D1%26fs%3D0%26fr%3D0%26mmm%3D0%26rs%3D1%26ah%3D0%26singlestory%3D0%26housing-connector%3D0%26abo%3D0%26garage%3D0%26pool%3D0%26ac%3D0%26waterfront%3D0%26finished%3D0%26unfinished%3D0%26cityview%3D0%26mountainview%3D0%26parkview%3D0%26waterview%3D0%26hoadata%3D1%26zillow-owned%3D0%263dhome%3D0%26featuredMultiFamilyBuilding%3D0%09%0920330%09%09%09%09%09%09; _clsk=1gwrgb2|1658955971691|3|0|d.clarity.ms/collect',
'sec-ch-ua': '".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"macOS"',
'sec-fetch-dest': 'document',
'sec-fetch-mode': 'navigate',
'sec-fetch-site': 'none',
'sec-fetch-user': '?1',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}

params = {
'searchQueryState': '{"pagination":{},"usersSearchTerm":"San Francisco, CA","mapBounds":{"west":-122.58370486621094,"east":-122.28295413378906,"south":37.693293130474025,"north":37.857199931363425},"regionSelection":[{"regionId":20330,"regionType":6}],"isMapVisible":false,"filterState":{"sort":{"value":"globalrelevanceex"},"fsba":{"value":false},"fsbo":{"value":false},"nc":{"value":false},"fore":{"value":false},"cmsn":{"value":false},"auc":{"value":false},"rs":{"value":true},"ah":{"value":true}},"isListVisible":true,"mapZoom":12}'
}
url = 'https://www.zillow.com/san-francisco-ca/sold'
houses = requests.get(url, headers=headers, params = params)
houses_cards = BeautifulSoup(houses.text)
cards = houses_cards.find_all('ul', {'class':'List-c11n-8-69-2__sc-1smrmqp-0'})
cards.find_all('article', {'class': 'list-card list-card-additional-attribution'})

在调整最后一行后,我得到了以下错误:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/var/folders/fd/72yblq016950hs5y1xzckf_c0000gn/T/ipykernel_16240/2960189247.py in <module>
----> 1 cards.find_all('article', {'class': 'list-card list-card-additional-attribution'})
/usr/local/Caskroom/miniconda/base/lib/python3.7/site-packages/bs4/element.py in __getattr__(self, key)
2288         """Raise a helpful exception to explain a common code fix."""
2289         raise AttributeError(
-> 2290             "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
2291         )
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

如果我运行cards.find('article', {'class': 'list-card list-card-additional-attribution'}),我会得到以下错误:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/var/folders/fd/72yblq016950hs5y1xzckf_c0000gn/T/ipykernel_16240/3133868743.py in <module>
----> 1 cards.find('article', {'class': 'list-card list-card-additional-attribution'})
/usr/local/Caskroom/miniconda/base/lib/python3.7/site-packages/bs4/element.py in __getattr__(self, key)
2288         """Raise a helpful exception to explain a common code fix."""
2289         raise AttributeError(
-> 2290             "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
2291         )
AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

如果没有正确的html内容选择,您无法从网页中抓取所有数据项,您将只获得6到8项,因为内容在脚本标记内的html注释下。请逐步查看证明。

bs4示例:

import requests
from bs4 import BeautifulSoup 
headers = {
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9',
'cache-control': 'max-age=0',
'cookie': '_ga=GA1.2.1673152744.1651941872; zjs_user_id=null; zg_anonymous_id=%2268ca4597-57e2-4569-8b51-b4d390baabfd%22; zjs_anonymous_id=%22313d303c-4a98-4878-a4c0-4ba758bf85cb%22; _gcl_au=1.1.2045578279.1651941873; _pxvid=f8a14e39-ce24-11ec-b5d5-614e7a4a506c; _fbp=fb.1.1651941873087.720309583; __pdst=44605a1f38874dd296884caee932b53e; _cs_c=0; _pin_unauth=dWlkPU16UXpOREZqTURjdE4yTXhNUzAwT0dGbUxUazVZbVV0TURFME16UXhZekJsWldVeA; _gac_UA-21174015-56=1.1652736155.CjwKCAjw7IeUBhBbEiwADhiEMWQrrMU1bgTf5w5pLscs6beK6W4Z5ZMnpvg1tCQL2O4ELmoaldWpsRoC-pIQAvD_BwE; _gcl_aw=GCL.1652736156.CjwKCAjw7IeUBhBbEiwADhiEMWQrrMU1bgTf5w5pLscs6beK6W4Z5ZMnpvg1tCQL2O4ELmoaldWpsRoC-pIQAvD_BwE; zguid=24|%24313d303c-4a98-4878-a4c0-4ba758bf85cb; zgsession=1|49bf6394-c128-46ed-b60c-f44845eee0a4; pxcts=f2ff6767-e881-11ec-8d10-7654455a784e; DoubleClickSession=true; G_ENABLED_IDPS=google; g_state={"i_p":1659752164993,"i_l":4}; KruxPixel=true; KruxAddition=true; _gid=GA1.2.156427582.1658955938; _gat=1; _pxff_bsco=1; _pxff_tm=1; _hp2_id.1215457233=%7B%22userId%22%3A%227482855519340279%22%2C%22pageviewId%22%3A%224371827357917612%22%2C%22sessionId%22%3A%224330573612186749%22%2C%22identity%22%3A%22313d303c4a984878a4c04ba758bf85cb%22%2C%22trackerVersion%22%3A%224.0%22%2C%22identityField%22%3Anull%2C%22isIdentified%22%3A1%7D; _cs_id=9a93c00e-66bc-a34d-9080-f8cd9c853b2e.1651941873.9.1658955940.1658955940.1.1686105873464; _hp2_ses_props.1215457233=%7B%22ts%22%3A1658955939824%2C%22d%22%3A%22www.zillow.com%22%2C%22h%22%3A%22%2F%22%7D; _clck=1k688ws|1|f3i|0; _cs_s=1.5.0.1658957740562; JSESSIONID=37F330B360B367C5078D1CE3FF752613; utag_main=v_id:01809f68962500ab9295fcec46b00507a00e607200ac8$_sn:22$_se:1$_ss:1$_st:1658957739912$dc_visit:19$ses_id:1658955939912%3Bexp-session$_pn:1%3Bexp-session$dcsyncran:1%3Bexp-session$tdsyncran:1%3Bexp-session$dc_event:1%3Bexp-session$dc_region:us-west-2%3Bexp-session$ttd_uuid:723f3b30-f651-4ab7-912e-0dafadc410cc%3Bexp-session; _px3=d6eb87ce838da8d0f897420181a1a09b93bf7e2e5d5a9d5dfabd24c4f692ea71:dWq5/77J71LgkUsG5d/LHg0weAN9ckAV4QNy49z45HTlSZSC5vXbFwcCCQ1Zyrd5Oq4NlWqNJ7ENbBbVCZBybA==:1000:5YQlounZImHX7tzdjZtU5eX+eO9iMxj+3TIPGnXBEN1ruG7BTio/CkmZPae4Ao4nTLxdH2fy6ib3Wk8vaRC8idutWcjmk/Jq2GFdXMM/XfDPy/NduJwHvPxvMFSKpAfjTP3ft8ov/3Q51GG95xdb76C1nRlJMzk07PXorzo7fenKdi54T+i39o6jivsYkqC8oNfILDufpVp0Ysc13Q8kbw==; _uetsid=de540c900def11ed8ba92b524accb8aa; _uetvid=3adaf020bfc911eb8f701323ea2a2b57; AWSALB=WLxAcs6r29liHnv/H9IcuoLOTwb14rPMHRpRfmgrNVjAG19aFtvLCBgTMRPBWtN2HTGmmm4UQn3HkwtALP77x+MNqugTlablwPHxlXf6F8vLPvY+IT+YRVRNYGPj; AWSALBCORS=WLxAcs6r29liHnv/H9IcuoLOTwb14rPMHRpRfmgrNVjAG19aFtvLCBgTMRPBWtN2HTGmmm4UQn3HkwtALP77x+MNqugTlablwPHxlXf6F8vLPvY+IT+YRVRNYGPj; search=6|1661547971145%7Crect%3D37.857199931363425%252C-122.28295413378906%252C37.693293130474025%252C-122.58370486621094%26rid%3D20330%26disp%3Dmap%26mdm%3Dauto%26p%3D1%26z%3D1%26fs%3D0%26fr%3D0%26mmm%3D0%26rs%3D1%26ah%3D0%26singlestory%3D0%26housing-connector%3D0%26abo%3D0%26garage%3D0%26pool%3D0%26ac%3D0%26waterfront%3D0%26finished%3D0%26unfinished%3D0%26cityview%3D0%26mountainview%3D0%26parkview%3D0%26waterview%3D0%26hoadata%3D1%26zillow-owned%3D0%263dhome%3D0%26featuredMultiFamilyBuilding%3D0%09%0920330%09%09%09%09%09%09; _clsk=1gwrgb2|1658955971691|3|0|d.clarity.ms/collect',
'sec-ch-ua': '".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"macOS"',
'sec-fetch-dest': 'document',
'sec-fetch-mode': 'navigate',
'sec-fetch-site': 'none',
'sec-fetch-user': '?1',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}

params = {
'searchQueryState': '{"pagination":{},"usersSearchTerm":"San Francisco, CA","mapBounds":{"west":-122.58370486621094,"east":-122.28295413378906,"south":37.693293130474025,"north":37.857199931363425},"regionSelection":[{"regionId":20330,"regionType":6}],"isMapVisible":false,"filterState":{"sort":{"value":"globalrelevanceex"},"fsba":{"value":false},"fsbo":{"value":false},"nc":{"value":false},"fore":{"value":false},"cmsn":{"value":false},"auc":{"value":false},"rs":{"value":true},"ah":{"value":true}},"isListVisible":true,"mapZoom":12}'
}
url = 'https://www.zillow.com/san-francisco-ca/sold'
houses = requests.get(url, headers=headers, params = params)
houses_cards = BeautifulSoup(houses.text,'lxml')
card = houses_cards.find('ul', {'class':'List-c11n-8-69-2__sc-1smrmqp-0'})
cards=card.find_all('li', {'class': 'ListItem-c11n-8-69-2__sc-10e22w8-0 srp__hpnp3q-0 enEXBq'})
for item in cards:
price= item.select_one('div[class="StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 kJFQQX"] span')
price=price.get_text(strip=True) if price else None
print(price)

输出:

$1.72M
$1.31M
$2.66M
$1.90M
$2.05M
$4.50M
$515,000
$1.38M
$1.12M
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None

来自脚本标记:

import requests
import re
import json
r = requests.get('https://www.zillow.com/san-francisco-ca/sold/1_p/?searchQueryState=%7B%22pagination%22%3A%7B%22currentPage%22%3A2%7D%2C%22usersSearchTerm%22%3A%22San%20Francisco%2C%20CA%22%2C%22mapBounds%22%3A%7B%22west%22%3A-123.05611697558594%2C%22east%22%3A-122.21016970996094%2C%22south%22%3A37.416218151120056%2C%22north%22%3A37.948763955871286%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A20330%2C%22regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Afalse%2C%22filterState%22%3A%7B%22sort%22%3A%7B%22value%22%3A%22globalrelevanceex%22%7D%2C%22fsba%22%3A%7B%22value%22%3Afalse%7D%2C%22fsbo%22%3A%7B%22value%22%3Afalse%7D%2C%22nc%22%3A%7B%22value%22%3Afalse%7D%2C%22fore%22%3A%7B%22value%22%3Afalse%7D%2C%22cmsn%22%3A%7B%22value%22%3Afalse%7D%2C%22auc%22%3A%7B%22value%22%3Afalse%7D%2C%22rs%22%3A%7B%22value%22%3Atrue%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%7D',headers = {'User-Agent':'Mozilla/5.0'})
data = json.loads(re.search(r'!--({"queryState".*?)-->', r.text).group(1))
for item in data['cat1']['searchResults']['listResults']:
price= item['soldPrice'] 
print(price)

输出:

$1.72M
$1.31M
$2.66M
$1.90M
$2.05M
$4.50M
$515,000
$1.38M
$1.12M
$3.42M
$1.66M
$2.42M
$1.40M
$1.50M
$2.12M
$635,000
$1.40M
$1.02M
$9.81M
$1.20M
$509,000
$1.16M
$2.22M
$1.16M
$1.30M
$1.05M
$1.46M
$1.12M
$1.54M
$1.72M
$1.50M
$1.90M
$1.11M
$875,000
$540,000
$1.20M
$2.90M
$2.08M
$1.68M
$2.00M

在您的代码中,cards是一个bs4.element.ResultSet对象,它没有与之关联的find_all()find()属性。这些属性只存在于bs4.BeautifulSoup对象,如houses_cards,因此会出现错误。

如果要从cards的值中筛选/搜索,则可以对其进行迭代并相应地选择值。

否则,请尝试以下操作:

houses_cards = BeautifulSoup(houses.text)
cards_ul = houses_cards.find_all('ul', {'class':'List-c11n-8-69-2__sc-1smrmqp-0'})
cards_article = houses_cards.find_all('article', {'class': 'list-card list-card-additional-attribution'})

.find()/.find_all()只能在单个项目上工作,但.find_all()提供了许多项目,这需要使用for循环来单独在每个项目上运行。

cards = houses_cards.find_all(...)
#all_results = []
for item in cards:
results = item.find_all('article', {'class': 'list-card list-card-additional-attribution'})
#all_results += results

相关内容

  • 没有找到相关文章

最新更新