我正在尝试zillow已售出房屋的价格,下面是我的尝试:
import requests
from bs4 import BeautifulSoup
headers = {
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9',
'cache-control': 'max-age=0',
'cookie': '_ga=GA1.2.1673152744.1651941872; zjs_user_id=null; zg_anonymous_id=%2268ca4597-57e2-4569-8b51-b4d390baabfd%22; zjs_anonymous_id=%22313d303c-4a98-4878-a4c0-4ba758bf85cb%22; _gcl_au=1.1.2045578279.1651941873; _pxvid=f8a14e39-ce24-11ec-b5d5-614e7a4a506c; _fbp=fb.1.1651941873087.720309583; __pdst=44605a1f38874dd296884caee932b53e; _cs_c=0; _pin_unauth=dWlkPU16UXpOREZqTURjdE4yTXhNUzAwT0dGbUxUazVZbVV0TURFME16UXhZekJsWldVeA; _gac_UA-21174015-56=1.1652736155.CjwKCAjw7IeUBhBbEiwADhiEMWQrrMU1bgTf5w5pLscs6beK6W4Z5ZMnpvg1tCQL2O4ELmoaldWpsRoC-pIQAvD_BwE; _gcl_aw=GCL.1652736156.CjwKCAjw7IeUBhBbEiwADhiEMWQrrMU1bgTf5w5pLscs6beK6W4Z5ZMnpvg1tCQL2O4ELmoaldWpsRoC-pIQAvD_BwE; zguid=24|%24313d303c-4a98-4878-a4c0-4ba758bf85cb; zgsession=1|49bf6394-c128-46ed-b60c-f44845eee0a4; pxcts=f2ff6767-e881-11ec-8d10-7654455a784e; DoubleClickSession=true; G_ENABLED_IDPS=google; g_state={"i_p":1659752164993,"i_l":4}; KruxPixel=true; KruxAddition=true; _gid=GA1.2.156427582.1658955938; _gat=1; _pxff_bsco=1; _pxff_tm=1; _hp2_id.1215457233=%7B%22userId%22%3A%227482855519340279%22%2C%22pageviewId%22%3A%224371827357917612%22%2C%22sessionId%22%3A%224330573612186749%22%2C%22identity%22%3A%22313d303c4a984878a4c04ba758bf85cb%22%2C%22trackerVersion%22%3A%224.0%22%2C%22identityField%22%3Anull%2C%22isIdentified%22%3A1%7D; _cs_id=9a93c00e-66bc-a34d-9080-f8cd9c853b2e.1651941873.9.1658955940.1658955940.1.1686105873464; _hp2_ses_props.1215457233=%7B%22ts%22%3A1658955939824%2C%22d%22%3A%22www.zillow.com%22%2C%22h%22%3A%22%2F%22%7D; _clck=1k688ws|1|f3i|0; _cs_s=1.5.0.1658957740562; JSESSIONID=37F330B360B367C5078D1CE3FF752613; utag_main=v_id:01809f68962500ab9295fcec46b00507a00e607200ac8$_sn:22$_se:1$_ss:1$_st:1658957739912$dc_visit:19$ses_id:1658955939912%3Bexp-session$_pn:1%3Bexp-session$dcsyncran:1%3Bexp-session$tdsyncran:1%3Bexp-session$dc_event:1%3Bexp-session$dc_region:us-west-2%3Bexp-session$ttd_uuid:723f3b30-f651-4ab7-912e-0dafadc410cc%3Bexp-session; _px3=d6eb87ce838da8d0f897420181a1a09b93bf7e2e5d5a9d5dfabd24c4f692ea71:dWq5/77J71LgkUsG5d/LHg0weAN9ckAV4QNy49z45HTlSZSC5vXbFwcCCQ1Zyrd5Oq4NlWqNJ7ENbBbVCZBybA==:1000:5YQlounZImHX7tzdjZtU5eX+eO9iMxj+3TIPGnXBEN1ruG7BTio/CkmZPae4Ao4nTLxdH2fy6ib3Wk8vaRC8idutWcjmk/Jq2GFdXMM/XfDPy/NduJwHvPxvMFSKpAfjTP3ft8ov/3Q51GG95xdb76C1nRlJMzk07PXorzo7fenKdi54T+i39o6jivsYkqC8oNfILDufpVp0Ysc13Q8kbw==; _uetsid=de540c900def11ed8ba92b524accb8aa; _uetvid=3adaf020bfc911eb8f701323ea2a2b57; AWSALB=WLxAcs6r29liHnv/H9IcuoLOTwb14rPMHRpRfmgrNVjAG19aFtvLCBgTMRPBWtN2HTGmmm4UQn3HkwtALP77x+MNqugTlablwPHxlXf6F8vLPvY+IT+YRVRNYGPj; AWSALBCORS=WLxAcs6r29liHnv/H9IcuoLOTwb14rPMHRpRfmgrNVjAG19aFtvLCBgTMRPBWtN2HTGmmm4UQn3HkwtALP77x+MNqugTlablwPHxlXf6F8vLPvY+IT+YRVRNYGPj; search=6|1661547971145%7Crect%3D37.857199931363425%252C-122.28295413378906%252C37.693293130474025%252C-122.58370486621094%26rid%3D20330%26disp%3Dmap%26mdm%3Dauto%26p%3D1%26z%3D1%26fs%3D0%26fr%3D0%26mmm%3D0%26rs%3D1%26ah%3D0%26singlestory%3D0%26housing-connector%3D0%26abo%3D0%26garage%3D0%26pool%3D0%26ac%3D0%26waterfront%3D0%26finished%3D0%26unfinished%3D0%26cityview%3D0%26mountainview%3D0%26parkview%3D0%26waterview%3D0%26hoadata%3D1%26zillow-owned%3D0%263dhome%3D0%26featuredMultiFamilyBuilding%3D0%09%0920330%09%09%09%09%09%09; _clsk=1gwrgb2|1658955971691|3|0|d.clarity.ms/collect',
'sec-ch-ua': '".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"macOS"',
'sec-fetch-dest': 'document',
'sec-fetch-mode': 'navigate',
'sec-fetch-site': 'none',
'sec-fetch-user': '?1',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}
params = {
'searchQueryState': '{"pagination":{},"usersSearchTerm":"San Francisco, CA","mapBounds":{"west":-122.58370486621094,"east":-122.28295413378906,"south":37.693293130474025,"north":37.857199931363425},"regionSelection":[{"regionId":20330,"regionType":6}],"isMapVisible":false,"filterState":{"sort":{"value":"globalrelevanceex"},"fsba":{"value":false},"fsbo":{"value":false},"nc":{"value":false},"fore":{"value":false},"cmsn":{"value":false},"auc":{"value":false},"rs":{"value":true},"ah":{"value":true}},"isListVisible":true,"mapZoom":12}'
}
url = 'https://www.zillow.com/san-francisco-ca/sold'
houses = requests.get(url, headers=headers, params = params)
houses_cards = BeautifulSoup(houses.text)
cards = houses_cards.find_all('ul', {'class':'List-c11n-8-69-2__sc-1smrmqp-0'})
cards.find_all('article', {'class': 'list-card list-card-additional-attribution'})
在调整最后一行后,我得到了以下错误:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/var/folders/fd/72yblq016950hs5y1xzckf_c0000gn/T/ipykernel_16240/2960189247.py in <module>
----> 1 cards.find_all('article', {'class': 'list-card list-card-additional-attribution'})
/usr/local/Caskroom/miniconda/base/lib/python3.7/site-packages/bs4/element.py in __getattr__(self, key)
2288 """Raise a helpful exception to explain a common code fix."""
2289 raise AttributeError(
-> 2290 "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
2291 )
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
如果我运行cards.find('article', {'class': 'list-card list-card-additional-attribution'})
,我会得到以下错误:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/var/folders/fd/72yblq016950hs5y1xzckf_c0000gn/T/ipykernel_16240/3133868743.py in <module>
----> 1 cards.find('article', {'class': 'list-card list-card-additional-attribution'})
/usr/local/Caskroom/miniconda/base/lib/python3.7/site-packages/bs4/element.py in __getattr__(self, key)
2288 """Raise a helpful exception to explain a common code fix."""
2289 raise AttributeError(
-> 2290 "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
2291 )
AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
如果没有正确的html内容选择,您无法从网页中抓取所有数据项,您将只获得6到8项,因为内容在脚本标记内的html注释下。请逐步查看证明。
bs4示例:
import requests
from bs4 import BeautifulSoup
headers = {
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9',
'cache-control': 'max-age=0',
'cookie': '_ga=GA1.2.1673152744.1651941872; zjs_user_id=null; zg_anonymous_id=%2268ca4597-57e2-4569-8b51-b4d390baabfd%22; zjs_anonymous_id=%22313d303c-4a98-4878-a4c0-4ba758bf85cb%22; _gcl_au=1.1.2045578279.1651941873; _pxvid=f8a14e39-ce24-11ec-b5d5-614e7a4a506c; _fbp=fb.1.1651941873087.720309583; __pdst=44605a1f38874dd296884caee932b53e; _cs_c=0; _pin_unauth=dWlkPU16UXpOREZqTURjdE4yTXhNUzAwT0dGbUxUazVZbVV0TURFME16UXhZekJsWldVeA; _gac_UA-21174015-56=1.1652736155.CjwKCAjw7IeUBhBbEiwADhiEMWQrrMU1bgTf5w5pLscs6beK6W4Z5ZMnpvg1tCQL2O4ELmoaldWpsRoC-pIQAvD_BwE; _gcl_aw=GCL.1652736156.CjwKCAjw7IeUBhBbEiwADhiEMWQrrMU1bgTf5w5pLscs6beK6W4Z5ZMnpvg1tCQL2O4ELmoaldWpsRoC-pIQAvD_BwE; zguid=24|%24313d303c-4a98-4878-a4c0-4ba758bf85cb; zgsession=1|49bf6394-c128-46ed-b60c-f44845eee0a4; pxcts=f2ff6767-e881-11ec-8d10-7654455a784e; DoubleClickSession=true; G_ENABLED_IDPS=google; g_state={"i_p":1659752164993,"i_l":4}; KruxPixel=true; KruxAddition=true; _gid=GA1.2.156427582.1658955938; _gat=1; _pxff_bsco=1; _pxff_tm=1; _hp2_id.1215457233=%7B%22userId%22%3A%227482855519340279%22%2C%22pageviewId%22%3A%224371827357917612%22%2C%22sessionId%22%3A%224330573612186749%22%2C%22identity%22%3A%22313d303c4a984878a4c04ba758bf85cb%22%2C%22trackerVersion%22%3A%224.0%22%2C%22identityField%22%3Anull%2C%22isIdentified%22%3A1%7D; _cs_id=9a93c00e-66bc-a34d-9080-f8cd9c853b2e.1651941873.9.1658955940.1658955940.1.1686105873464; _hp2_ses_props.1215457233=%7B%22ts%22%3A1658955939824%2C%22d%22%3A%22www.zillow.com%22%2C%22h%22%3A%22%2F%22%7D; _clck=1k688ws|1|f3i|0; _cs_s=1.5.0.1658957740562; JSESSIONID=37F330B360B367C5078D1CE3FF752613; utag_main=v_id:01809f68962500ab9295fcec46b00507a00e607200ac8$_sn:22$_se:1$_ss:1$_st:1658957739912$dc_visit:19$ses_id:1658955939912%3Bexp-session$_pn:1%3Bexp-session$dcsyncran:1%3Bexp-session$tdsyncran:1%3Bexp-session$dc_event:1%3Bexp-session$dc_region:us-west-2%3Bexp-session$ttd_uuid:723f3b30-f651-4ab7-912e-0dafadc410cc%3Bexp-session; _px3=d6eb87ce838da8d0f897420181a1a09b93bf7e2e5d5a9d5dfabd24c4f692ea71:dWq5/77J71LgkUsG5d/LHg0weAN9ckAV4QNy49z45HTlSZSC5vXbFwcCCQ1Zyrd5Oq4NlWqNJ7ENbBbVCZBybA==:1000:5YQlounZImHX7tzdjZtU5eX+eO9iMxj+3TIPGnXBEN1ruG7BTio/CkmZPae4Ao4nTLxdH2fy6ib3Wk8vaRC8idutWcjmk/Jq2GFdXMM/XfDPy/NduJwHvPxvMFSKpAfjTP3ft8ov/3Q51GG95xdb76C1nRlJMzk07PXorzo7fenKdi54T+i39o6jivsYkqC8oNfILDufpVp0Ysc13Q8kbw==; _uetsid=de540c900def11ed8ba92b524accb8aa; _uetvid=3adaf020bfc911eb8f701323ea2a2b57; AWSALB=WLxAcs6r29liHnv/H9IcuoLOTwb14rPMHRpRfmgrNVjAG19aFtvLCBgTMRPBWtN2HTGmmm4UQn3HkwtALP77x+MNqugTlablwPHxlXf6F8vLPvY+IT+YRVRNYGPj; AWSALBCORS=WLxAcs6r29liHnv/H9IcuoLOTwb14rPMHRpRfmgrNVjAG19aFtvLCBgTMRPBWtN2HTGmmm4UQn3HkwtALP77x+MNqugTlablwPHxlXf6F8vLPvY+IT+YRVRNYGPj; search=6|1661547971145%7Crect%3D37.857199931363425%252C-122.28295413378906%252C37.693293130474025%252C-122.58370486621094%26rid%3D20330%26disp%3Dmap%26mdm%3Dauto%26p%3D1%26z%3D1%26fs%3D0%26fr%3D0%26mmm%3D0%26rs%3D1%26ah%3D0%26singlestory%3D0%26housing-connector%3D0%26abo%3D0%26garage%3D0%26pool%3D0%26ac%3D0%26waterfront%3D0%26finished%3D0%26unfinished%3D0%26cityview%3D0%26mountainview%3D0%26parkview%3D0%26waterview%3D0%26hoadata%3D1%26zillow-owned%3D0%263dhome%3D0%26featuredMultiFamilyBuilding%3D0%09%0920330%09%09%09%09%09%09; _clsk=1gwrgb2|1658955971691|3|0|d.clarity.ms/collect',
'sec-ch-ua': '".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"macOS"',
'sec-fetch-dest': 'document',
'sec-fetch-mode': 'navigate',
'sec-fetch-site': 'none',
'sec-fetch-user': '?1',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}
params = {
'searchQueryState': '{"pagination":{},"usersSearchTerm":"San Francisco, CA","mapBounds":{"west":-122.58370486621094,"east":-122.28295413378906,"south":37.693293130474025,"north":37.857199931363425},"regionSelection":[{"regionId":20330,"regionType":6}],"isMapVisible":false,"filterState":{"sort":{"value":"globalrelevanceex"},"fsba":{"value":false},"fsbo":{"value":false},"nc":{"value":false},"fore":{"value":false},"cmsn":{"value":false},"auc":{"value":false},"rs":{"value":true},"ah":{"value":true}},"isListVisible":true,"mapZoom":12}'
}
url = 'https://www.zillow.com/san-francisco-ca/sold'
houses = requests.get(url, headers=headers, params = params)
houses_cards = BeautifulSoup(houses.text,'lxml')
card = houses_cards.find('ul', {'class':'List-c11n-8-69-2__sc-1smrmqp-0'})
cards=card.find_all('li', {'class': 'ListItem-c11n-8-69-2__sc-10e22w8-0 srp__hpnp3q-0 enEXBq'})
for item in cards:
price= item.select_one('div[class="StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 kJFQQX"] span')
price=price.get_text(strip=True) if price else None
print(price)
输出:
$1.72M
$1.31M
$2.66M
$1.90M
$2.05M
$4.50M
$515,000
$1.38M
$1.12M
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
来自脚本标记:
import requests
import re
import json
r = requests.get('https://www.zillow.com/san-francisco-ca/sold/1_p/?searchQueryState=%7B%22pagination%22%3A%7B%22currentPage%22%3A2%7D%2C%22usersSearchTerm%22%3A%22San%20Francisco%2C%20CA%22%2C%22mapBounds%22%3A%7B%22west%22%3A-123.05611697558594%2C%22east%22%3A-122.21016970996094%2C%22south%22%3A37.416218151120056%2C%22north%22%3A37.948763955871286%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A20330%2C%22regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Afalse%2C%22filterState%22%3A%7B%22sort%22%3A%7B%22value%22%3A%22globalrelevanceex%22%7D%2C%22fsba%22%3A%7B%22value%22%3Afalse%7D%2C%22fsbo%22%3A%7B%22value%22%3Afalse%7D%2C%22nc%22%3A%7B%22value%22%3Afalse%7D%2C%22fore%22%3A%7B%22value%22%3Afalse%7D%2C%22cmsn%22%3A%7B%22value%22%3Afalse%7D%2C%22auc%22%3A%7B%22value%22%3Afalse%7D%2C%22rs%22%3A%7B%22value%22%3Atrue%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%7D',headers = {'User-Agent':'Mozilla/5.0'})
data = json.loads(re.search(r'!--({"queryState".*?)-->', r.text).group(1))
for item in data['cat1']['searchResults']['listResults']:
price= item['soldPrice']
print(price)
输出:
$1.72M
$1.31M
$2.66M
$1.90M
$2.05M
$4.50M
$515,000
$1.38M
$1.12M
$3.42M
$1.66M
$2.42M
$1.40M
$1.50M
$2.12M
$635,000
$1.40M
$1.02M
$9.81M
$1.20M
$509,000
$1.16M
$2.22M
$1.16M
$1.30M
$1.05M
$1.46M
$1.12M
$1.54M
$1.72M
$1.50M
$1.90M
$1.11M
$875,000
$540,000
$1.20M
$2.90M
$2.08M
$1.68M
$2.00M
在您的代码中,cards
是一个bs4.element.ResultSet
对象,它没有与之关联的find_all()
或find()
属性。这些属性只存在于bs4.BeautifulSoup
对象,如houses_cards
,因此会出现错误。
如果要从cards
的值中筛选/搜索,则可以对其进行迭代并相应地选择值。
否则,请尝试以下操作:
houses_cards = BeautifulSoup(houses.text)
cards_ul = houses_cards.find_all('ul', {'class':'List-c11n-8-69-2__sc-1smrmqp-0'})
cards_article = houses_cards.find_all('article', {'class': 'list-card list-card-additional-attribution'})
.find()
/.find_all()
只能在单个项目上工作,但.find_all()
提供了许多项目,这需要使用for
循环来单独在每个项目上运行。
cards = houses_cards.find_all(...)
#all_results = []
for item in cards:
results = item.find_all('article', {'class': 'list-card list-card-additional-attribution'})
#all_results += results