Web scraping with Python: trying to pull a changing "id"



Here is my code:

import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://steamcommunity.com/market/listings/730/Souvenir%20P2000%20%7C%20Chainmail%20%28Factory%20New%29'
# open and read
uClient = uReq(my_url)
page_html = uClient.read()
#close
uClient.close()
#html parse
page_soup = soup(page_html,"html.parser")
#grab all listings
containers = page_soup.findAll("div",{"class":"market_listing_item_name_block"})
for container in containers:
    block_container = container.findAll("span",{"class":"market_listing_item_name"})

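block_container returns multiple results that are all identical except for the id attribute on each <span>: id="listing_#_name", where # is a string of digits that changes from one <span> to the next.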

For example:

</br></div>, <div class="market_listing_item_name_block">
<span class="market_listing_item_name" id="listing_2060891817875196312_name" style="color: #FFD700;">Souvenir P2000 | Chainmail (Factory New)</span>
<br/>
<span class="market_listing_game_name">Counter-Strike: Global Offensive</span>
</div>, <div class="market_listing_item_name_block">
<span class="market_listing_item_name" id="listing_2076653149485426829_name" style="color: #FFD700;">Souvenir P2000 | Chainmail (Factory New)</span>
<br/>

Can anyone explain how I can grab the id from all of the spans?

You can read the id straight off each span tag's attributes.

Try:

for container in containers:
    for block_container in container.findAll("span", class_="market_listing_item_name"):
        print(block_container.attrs['id'])
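If you would rather not depend on BeautifulSoup, the same idea (read the id attribute off every span with class market_listing_item_name) can be sketched with the standard library's html.parser. This is a swapped-in technique, not what the answer above uses; ListingIdCollector is a hypothetical helper name, and the HTML is the sample from the question.

```python
from html.parser import HTMLParser


class ListingIdCollector(HTMLParser):
    """Hypothetical helper: collect the id of every <span class="market_listing_item_name">."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        # html.parser passes attributes as a list of (name, value) pairs.
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated classes (or no value at all).
        classes = (attr_map.get("class") or "").split()
        if "market_listing_item_name" in classes and attr_map.get("id"):
            self.ids.append(attr_map["id"])


html = """
<div class="market_listing_item_name_block">
<span class="market_listing_item_name" id="listing_2060891817875196312_name" style="color: #FFD700;">Souvenir P2000 | Chainmail (Factory New)</span>
<br/>
<span class="market_listing_game_name">Counter-Strike: Global Offensive</span>
</div>
<div class="market_listing_item_name_block">
<span class="market_listing_item_name" id="listing_2076653149485426829_name" style="color: #FFD700;">Souvenir P2000 | Chainmail (Factory New)</span>
<br/>
</div>
"""

collector = ListingIdCollector()
collector.feed(html)
collector.close()
print(collector.ids)
# ['listing_2060891817875196312_name', 'listing_2076653149485426829_name']
```

The span with class market_listing_game_name is skipped because only the exact class market_listing_item_name is accepted.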

From the Beautiful Soup documentation:

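A tag may have any number of attributes. The tag <b id="boldest"> has an attribute "id" whose value is "boldest". You can access a tag's attributes by treating the tag like a dictionary: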

tag['id']
# u'boldest'

You can access that dictionary directly as .attrs:

tag.attrs
# {u'id': 'boldest'}

References:

  • Beautiful Soup documentation - Attributes

Does this help?

from simplified_scrapy.simplified_doc import SimplifiedDoc 
html = '''</br></div>, <div class="market_listing_item_name_block">
<span class="market_listing_item_name" id="listing_2060891817875196312_name" style="color: #FFD700;">Souvenir P2000 | Chainmail (Factory New)</span>
<br/>
<span class="market_listing_game_name">Counter-Strike: Global Offensive</span>
</div>, <div class="market_listing_item_name_block">
<span class="market_listing_item_name" id="listing_2076653149485426829_name" style="color: #FFD700;">Souvenir P2000 | Chainmail (Factory New)</span>
<br/></div>'''
doc = SimplifiedDoc(html)
containers = doc.getElements(tag='div',value='market_listing_item_name_block')
for container in containers:
    block_container = container.span
    # or
    # block_container = container.getElement(tag='span',value='market_listing_item_name')
    # block_containers = container.getElements(tag='span',value='market_listing_item_name').contains('listing_',attr='id')
    print(block_container.id, doc.replaceReg(block_container.id, "(listing_|_name)", ""))
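The replaceReg call in that answer appears to strip the listing_ prefix and _name suffix using the regular expression (listing_|_name). Assuming you only need the numeric part of each id, the standard re module can do the same without simplified_scrapy; the two hard-coded ids below are taken from the question's sample output, and LISTING_ID is an illustrative name.

```python
import re

ids = [
    "listing_2060891817875196312_name",
    "listing_2076653149485426829_name",
]

# Same pattern as the replaceReg call: delete "listing_" and "_name" wherever they occur.
stripped = [re.sub(r"(listing_|_name)", "", listing_id) for listing_id in ids]

# Stricter alternative: accept only ids of the exact form listing_<digits>_name
# and capture the digits; anything else is dropped.
LISTING_ID = re.compile(r"listing_(\d+)_name")
numbers = [m.group(1) for m in map(LISTING_ID.fullmatch, ids) if m]

print(stripped)  # ['2060891817875196312', '2076653149485426829']
print(numbers)   # ['2060891817875196312', '2076653149485426829']
```

The re.sub version passes any string through unchanged if it does not contain the prefix or suffix, while the fullmatch version silently discards ids that do not fit the expected shape, which can be safer if the page ever contains other span ids.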
