Printing div class names when web scraping with Python



I want to print the class names of all the divs inside the top-level div of a given site. This is the part of the site's HTML I am interested in:

<div class="game">
<div class="history-feed__collection">
<div class="history-feed__card h-card h-card_sm h-card_spades" style="width: 41px; margin-right: 18px; opacity: 1;">
<div class="h-card__sign">9</div></div>
<div class="history-feed__card h-card h-card_sm h-card_hearts" style="width: 41px; margin-right: 18px; opacity: 1;">
<div class="h-card__sign">K</div></div>
<div class="history-feed__card h-card h-card_sm h-card_diamonds" style="width: 41px; margin-right: 18px; opacity: 1;">
<div class="h-card__sign">Q</div></div>
<div class="history-feed__card h-card h-card_sm h-card_clubs" style="width: 41px; margin-right: 18px; opacity: 1;">
<div class="h-card__sign">2</div>
</div></div>

I would like the program to print something like this: "history-feed__card h-card h-card_sm h-card_spades, history-feed__card h-card h-card_sm h-card_hearts, ..."

I started with the code below, but I still have a problem: it prints only the content of the divs, not their class names.

from selenium import webdriver
driver = webdriver.Chrome(executable_path='C:chromedriver')
driver.get('https://card.com')
id = driver.find_elements_by_xpath('//*[@class]')
for ii in id:
    print(ii.get_attribute('class="hilo-history-feed__collection"'))

driver.close()
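
For what it's worth, get_attribute expects only an attribute name such as 'class'; a string like 'class="hilo-history-feed__collection"' is not an attribute name, so the lookup cannot return the class list. Below is a minimal sketch of one way to read the class names with Selenium, assuming the cards carry the history-feed__card class shown in the HTML above. The helper name card_class_names and the main function are made up for illustration, and webdriver.Chrome() with no arguments assumes a recent Selenium 4 release that can locate the driver on its own. It uses find_elements(by, value), since the find_elements_by_* helpers are no longer available in newer Selenium 4 releases.

```python
def card_class_names(driver):
    """Return the class attribute string of every card div in the feed.

    `driver` is any Selenium WebDriver. "css selector" is the string value
    behind selenium.webdriver.common.by.By.CSS_SELECTOR.
    """
    cards = driver.find_elements("css selector", "div.history-feed__card")
    # get_attribute takes the attribute NAME and returns its value as a string.
    return [card.get_attribute("class") for card in cards]


def main():
    # Imported here so the helper above can be tried without a browser installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Selenium can find chromedriver itself
    try:
        driver.get("https://card.com")  # URL taken from the question
        print(", ".join(card_class_names(driver)))
    finally:
        driver.quit()  # closes the browser even if the lookup fails
```

Calling main() should print the class strings separated by commas, provided the cards have already been rendered when the lookup runs; a page that fills the feed later may need an explicit wait first.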

Try Beautiful Soup:

import requests
from bs4 import BeautifulSoup
URL = 'http://www.card.com'
response = requests.get(URL)
soup = BeautifulSoup(response.content, 'html5lib')
divs = soup.find_all('div')
classes = [div.get('class') for div in divs]
print(classes)
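
Note that div.get('class') returns a list of class names for each div (or None for a div without a class attribute), so the code above prints a list of lists covering every div on the page. A minimal sketch of narrowing this to the cards and joining each list into the comma-separated format asked for, assuming the fetched HTML matches the fragment in the question: the HTML constant below is a trimmed copy of that fragment (style attributes dropped, closing tag added), and the built-in "html.parser" backend is used here only to avoid the extra html5lib dependency.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the fragment from the question (an assumption about the real page).
HTML = """
<div class="game">
  <div class="history-feed__collection">
    <div class="history-feed__card h-card h-card_sm h-card_spades"><div class="h-card__sign">9</div></div>
    <div class="history-feed__card h-card h-card_sm h-card_hearts"><div class="h-card__sign">K</div></div>
    <div class="history-feed__card h-card h-card_sm h-card_diamonds"><div class="h-card__sign">Q</div></div>
    <div class="history-feed__card h-card h-card_sm h-card_clubs"><div class="h-card__sign">2</div></div>
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Find the container, then only its direct child divs (the cards themselves).
collection = soup.find("div", class_="history-feed__collection")
cards = collection.find_all("div", recursive=False)

# Each card's "class" value is a list such as ['history-feed__card', 'h-card', ...];
# join it back into a single space-separated string.
class_strings = [" ".join(card.get("class", [])) for card in cards]

print(", ".join(class_strings))
```

If the site builds the feed with JavaScript, the HTML returned by requests may not contain these divs at all, in which case a browser-driven approach such as Selenium is needed to obtain the rendered page.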

I managed to get it working with this code:


import requests
from bs4 import BeautifulSoup
URL = 'http://www.card.com'
response = requests.get(URL)
soup = BeautifulSoup(response.content, 'html5lib')
for i in soup.find_all('div'):
    print(i)

Thanks to everyone who helped.
