How to scrape data (div) from a web page that seems to reference a second site

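I am trying to scrape data from a page using Beautiful Soup and Selenium. When I inspect the page to identify the elements I want to retrieve, I can find them without any problem. Here are the elements I want, as they appear in the HTML code: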

<div class="full-date">
<span class="day-of-week ng-binding">Thursday</span> | 
<span class="month ng-binding">August</span>
<span class="date ng-binding">04, </span>
<span class="year ng-binding">2022</span>
</div>

I need the month, date, and year. As you can see, the elements sit inside a tree (div), so I use this line of code to get all the divs on the page:

divs = soup.find_all('div')

I print the divs and all the information seems to be there; however, when I look at the "span" elements, I get this:

<div class="full-date">
<span class="day-of-week">{{ WMService.auctionStartDate(lot.auction) | moment:'dddd' }}</span> | 
<span class="month">{{ WMService.auctionStartDate(lot.auction) | moment:'MMMM' }}</span>
<span class="date">{{ WMService.auctionStartDate(lot.auction) | moment:'DD' }}, </span>
<span class="year">{{ WMService.auctionStartDate(lot.auction) | moment:'YYYY' }}</span>
</div>

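I am not getting the actual data, such as "August", "04", or "2022". It is as if the HTML code references another piece of code or another page to get this information: "WMService.auctionStartDate(lot.auction)".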

Is this possible? How can I get around it?

I would really appreciate your input.

Thanks!

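These elements are loaded dynamically in the page, so you need to wait for them to appear before you can locate them. One way to get the data using pure Selenium is: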

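# Wait up to 20 seconds for each span; the ng-binding class appears only in the rendered markup
month = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[@class='month ng-binding']")))
day = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[@class='date ng-binding']")))
year = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[@class='year ng-binding']")))
# Each wait returns a WebElement; read the rendered value from its .text
print(month.text, day.text, year.text)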

You will also need the following imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
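As a rough sketch of how the waiting approach could be wrapped up end to end, the snippet below waits until each span's text no longer contains a raw `{{ ... }}` placeholder and then combines the three values into a date. The function names (`is_rendered`, `parse_date_parts`, `read_auction_date`) and the CSS selectors are assumptions made for illustration, based only on the markup shown in the question; `browser` is assumed to be an already-created Selenium WebDriver that has loaded the page.

```python
from datetime import datetime


def is_rendered(text):
    """Return True if text is non-empty and holds no raw '{{ ... }}' template."""
    text = text.strip()
    return bool(text) and "{{" not in text


def parse_date_parts(month, day, year):
    """Combine span texts such as 'August', '04, ', '2022' into a date."""
    # The date span carries a trailing comma and space in the question's markup.
    cleaned_day = day.strip().rstrip(",").strip()
    combined = f"{month.strip()} {cleaned_day} {year.strip()}"
    # %B expects a full English month name under the default locale.
    return datetime.strptime(combined, "%B %d %Y").date()


def read_auction_date(browser, timeout=20):
    """Wait for the three spans to be rendered, then return them as a date."""
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    def rendered_text(css_selector):
        # Custom wait condition: returns the text once rendered, else False.
        def condition(driver):
            text = driver.find_element(By.CSS_SELECTOR, css_selector).text
            return text if is_rendered(text) else False
        return condition

    wait = WebDriverWait(browser, timeout)
    month = wait.until(rendered_text("div.full-date span.month"))
    day = wait.until(rendered_text("div.full-date span.date"))
    year = wait.until(rendered_text("div.full-date span.year"))
    return parse_date_parts(month, day, year)
```

Calling `read_auction_date(browser)` after `browser.get(...)` should return a `datetime.date`. If you would rather keep using Beautiful Soup, passing `browser.page_source` to it after a wait like this has succeeded should give you the rendered values instead of the template placeholders.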
