How to use Selenium with pandas to read a web page

I want to use chromedriver to collect information from web pages. How do I install and use it?

If you don't have Selenium yet, install it first (for example with pip install selenium). Then use it:

from selenium.webdriver import Chrome
url="URL of the webpage you want to read"

Set up the driver:

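from selenium.webdriver.chrome.service import Service
# Point Selenium at the chromedriver executable you downloaded
driver = Chrome(service=Service("path of the chromedriver.exe file saved in your pc"))
driver.get(url)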

Use a CSS selector:

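# "css selector" is the value of By.CSS_SELECTOR; the find_element_by_* helpers were removed in newer Selenium 4 releases
y = driver.find_element("css selector", 'css selector of the data you want to read from the webpage').text
print(y)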
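Since the title asks how to get Selenium results into pandas, here is a minimal sketch, assuming `driver` is an already-started Selenium WebDriver whose `find_elements(by, value)` method is available (Selenium 4) and that each matching element holds one value. It collects the `.text` of every element matching a CSS selector into a one-column DataFrame. The function name `elements_to_dataframe` and the default column name are hypothetical.

```python
import pandas as pd

def elements_to_dataframe(driver, css_selector, column="text"):
    """Return a one-column DataFrame with the text of every element
    matching css_selector on the page currently loaded in driver."""
    # "css selector" is the string value of selenium's By.CSS_SELECTOR
    elements = driver.find_elements("css selector", css_selector)
    return pd.DataFrame({column: [element.text for element in elements]})

# Usage, with the driver from above (the selector is hypothetical):
# df = elements_to_dataframe(driver, "table td")
# print(df.head())
```

Using `find_elements` (plural) returns an empty list instead of raising when nothing matches, so an empty page yields an empty DataFrame rather than an exception.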

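You don't install chromedriver: you download the .exe (from the ChromeDriver download page) and pass its path to webdriver.Chrome() (via a Service object in Selenium 4). The ChromeDriver Getting Started page has a comprehensive guide: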

import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

service = Service('/path/to/chromedriver')  # refers to the path where you saved the exe
driver = webdriver.Chrome(service=service)
driver.get('http://www.google.com/')
time.sleep(5)  # Let the user actually see something!
search_box = driver.find_element(By.NAME, 'q')
search_box.send_keys('ChromeDriver')
search_box.submit()
time.sleep(5)  # Let the user actually see something!
driver.quit()

Note: download the .exe that matches your Chrome version! (You can find it under Help > About Google Chrome.)
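To make the "matching version" rule concrete, a small sketch that compares the major version numbers of Chrome and ChromeDriver. It assumes you copy the version strings by hand, for example from Help > About Google Chrome and from running `chromedriver --version`; the helper names and the sample strings are illustrative, not from any library.

```python
import re

def major_version(version_string):
    """Extract the major number from a dotted version such as 114.0.5735.198."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_string)
    if match is None:
        raise ValueError(f"no version number found in {version_string!r}")
    return int(match.group(1))

def versions_match(chrome_version, driver_version):
    # ChromeDriver should share Chrome's major version (e.g. 114 with 114)
    return major_version(chrome_version) == major_version(driver_version)

# Illustrative strings in the format Chrome and chromedriver print:
print(versions_match("Version 114.0.5735.198 (Official Build) (64-bit)",
                     "ChromeDriver 114.0.5735.90"))
```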

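As @Patha_Mondal mentioned, you need to download the driver and select the elements you want to read. However, since your original question asked "How to use Selenium with pandas to read a web page?", I would suggest using Scrapy together with Selenium to create a ".csv" file from the web page data.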
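Scrapy can write its items to CSV itself through its feed exports (for example `scrapy crawl <spider> -o items.csv`). So that the idea runs without Scrapy installed, the sketch below uses the standard-library `csv` module as a stand-in: it writes a list of scraped records (dicts sharing the same keys) to a CSV file. The function name, file name and field names are hypothetical.

```python
import csv

def save_items_to_csv(items, path):
    """Write scraped records (a list of dicts with the same keys) to a CSV file."""
    if not items:
        raise ValueError("no items to save")
    fieldnames = list(items[0].keys())
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(items)

# Hypothetical records, e.g. texts collected with Selenium
save_items_to_csv([{"name": "Widget", "price": "9.99"},
                   {"name": "Gadget", "price": "19.50"}], "items.csv")
```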

Use pandas.read_csv() to read the ".csv" data into pandas.
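A minimal sketch of that step. To keep it self-contained, the CSV contents are a hypothetical string wrapped in `StringIO`; with a real exported file you would pass the path (here the hypothetical `items.csv`) directly.

```python
from io import StringIO
import pandas as pd

# Hypothetical contents of the exported items.csv; with a real file you would
# call pd.read_csv("items.csv") instead of wrapping a string in StringIO.
csv_text = "name,price\nWidget,9.99\nGadget,19.50\n"
df = pd.read_csv(StringIO(csv_text))
print(df)
```

`read_csv` infers column types, so `price` comes back as a float column that can be summed or sorted directly.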

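Data from a web page may be unclean or badly formatted. Building the dataset with Scrapy first makes it easier to read into pandas. Avoid using pandas directly in the same script as Selenium and Scrapy.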
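As an illustration of the kind of cleanup scraped data often needs once it is in its own pandas script, a sketch under assumed data: a hypothetical export with stray whitespace, a currency symbol and a missing value. It strips the text column, converts prices to numbers, and drops rows that cannot be parsed.

```python
from io import StringIO
import pandas as pd

# Hypothetical raw export with typical scraping noise
raw = StringIO(
    "name,price\n"
    "  Widget ,$9.99\n"
    "Gadget,19.50\n"
    "Broken,n/a\n"
)
df = pd.read_csv(raw)  # "n/a" is read as NaN by read_csv's default na_values
df["name"] = df["name"].str.strip()
# Remove the currency symbol, then coerce anything non-numeric to NaN
df["price"] = pd.to_numeric(df["price"].str.replace("$", "", regex=False), errors="coerce")
df = df.dropna(subset=["price"])
print(df)
```

`errors="coerce"` turns unparseable values into NaN instead of raising, so a single malformed cell does not stop the whole load.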

Hope this helps.
