Errors when trying to scrape this website with Selenium and Python



When I try to scrape the website, it just throws some errors. I think it might have something to do with my web driver, but I don't know.

I'm trying to get this data so I can put it in a spreadsheet and get some cool stats.

from selenium import webdriver
url = 'https://prosettings.net/cs-go-pro-settings-gear-list/'
driver = webdriver.Chrome(executable_path="C:WebDriverschromedriver.exe")
driver.get(url)
names = driver.find_elements_by_class_name("  column-player")
for name in names:
    title = name.find_element_by_xpath('.//a').text
    print(name)

This is the error I get in the terminal:

d:downloadsPythonScrapingTest.py:5: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
driver = webdriver.Chrome(executable_path="C:WebDriverschromedriver.exe")
DevTools listening on ws://127.0.0.1:53131/devtools/browser/73ca0453-352e-47a0-a98a-fb539150d6f9
d:downloadsPythonScrapingTest.py:8: DeprecationWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
names = driver.find_elements_by_class_name("  column-player")
Traceback (most recent call last):
  File "d:downloadsPythonScrapingTest.py", line 8, in <module>
    names = driver.find_elements_by_class_name("  column-player")
  File "C:Usersterk0AppDataLocalPackagesPythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0LocalCachelocal-packagesPython310site-packagesseleniumwebdriverremotewebdriver.py", line 783, in find_elements_by_class_name
    return self.find_elements(by=By.CLASS_NAME, value=name)
  File "C:Usersterk0AppDataLocalPackagesPythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0LocalCachelocal-packagesPython310site-packagesseleniumwebdriverremotewebdriver.py", line 1279, in find_elements
    return self.execute(Command.FIND_ELEMENTS, {
  File "C:Usersterk0AppDataLocalPackagesPythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0LocalCachelocal-packagesPython310site-packagesseleniumwebdriverremotewebdriver.py", line 424, in execute
    self.error_handler.check_response(response)
  File "C:Usersterk0AppDataLocalPackagesPythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0LocalCachelocal-packagesPython310site-packagesseleniumwebdriverremoteerrorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
(Session info: chrome=98.0.4758.102)
Stacktrace:
Backtrace:
Ordinal0 [0x00EF69A3+2582947]
Ordinal0 [0x00E8A6D1+2139857]
Ordinal0 [0x00D83A98+1063576]
Ordinal0 [0x00D862B7+1073847]
Ordinal0 [0x00D8617E+1073534]
Ordinal0 [0x00D863F0+1074160]
Ordinal0 [0x00DAFCB2+1244338]
Ordinal0 [0x00DB013B+1245499]
Ordinal0 [0x00DD9F8C+1417100]
Ordinal0 [0x00DC8594+1344916]
Ordinal0 [0x00DD834A+1409866]
Ordinal0 [0x00DC8366+1344358]
Ordinal0 [0x00DA5176+1200502]
Ordinal0 [0x00DA6066+1204326]
GetHandleVerifier [0x0109BE02+1675858]
GetHandleVerifier [0x0115036C+2414524]
GetHandleVerifier [0x00F8BB01+560977]
GetHandleVerifier [0x00F8A8D3+556323]
Ordinal0 [0x00E9020E+2163214]
Ordinal0 [0x00E95078+2183288]
Ordinal0 [0x00E951C0+2183616]
Ordinal0 [0x00E9EE1C+2223644]
BaseThreadInitThunk [0x7586FA29+25]
RtlGetAppContainerNamedObjectPath [0x77957A9E+286]
RtlGetAppContainerNamedObjectPath [0x77957A6E+238]

There are two problems here:

  1. Instead of
names = driver.find_elements_by_class_name("  column-player")

it should be

names = driver.find_elements_by_class_name("column-player")

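(I know the class attribute on the page has spaces before the column-player class name, but you still should not put them in the locator.)
  2. You should add a delay so that the elements are accessed only after the page has finished loading.
This should work better: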

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = 'https://prosettings.net/cs-go-pro-settings-gear-list/'
driver = webdriver.Chrome(executable_path="C:WebDriverschromedriver.exe")
wait = WebDriverWait(driver, 20)
driver.get(url)
# wait until at least one matching element is visible
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".column-player")))
# short additional wait for the other elements to finish loading
time.sleep(0.5)
names = driver.find_elements_by_class_name("column-player")
for name in names:
    # the player name is the text of the link inside each cell
    title = name.find_element_by_xpath('.//a').text
    print(title)
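
A side note on why the leading spaces matter. As far as I can tell, Selenium 4's Python bindings turn a class-name lookup into a CSS selector by prefixing the value with a dot, so "  column-player" ends up as ".  column-player", which is not a valid selector. The helper below is only a sketch: the name class_attr_to_css is hypothetical and not part of Selenium. It takes a class attribute value as copied from the page source and builds a clean selector from it.

```python
def class_attr_to_css(class_attr: str) -> str:
    """Build a CSS selector from a raw HTML class attribute value.

    Hypothetical helper, not part of Selenium: it splits the attribute on
    whitespace (dropping leading and trailing spaces) and joins the
    individual class names with dots.
    """
    class_names = class_attr.split()
    if not class_names:
        raise ValueError("class attribute contains no class names")
    return "." + ".".join(class_names)


print(class_attr_to_css("  column-player"))         # .column-player
print(class_attr_to_css("column-1 column-player"))  # .column-1.column-player
```

The resulting string can be passed to driver.find_elements(By.CSS_SELECTOR, ...), which is also the non-deprecated form that the DeprecationWarning in the question's output points to.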
