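> I'm having a problem with PhantomJS: it can hang inside a loop without reporting any error. I know my code is fine, because after a restart it usually completes, though it may hang again later. What I have in mind is something like this: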
```python
i = 0
while i < len(url_list):
    try:
        driver.get(url_list[i])
        # do whatever needs to be done
        i = i + 1
        # go on to the next one
    except ThisIterationTakesTooLong:
        # try again for this one because the code is definitely good
        continue
```
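Is it even possible to do something like this? Basically, I want something running in the background that checks how long the current iteration has been running. I know about time.time(), but the problem is that it can't even detect a hang on a command that runs before the counter is updated.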
Edit
After looking at the suggested question, I'm still stuck, because the signal module doesn't work for me:
```python
import signal
signal.alarm(5)
```
This raises `AttributeError: 'module' object has no attribute 'alarm'`.
So it looks like I really can't use it.
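`signal.alarm` is only available on Unix, which is consistent with that `AttributeError` if the script runs on Windows. A cross-platform way to get a "background timer" is to run the step in a worker thread and wait for it with a timeout. The sketch below assumes that approach; `run_with_timeout` and `slow_step` are hypothetical names, and `slow_step` stands in for a call like `driver.get(url)`.

```python
import concurrent.futures
import time

def run_with_timeout(func, timeout, *args):
    """Run func(*args) in a worker thread and wait at most `timeout` seconds.

    Returns (True, result) on success or (False, None) on timeout.
    Note: a timed-out worker thread is NOT killed; it keeps running
    in the background until func returns on its own.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return True, future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return False, None
    finally:
        # Don't block here waiting for a hung worker.
        executor.shutdown(wait=False)

def slow_step(seconds):
    """Hypothetical stand-in for a step that may hang (e.g. driver.get)."""
    time.sleep(seconds)
    return seconds

if __name__ == "__main__":
    print(run_with_timeout(slow_step, 1.0, 0.1))  # (True, 0.1)
    print(run_with_timeout(slow_step, 0.2, 0.5))  # (False, None)
```

Because the hung call is only abandoned, not stopped, a browser driver that timed out this way is probably not safe to reuse; discarding it and creating a fresh one is likely necessary.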
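I've run into this kind of thing before, and unfortunately there's no great workaround. The fact is that sometimes a page or element just won't load, and you have to decide how to handle that. I usually end up doing something like this: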
```python
from selenium.common.exceptions import TimeoutException

# How long to wait for a page to load before raising TimeoutException
# (assumes `driver` has already been created)
driver.set_page_load_timeout(10)

def wait_for_url(driver, url, max_attempts):
    """Make up to max_attempts attempts to load the page, each one
    bounded by the page-load timeout. Return True on success,
    False if every attempt timed out."""
    attempts = 0
    while attempts < max_attempts:
        try:
            driver.get(url)
            return True
        except TimeoutException:
            # Prepare for another attempt
            attempts += 1
    # Bail out after max_attempts failures
    return False

# We'll use this to collect any URLs that won't load,
# so we can process them later.
revisit = []
for url in url_list:
    # Make 10 attempts before giving up.
    url_is_loaded = wait_for_url(driver, url, 10)
    if url_is_loaded:
        pass  # Do whatever needs to be done with the page
    else:
        revisit.append(url)
# Now we can try to process the URLs we couldn't visit.
```
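The same retry-then-revisit pattern can be tried without a browser. In this sketch `PageTimeout` is a hypothetical stand-in for Selenium's `TimeoutException`, `FlakyLoader` is a hypothetical fake that times out a fixed number of times per URL, and `load_with_retries` / `process_urls` are illustrative names, not Selenium API. It also shows a second pass over the URLs that failed the first time.

```python
class PageTimeout(Exception):
    """Hypothetical stand-in for selenium's TimeoutException."""

def load_with_retries(load, url, max_attempts):
    """Call load(url) up to max_attempts times; True on first success."""
    for _ in range(max_attempts):
        try:
            load(url)
            return True
        except PageTimeout:
            continue  # try this URL again
    return False

def process_urls(load, urls, max_attempts):
    """Return (loaded, revisit): URLs that loaded and URLs that never did."""
    loaded, revisit = [], []
    for url in urls:
        if load_with_retries(load, url, max_attempts):
            loaded.append(url)
        else:
            revisit.append(url)
    return loaded, revisit

class FlakyLoader:
    """Hypothetical loader: each URL times out `failures[url]` times, then loads."""
    def __init__(self, failures):
        self.failures = dict(failures)

    def __call__(self, url):
        if self.failures.get(url, 0) > 0:
            self.failures[url] -= 1
            raise PageTimeout(url)

if __name__ == "__main__":
    load = FlakyLoader({"b": 2, "c": 5})
    loaded, revisit = process_urls(load, ["a", "b", "c"], max_attempts=3)
    print(loaded, revisit)  # ['a', 'b'] ['c']
    # Second pass over the URLs that failed the first time.
    print(process_urls(load, revisit, max_attempts=3))  # (['c'], [])
```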
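I'd also add that the problem may be PhantomJS itself. Recent versions of Selenium have deprecated it, and in my experience PhantomJS is sluggish and prone to unexpected behavior. If you need a headless browser, Chrome is a very stable option. If you're not familiar with it, setup looks like this: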
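```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_options = Options()
chrome_options.add_argument("--headless")
# Selenium 4 style; Selenium 3 used webdriver.Chrome("path/to/chromedriver", chrome_options=chrome_options)
driver = webdriver.Chrome(service=Service("path/to/chromedriver"), options=chrome_options)
```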
Hopefully one of these suggestions helps.