Using splinter and Python, I have two threads running, each hitting the same main URL but a different route: thread one hits mainurl.com/threadone and thread two hits mainurl.com/threadtwo. I am using:
from splinter import Browser
browser = Browser('chrome')
But I run into the following error:
Traceback (most recent call last):
  File "multi_thread_practice.py", line 299, in <module>
    main()
  File "multi_thread_practice.py", line 290, in main
    first_method(r)
  File "multi_thread_practice.py", line 195, in parser
    second_method(title, name)
  File "multi_thread_practice.py", line 208, in confirm_product
    third_method(current_url)
  File "multi_thread_practice.py", line 214, in buy_product
    browser.visit(url)
  File "/Users/joshua/anaconda/lib/python2.7/site-packages/splinter/driver/webdriver/__init__.py", line 184, in visit
    self.driver.get(url)
  File "/Users/joshua/anaconda/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 261, in get
    self.execute(Command.GET, {'url': url})
  File "/Users/joshua/anaconda/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 247, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/Users/joshua/anaconda/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 464, in execute
    return self._request(command_info[0], url, body=data)
  File "/Users/joshua/anaconda/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 488, in _request
    resp = self._conn.getresponse()
  File "/Users/joshua/anaconda/lib/python2.7/httplib.py", line 1108, in getresponse
    raise ResponseNotReady()
httplib.ResponseNotReady
What does this error mean, and how should I deal with it?
Thanks in advance; I will be sure to upvote/accept the answer.
Code added:
import time
from splinter import Browser
import threading

browser = Browser('chrome')
start_time = time.time()
urlOne = 'http://www.practiceurl.com/one'
urlTwo = 'http://www.practiceurl.com/two'
baseUrl = 'http://practiceurl.com'
browser.visit(baseUrl)

def secondThread(url):
    print 'STARTING 2ND REQUEST: ' + str(time.time() - start_time)
    browser.visit(url)
    print 'END 2ND REQUEST: ' + str(time.time() - start_time)

def mainThread(url):
    print 'STARTING 1ST REQUEST: ' + str(time.time() - start_time)
    browser.visit(url)
    print 'END 1ST REQUEST: ' + str(time.time() - start_time)

def main():
    threadObj = threading.Thread(target=secondThread, args=[urlTwo])
    threadObj.daemon = True
    threadObj.start()
    mainThread(urlOne)

main()
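As far as I know, what you are trying to do is not possible with a single browser. Splinter drives a real browser, so sending it several commands at the same time causes problems. It behaves just as a human interacting with the browser would (automated, of course). You can open many browser windows, but you cannot send a request from another thread before the response to the previous request has arrived. Doing so leads to a "cannot send request" error (or, as in your traceback, ResponseNotReady). So if you need threads, I suggest opening two browsers and having each thread send its requests through its own browser. Otherwise it cannot be done.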
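To make the error in the traceback more concrete, here is a minimal sketch of the exception itself, written for Python 3, where Python 2's httplib is named http.client. It assumes, as the `self._conn.getresponse()` frame in the traceback suggests, that the driver talks to the browser over one reused HTTP connection. The host name is a placeholder and no socket is opened. The sketch only shows that asking a connection for a response while it is in the wrong state raises ResponseNotReady, which is the state two interleaving threads can put a shared connection into.

```python
import http.client

# Creating the connection object does not open a socket yet.
conn = http.client.HTTPConnection("localhost", 80)

try:
    # No request has been sent on this connection, so there is no response
    # to read. A second thread that calls getresponse() at the wrong moment
    # on a shared connection ends up in a comparable state.
    conn.getresponse()
except http.client.ResponseNotReady as exc:
    error_name = type(exc).__name__
else:
    error_name = None

print(error_name)
```

Giving each thread its own browser, and therefore its own connection, avoids this interleaving.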
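This related question is about Selenium, but the information carries over: "Selenium multiple tabs at once". It likewise says that what you (I assume) want to do is not possible. The author of the accepted (green tick) answer makes the same suggestion I did.
I hope this doesn't throw you too far off track, and that it helps.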
Edit: just to illustrate:
import time
from splinter import Browser
import threading

browser = Browser('firefox')
browser2 = Browser('firefox')
start_time = time.time()
urlOne = 'http://www.practiceurl.com/one'
urlTwo = 'http://www.practiceurl.com/two'
baseUrl = 'http://practiceurl.com'
browser.visit(baseUrl)

def secondThread(url):
    print 'STARTING 2ND REQUEST: ' + str(time.time() - start_time)
    browser2.visit(url)
    print 'END 2ND REQUEST: ' + str(time.time() - start_time)

def mainThread(url):
    print 'STARTING 1ST REQUEST: ' + str(time.time() - start_time)
    browser.visit(url)
    print 'END 1ST REQUEST: ' + str(time.time() - start_time)

def main():
    threadObj = threading.Thread(target=secondThread, args=[urlTwo])
    threadObj.daemon = True
    threadObj.start()
    mainThread(urlOne)

main()
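Note that I used Firefox because I don't have chromedriver installed.
It might be a good idea to add a wait after the browsers open, to make sure they are fully ready before the timer starts.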
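One possible way to implement that wait is sketched below, assuming Python 3 (for threading.Barrier). Each thread creates its own browser, then waits at a barrier so that timing starts only once every browser is open. `timed_visits` and `FakeBrowser` are hypothetical names introduced for this illustration. `FakeBrowser` exists only so the sketch runs without a real browser; with splinter you would pass something like `lambda: Browser('firefox')` as `make_browser`.

```python
import threading
import time


class FakeBrowser(object):
    """Hypothetical stand-in so the sketch runs without a real browser."""

    def visit(self, url):
        time.sleep(0.05)  # pretend to load the page

    def quit(self):
        pass


def timed_visits(urls, make_browser):
    """Visit each URL in its own thread, one browser per thread."""
    ready = threading.Barrier(len(urls))  # released once every browser is open
    timings = {}
    timings_lock = threading.Lock()

    def worker(url):
        browser = make_browser()  # owned by this thread only, never shared
        try:
            ready.wait()  # start the clock only when all browsers exist
            started = time.time()
            browser.visit(url)
            elapsed = time.time() - started
            with timings_lock:
                timings[url] = elapsed
        finally:
            browser.quit()

    threads = [threading.Thread(target=worker, args=(url,)) for url in urls]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return timings


timings = timed_visits(
    ["http://www.practiceurl.com/one", "http://www.practiceurl.com/two"],
    FakeBrowser,
)
print(sorted(timings))
```

In a real script it is probably worth passing a timeout to `ready.wait()`, so that one browser failing to start does not leave the other threads waiting forever.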
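@GenericSnake is right about this. To add to that, I strongly recommend refactoring your code to use the multiprocessing library, mainly because the threading implementation is subject to the GIL:
In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.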
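The last sentence of that quote matters here, because waiting for a browser to load a page is mostly I/O-bound. A small sketch of what it means in practice, with `time.sleep` standing in for a page load (an assumption; no browser is involved): two threads that each wait 0.2 seconds finish in roughly 0.2 seconds overall, not 0.4, because a waiting thread releases the GIL.

```python
import threading
import time


def fake_page_load(seconds):
    """Stand-in for waiting on a browser or the network."""
    time.sleep(seconds)  # sleeping releases the GIL, as blocking I/O does


start = time.time()
threads = [threading.Thread(target=fake_page_load, args=(0.2,)) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
elapsed = time.time() - start

print("two 0.2 s waits took %.2f s" % elapsed)
```

So the GIL alone does not rule out threads for this job. The stronger reason to separate the work is that a single browser instance cannot safely be shared between threads.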
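In fact, a nice side effect of using multiprocessing is that you can refactor your code to avoid the duplicated functions secondThread and mainThread, for example as shown here. One last thing: don't forget to clean up the resources you use, for example by calling browser.quit() to close the browser when you are done.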
import os
import time
from multiprocessing import Process

from splinter import Browser

# Placeholder paths: append the directories that hold geckodriver and the
# Firefox binary to PATH, separated by os.pathsep.
os.environ['PATH'] = os.pathsep.join(
    [os.environ['PATH'], "path/to/geckodriver", "path/to/firefox/binary"])

start_time = time.time()
urlOne = 'http://pythoncarsecurity.com/Support/FAQ.aspx'
urlTwo = 'http://pythoncarsecurity.com/Products/'

def url_visitor(url):
    print("url called: " + url)
    browser = Browser('firefox')  # each process gets its own browser
    try:
        print('STARTING REQUEST TO: ' + url + " at " + str(time.time() - start_time))
        browser.visit(url)
        print('END REQUEST TO: ' + url + " at " + str(time.time() - start_time))
    finally:
        browser.quit()  # always close the browser, even if visit() fails

def main():
    p1 = Process(target=url_visitor, args=[urlTwo])
    p2 = Process(target=url_visitor, args=[urlOne])
    p1.start()
    p2.start()
    p1.join()  # wait for both processes so their output is visible
    p2.join()

if __name__ == "__main__":
    main()
This gives the following output (the timings depend on your system):
url called: http://pythoncarsecurity.com/Support/FAQ.aspx
url called: http://pythoncarsecurity.com/Products/
STARTING REQUEST TO: http://pythoncarsecurity.com/Support/FAQ.aspx at 10.763000011444092
STARTING REQUEST TO: http://pythoncarsecurity.com/Products/ at 11.764999866485596
END REQUEST TO: http://pythoncarsecurity.com/Support/FAQ.aspx at 16.20199990272522
END REQUEST TO: http://pythoncarsecurity.com/Products/ at 16.625999927520752
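On the cleanup point, one way to make sure quit() is always called is a small context manager. This is only a sketch: `managed_browser` and `FakeBrowser` are hypothetical names, and the fake browser deliberately fails in visit() to show that quit() still runs. With splinter you would pass something like `lambda: Browser('firefox')`. Splinter's documentation also shows Browser being used directly in a `with` statement, so check whether your version supports that before writing your own.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(make_browser):
    """Yield a browser and always call quit() on it afterwards."""
    browser = make_browser()
    try:
        yield browser
    finally:
        browser.quit()


class FakeBrowser(object):
    """Hypothetical stand-in so the sketch runs without a real browser."""

    def __init__(self):
        self.closed = False

    def visit(self, url):
        raise RuntimeError("simulated failure while loading " + url)

    def quit(self):
        self.closed = True


fake = FakeBrowser()
try:
    with managed_browser(lambda: fake) as browser:
        browser.visit("http://pythoncarsecurity.com/Products/")
except RuntimeError:
    pass  # the failure is expected; the point is that quit() still ran

print(fake.closed)
```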
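Edit: the problem with multithreading and Selenium is that a browser instance is not thread-safe. The only way I found around this is to take a lock in url_visitor, but in that case you lose the advantage of multithreading. That is why I believe using multiple browsers is more beneficial (though I guess you have some very specific requirements). See the code below: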
import os
import threading
import time

from splinter import Browser

# Placeholder path: append the directory that holds chromedriver to PATH.
os.environ['PATH'] = os.environ['PATH'] + os.pathsep + "/path/to/chromedriver"

start_time = time.time()
urlOne = 'http://pythoncarsecurity.com/Support/FAQ.aspx'
urlTwo = 'http://pythoncarsecurity.com/Products/'
browser = Browser('chrome')
lock = threading.Lock()  # serializes all access to the shared browser

def init():
    browser.visit("https://www.google.fr")
    # Open a second, blank tab in the same browser.
    browser.driver.execute_script("window.open('about:blank', '_blank');")

def url_visitor(url, tab_index):
    with lock:
        # Always select this thread's tab before driving the browser.
        handles = browser.driver.window_handles
        browser.driver.switch_to.window(handles[tab_index])
        print("url called: " + url)
        print('STARTING REQUEST TO: ' + url + " at " + str(time.time() - start_time))
        browser.visit(url)
        print('END REQUEST TO: ' + url + " at " + str(time.time() - start_time))

def main():
    t1 = threading.Thread(target=url_visitor, args=[urlTwo, 0])
    t2 = threading.Thread(target=url_visitor, args=[urlOne, 1])
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    browser.quit()  # quit only after both threads have finished

if __name__ == "__main__":
    init()  # create a browser with two tabs
    main()
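To see what the lock buys you, and what it costs, here is a browser-free sketch. `FakeDriver` is a hypothetical stand-in that counts how often two calls overlap, and the example.com URLs are placeholders. With the lock held around every call the visits never overlap, which also means they run one after another.

```python
import threading
import time


class FakeDriver(object):
    """Hypothetical stand-in that records whether two calls ever overlap."""

    def __init__(self):
        self.busy = False
        self.overlaps = 0
        self.visited = []

    def visit(self, url):
        if self.busy:
            self.overlaps += 1  # another thread is in the middle of a visit
        self.busy = True
        time.sleep(0.01)  # pretend to load the page
        self.visited.append(url)
        self.busy = False


def visit_serialized(driver, lock, url):
    """Drive the shared driver only while holding the lock."""
    with lock:
        driver.visit(url)


driver = FakeDriver()
lock = threading.Lock()
urls = ["http://example.com/page/%d" % i for i in range(5)]  # placeholder URLs

threads = [
    threading.Thread(target=visit_serialized, args=(driver, lock, url))
    for url in urls
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(driver.overlaps, len(driver.visited))
```

Because every visit waits for the previous one to finish, the total time is about the same as a plain loop, which is the lost advantage mentioned above.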