I am trying to implement 2captcha using Selenium and Python.
I simply copied the example from their documentation:
https://github.com/2captcha/2captcha-api-examples/blob/master/ReCaptcha%20v2%20API%20Examples/Python%20Example/2captcha_python_api_example.py
This is my code:
from selenium import webdriver
from time import sleep
from selenium.webdriver.support.select import Select
import requests
driver = webdriver.Chrome('chromedriver.exe')
driver.get('the_url')
current_url = driver.current_url
captcha = driver.find_element_by_id("captcha-box")
captcha2 = captcha.find_element_by_xpath("//div/div/iframe").get_attribute("src")
captcha3 = captcha2.split('=')
#print(captcha3[2])
# Add these values
API_KEY = 'my_api_key' # Your 2captcha API KEY
site_key = captcha3[2] # site-key, read the 2captcha docs on how to get this
url = current_url # example url
proxy = 'Myproxy' # example proxy
proxy = {'http': 'http://' + proxy, 'https': 'https://' + proxy}
s = requests.Session()
# here we post site key to 2captcha to get captcha ID (and we parse it here too)
captcha_id = s.post("http://2captcha.com/in.php?key={}&method=userrecaptcha&googlekey={}&pageurl={}".format(API_KEY, site_key, url), proxies=proxy).text.split('|')[1]
# then we parse gresponse from 2captcha response
recaptcha_answer = s.get("http://2captcha.com/res.php?key={}&action=get&id={}".format(API_KEY, captcha_id), proxies=proxy).text
print("solving ref captcha...")
while 'CAPCHA_NOT_READY' in recaptcha_answer:
    sleep(5)
    recaptcha_answer = s.get("http://2captcha.com/res.php?key={}&action=get&id={}".format(API_KEY, captcha_id), proxies=proxy).text
recaptcha_answer = recaptcha_answer.split('|')[1]
# we make the payload for the post data here, use something like mitmproxy or fiddler to see what is needed
payload = {
    'key': 'value',
    'gresponse': recaptcha_answer # This is the response from 2captcha, which is needed for the post request to go through.
}
# then send the post request to the url
response = s.post(url, payload, proxies=proxy)
# And that's all there is to it other than scraping data from the website, which is dynamic for every website.
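This is my error:
solving ref captcha...
Traceback (most recent call last):
  File "main.py", line 38, in <module>
    recaptcha_answer = recaptcha_answer.split('|')[1]
IndexError: list index out of range
The captcha is being solved, since I can see it on the 2captcha dashboard, so where is the error if this is the example from the official documentation?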
EDIT: For some reason, without modifying anything, I am now getting the captcha solved by 2captcha, but then I get this error:
solving ref captcha...
OK|this_is_the_2captch_answer
Traceback (most recent call last):
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 594, in urlopen
    self._prepare_proxy(conn)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 805, in _prepare_proxy
    conn.connect()
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connection.py", line 308, in connect
    self._tunnel()
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 906, in _tunnel
    (version, code, message) = response._read_status()
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 278, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: <html>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\adapters.py", line 449, in send
    timeout=timeout
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\util\retry.py", line 368, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\packages\six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 594, in urlopen
    self._prepare_proxy(conn)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 805, in _prepare_proxy
    conn.connect()
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connection.py", line 308, in connect
    self._tunnel()
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 906, in _tunnel
    (version, code, message) = response._read_status()
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 278, in _read_status
    raise BadStatusLine(line)
urllib3.exceptions.ProtocolError: ('Connection aborted.', BadStatusLine('<html>\r\n'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "main.py", line 49, in <module>
    response = s.post(url, payload, proxies=proxy)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 581, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine('<html>\r\n'))
Why am I getting this error?
I am setting site_key = current_url_where_captcha_is_located
Is this right?
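Use a debugger, or put a print(recaptcha_answer) before the line that raises the error, to see the value of recaptcha_answer before you call .split('|') on it. There is no | in the string, so the resulting list has only one element and indexing it with [1] fails.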
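One way to make that failure visible, as a minimal sketch: check the response text before indexing into the split result. The helper name `parse_2captcha_response` is hypothetical and not part of any 2captcha library; it only assumes the `OK|value` success format shown in the question's output.

```python
def parse_2captcha_response(text):
    """Return the value after 'OK|', or raise ValueError showing the raw text."""
    status, sep, value = text.strip().partition('|')
    if status != 'OK' or not sep:
        # No 'OK|' prefix (for example CAPCHA_NOT_READY or an error message),
        # which is the case where text.split('|')[1] raises IndexError.
        raise ValueError('unexpected 2captcha response: {!r}'.format(text))
    return value


print(parse_2captcha_response('OK|this_is_the_2captch_answer'))
```

With this in place, a response without a `|` produces an error message that contains the actual text returned by the API, instead of a bare IndexError.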
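It looks like you are not providing any valid proxy connection parameters, yet you pass this proxy to requests when connecting to the API.
Just comment out these two lines: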
#proxy = 'Myproxy' # example proxy
#proxy = {'http': 'http://' + proxy, 'https': 'https://' + proxy}
Then remove proxies=proxy from these four lines:
captcha_id = s.post("http://2captcha.com/in.php?key={}&method=userrecaptcha&googlekey={}&pageurl={}".format(API_KEY, site_key, url)).text.split('|')[1]
recaptcha_answer = s.get("http://2captcha.com/res.php?key={}&action=get&id={}".format(API_KEY, captcha_id)).text
recaptcha_answer = s.get("http://2captcha.com/res.php?key={}&action=get&id={}".format(API_KEY, captcha_id)).text
response = s.post(url, payload)
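The polling step, once the proxy is out of the way, could be sketched as below. This is a sketch under assumptions, not 2captcha's official client: `poll_for_answer`, `fetch`, `interval` and `max_wait` are hypothetical names, and `fetch` stands in for a call such as `s.get(...).text` on the res.php URL made without `proxies=proxy`. The loop gives up after a time limit instead of polling forever, and it only splits the text once it starts with `OK|`.

```python
import time


def poll_for_answer(fetch, interval=5, max_wait=120, sleep=time.sleep):
    """Call fetch() until it stops returning CAPCHA_NOT_READY.

    fetch is any zero-argument callable that returns the res.php response text.
    """
    waited = 0
    while True:
        text = fetch().strip()
        if text.startswith('OK|'):
            # Success: everything after the first '|' is the answer token.
            return text.split('|', 1)[1]
        if 'CAPCHA_NOT_READY' not in text:
            # Anything else is treated as an error and reported verbatim.
            raise RuntimeError('2captcha returned: {!r}'.format(text))
        if waited >= max_wait:
            raise TimeoutError('no answer after {} seconds'.format(waited))
        sleep(interval)
        waited += interval


# Demo with canned responses instead of a real HTTP call.
responses = iter(['CAPCHA_NOT_READY', 'CAPCHA_NOT_READY', 'OK|token123'])
print(poll_for_answer(lambda: next(responses), sleep=lambda seconds: None))
```

In the question's script, `fetch` could be a lambda wrapping the same `s.get("http://2captcha.com/res.php?...").text` call used in the lines above.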