Downloading a file to the filesystem with Selenium, Python and PhantomJS

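I have been struggling to download a file to the filesystem using PhantomJS/Selenium/python-selenium. I can easily navigate the DOM and click, hover, and so on. Downloading a file, however, has been a real headache. I tried a headless approach with Firefox and pyvirtualdisplay, but that did not work well either, and it was incredibly slow. I know that CasperJS supports file downloads. Does anyone know how to integrate CasperJS with Python, or how to use PhantomJS to download files? Thanks a lot.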

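Although this question is quite old, downloading files through PhantomJS is still a problem. What we can do is use PhantomJS to get the download link plus all the cookies we need (session id, CSRF token and so on), and then let requests do the actual download: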

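import requests
from selenium import webdriver

driver = webdriver.PhantomJS()  # PhantomJS support exists only in Selenium 3.x and earlier
driver.get('page_with_download_link')
# find_element_by_id returns a WebElement; requests needs the URL, not the element
download_url = driver.find_element_by_id('download_link').get_attribute('href')

# Copy the browser's cookies (session id, CSRF token, ...) into a requests session
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

response = session.get(download_url)
response.raise_for_status()  # fail loudly instead of saving an error page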

The actual file contents should now be in response.content. From there you can write them to disk with open(), or do whatever else you need with them.
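If you would rather not depend on requests, the same cookie hand-off can be sketched with only the standard library. This is a swapped-in alternative, not the method described above: `cookie_header` and `download_with_cookies` are hypothetical helper names, and the cookie list is assumed to have the shape `driver.get_cookies()` returns (a list of dicts with at least `name` and `value`). It also streams the body to disk instead of holding it all in memory.

```python
import shutil
import urllib.request


def cookie_header(cookies):
    """Join Selenium-style cookie dicts into a single Cookie header value."""
    # Each dict from driver.get_cookies() carries at least 'name' and 'value'.
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def download_with_cookies(url, cookies, path, user_agent=None):
    """Fetch `url` with the browser's cookies and stream the body to `path`."""
    headers = {"Cookie": cookie_header(cookies)}
    if user_agent:
        # Some sites tie the session to the User-Agent, so reuse the browser's if needed.
        headers["User-Agent"] = user_agent
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        # Copy in chunks rather than reading the whole file into memory.
        shutil.copyfileobj(response, out)
    return path
```

A typical call would be `download_with_cookies(download_link.get_attribute('href'), driver.get_cookies(), 'file.bin', user_agent=driver.execute_script('return navigator.userAgent'))`. Note that it sends every cookie regardless of its domain or path, which is usually fine when the download lives on the same site as the page.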

PhantomJS does not currently support file downloads. Related issues with workarounds:

File download
How to handle file save dialog using Selenium webdriver and PhantomJS

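As far as I know, you have at least three options:

Switch to casperjs (you would have to leave Python behind for this)
Try headless with xvfb
Switch to a normal, non-headless browser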
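For the last two options (Firefox under xvfb, or a visible Firefox), the save dialog is the usual obstacle. A minimal sketch, assuming Firefox driven through the Selenium 3 `FirefoxProfile` API: the hypothetical helper below returns the profile preferences commonly used to make Firefox save files of the given MIME types straight into a directory without asking. The MIME types must match the `Content-Type` the server actually sends, otherwise the dialog still appears.

```python
def firefox_download_prefs(download_dir, mime_types):
    """Preferences that make Firefox save the given MIME types to download_dir silently."""
    return {
        # 2 = save into the custom directory given by browser.download.dir
        "browser.download.folderList": 2,
        "browser.download.dir": download_dir,
        "browser.download.useDownloadDir": True,
        # Do not pop up the download manager when a download starts.
        "browser.download.manager.showWhenStarting": False,
        # Comma-separated MIME types that are saved without a dialog.
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
        # Stop PDFs from opening in the built-in viewer instead of downloading.
        "pdfjs.disabled": True,
    }

# Usage sketch (Selenium 3, optionally inside an xvfb display via pyvirtualdisplay):
# from pyvirtualdisplay import Display
# from selenium import webdriver
# display = Display(visible=0, size=(1024, 768)); display.start()
# profile = webdriver.FirefoxProfile()
# for key, value in firefox_download_prefs('/tmp/downloads', ['application/pdf']).items():
#     profile.set_preference(key, value)
# driver = webdriver.Firefox(firefox_profile=profile)
```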

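Here are a few more links that may also help:

Selenium headless automated testing in Ubuntu
XWindows for Headless Selenium (with further links inside)
How to run browsers (chrome, IE and firefox) in headless mode
Tutorial: How to use Headless Firefox for Scraping in Linux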

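My use case required submitting a form to retrieve the file. I was able to do this with the driver's execute_async_script() function, which runs an XMLHttpRequest inside the page and hands its result back to Python.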

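# `target` is the id of the file that was clicked. It is passed to the script as
# arguments[0] rather than being spliced into the JavaScript source.
js = '''
    var target = arguments[0];
    // Selenium always passes the async callback as the last argument
    var callback = arguments[arguments.length - 1];
    var theForm = document.forms['theFormId'];
    var data = new FormData();
    data.append('eventTarget', target);
    data.append('otherFormField', theForm.otherFormField.value);
    var xhr = new XMLHttpRequest();
    xhr.open('POST', theForm.action, true);
    // No need to copy cookies into request headers: the browser attaches the
    // page's cookies to this same-origin request by itself.
    xhr.onload = function () {
        // responseText is fine for text files; binary content gets mangled
        callback(this.responseText);
    };
    xhr.send(data);
'''
driver.set_script_timeout(30)  # seconds to wait for callback() before timing out
file_content = driver.execute_async_script(js, target)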
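Because the snippet above hands back `responseText`, binary files (PDF, ZIP, images) will not survive the trip intact. One way around that, sketched here as an assumption rather than the method above, is to read the response as a Blob, convert it to a base64 `data:` URL with `FileReader`, and decode it in Python. It assumes the browser supports `xhr.responseType = 'blob'` and `FileReader`, which older PhantomJS builds may not; `theFormId` and `otherFormField` are the placeholders from the original snippet, and `BINARY_SAFE_JS` and `data_url_to_bytes` are hypothetical names.

```python
import base64

# Same form POST as before, but the response comes back as a base64 data: URL.
BINARY_SAFE_JS = '''
    var target = arguments[0];
    var callback = arguments[arguments.length - 1];
    var theForm = document.forms['theFormId'];
    var data = new FormData();
    data.append('eventTarget', target);
    data.append('otherFormField', theForm.otherFormField.value);
    var xhr = new XMLHttpRequest();
    xhr.open('POST', theForm.action, true);
    xhr.responseType = 'blob';
    xhr.onload = function () {
        var reader = new FileReader();
        reader.onloadend = function () { callback(reader.result); };
        reader.readAsDataURL(xhr.response);
    };
    xhr.onerror = function () { callback(null); };
    xhr.send(data);
'''


def data_url_to_bytes(data_url):
    """Decode a 'data:<mime>;base64,<payload>' string into raw bytes."""
    header, _, payload = data_url.partition(",")
    if not header.endswith(";base64"):
        raise ValueError("expected a base64 data: URL")
    return base64.b64decode(payload)

# Usage (assumes `driver` and `target` from the snippet above):
# data_url = driver.execute_async_script(BINARY_SAFE_JS, target)
# with open('download.bin', 'wb') as f:
#     f.write(data_url_to_bytes(data_url))
```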

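It is not possible to do it that way. You can use other tools to download the file instead, such as wget or curl.

Use Firefox to find the right request, use Selenium to get the corresponding values, and finally download the file with curl as-is: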

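import subprocess

# Replace the URL and the elided headers with the request copied from Firefox ("Copy as cURL")
curlCall = " curl 'http://www_sitex_org/descarga.jsf' -H '...allCurlRequest....' > file.xml"
subprocess.call(curlCall, shell=True)  # shell=True is required for the '>' redirect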
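A variant of the curl call that avoids `shell=True` and the quoting problems it brings: build the command as an argument list and feed in the cookies Selenium already has. This is a sketch under assumptions; `build_curl_command` and `run_curl` are hypothetical helpers, the cookie dicts are assumed to come from `driver.get_cookies()`, and any other headers you copied from Firefox go into `extra_headers`.

```python
import subprocess


def build_curl_command(url, cookies, output_path, extra_headers=None):
    """Build a curl argv that replays a request with the browser's cookies."""
    # --fail: non-zero exit on HTTP errors; --location: follow redirects.
    cmd = ["curl", "--fail", "--silent", "--show-error", "--location", url]
    if cookies:
        cookie_value = "; ".join(f"{c['name']}={c['value']}" for c in cookies)
        cmd += ["-H", f"Cookie: {cookie_value}"]
    for name, value in (extra_headers or {}).items():
        cmd += ["-H", f"{name}: {value}"]
    # -o writes the body to a file, replacing the shell redirect.
    cmd += ["-o", output_path]
    return cmd


def run_curl(cmd):
    """Run curl without a shell; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)

# Usage: run_curl(build_curl_command(url, driver.get_cookies(), 'file.xml'))
```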
