requests and urllib2 get an error from an XBRL page: 'The browser mode you are running is not compatible with this applicatio



I can't figure out why I am unable to fetch the page at this link. I just want to download it and then feed it to Beautiful Soup.

import requests,urllib2
link='https://www.sec.gov/ix?doc=/Archives/edgar/data/1373715/000137371518000157/now-2018630x10q.htm'
r = requests.get(link)
r2=urllib2.urlopen(link)
html=r2.read()

I also tried faking a browser with the following:

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
r = requests.get(link, headers=headers)

The returned text is the same in both cases, and it is not the page I want.

What I get back contains a script like this:

var note = 'The browser mode you are running is not compatible with this application.';
browserName ='Microsoft Internet Explorer';
note +='You are currently running '+browserName+' '+((ie7>0)?7:8)+'.0.';       
var userAgent = window.navigator.userAgent.toLowerCase();           
if(userAgent.indexOf('ipad') != -1 || userAgent.indexOf('iphone') != -1 || userAgent.indexOf('apple') != -1){               
note += ' Please use a more current version of '+browserName+' in order to use the application.';
}else if(userAgent.indexOf('android') != -1){               
note += ' Please use a more current version of Google Chrome or Mozilla Firefox in order to use the application.';
}else{              
note += ' Please use a more current version of Microsoft Internet Explorer, Google Chrome or Mozilla Firefox in order to use the application.';
}

I can fetch this page just fine: https://www.sec.gov/Archives/edgar/data/1373715/000137371518000153/erq2fy18-document.htm

That one is not an XBRL document. Is the problem related to XBRL, with the server expecting my browser to interact with the data?

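It looks like this part of the page is rendered by JavaScript. For dynamic content the most reliable option is usually Selenium, but in this case you can avoid it and use requests.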

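The page evidently loads the content of the document named in its doc query parameter: /Archives/edgar/data/1373715/000137371518000157/now-2018630x10q.htm. You can bypass the viewer page and request that document directly.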

import requests
url = "https://www.sec.gov/Archives/edgar/data/1373715/000137371518000157/now-2018630x10q.htm"
r = requests.get(url)
html = r.text
print(html)
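
If you have many viewer links of this form, the direct URL can be derived from the doc query parameter instead of being edited by hand. The following is a minimal sketch, assuming every viewer link carries the document path in a doc parameter as the link in the question does. The helper name document_url_from_viewer is hypothetical and not part of requests or any SEC tooling.

```python
from urllib.parse import parse_qs, urljoin, urlsplit


def document_url_from_viewer(link):
    """Return the direct document URL for an `ix?doc=` viewer link.

    Hypothetical helper: links without a `doc` query parameter
    are returned unchanged.
    """
    parts = urlsplit(link)
    doc_paths = parse_qs(parts.query).get("doc")
    if not doc_paths:
        return link
    # Resolve the document path against the scheme and host of the viewer link.
    return urljoin(f"{parts.scheme}://{parts.netloc}", doc_paths[0])


viewer = "https://www.sec.gov/ix?doc=/Archives/edgar/data/1373715/000137371518000157/now-2018630x10q.htm"
print(document_url_from_viewer(viewer))
```

The printed URL can then be passed to requests.get exactly as in the snippet above.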
