Scraping tiingo HTML with Beautiful Soup

I want to pull financial data from the individual web pages of the companies in the S&P 500 index.

For example, take the following URL:

https://www.tiingo.com/f/b/aapl

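which shows Apple's most recent balance sheet data.

I'm looking to extract the "Property, Plant & Equipment" amount for the most recent quarter, which in this particular case is $25.45 billion. However, I'm having trouble writing the correct Beautiful Soup code to extract this text.

Inspecting the element, I see that the figure 25.45B sits in an element with the class "ng-binding ng-scope", inside an element with the class "col-xs-6 col-sm-3 col-md-3 col-lg-3 statement-field-data ng-scope", which is itself inside an element with the class "col-xs-7 col-sm-8 col-md-8 col-lg-9 no-padding-left no-padding-right".

However, I'm not sure how to write the Beautiful Soup code that locates the right element and then calls element.getText() on it.

I was thinking of something like this: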

import os, bs4, requests
res_bal = requests.get("https://www.tiingo.com/f/b/aapl")
res_bal.raise_for_status()
soup_bal = bs4.BeautifulSoup(res_bal.text, "html.parser")
elems_bal = soup_bal.select(".col-xs-6 col-sm-3 col-md-3 col-lg-3 statement-field-data ng-scope")
elems_bal_2 = elems_bal.select(".ng-binding ng-scope")
joe = elems_bal_2.getText()
print(joe)

So far, though, I haven't had any success with this code. Any help would be greatly appreciated!

The problem with the selectors

elems_bal = soup_bal.select(".col-xs-6 col-sm-3 col-md-3 col-lg-3 statement-field-data ng-scope")
elems_bal_2 = elems_bal.select(".ng-binding ng-scope")

is that several elements on the page share the same classes, so you do not get the right result.
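Apart from the duplicate classes, the selector string itself cannot match as written. In a CSS selector a space means "descendant", so ".col-xs-6 col-sm-3 ..." looks for a tag named col-sm-3 inside an element of class col-xs-6. Several classes on one element are written joined with dots instead. Also, select() returns a list, which has no select() or getText() of its own. A minimal sketch of that idea follows; classes_to_selector and select_by_classes are hypothetical helper names, not part of Beautiful Soup:

```python
def classes_to_selector(class_attr):
    """Turn a class attribute value such as "a b c" into the selector ".a.b.c"."""
    return "".join("." + name for name in class_attr.split())


def select_by_classes(soup, class_attr):
    """Return a list of the elements under `soup` that carry every class in `class_attr`.

    `soup` is assumed to be a bs4.BeautifulSoup object or Tag; only its
    select() method is used. The result is a list, so loop over it (or index
    it) before calling get_text() on an individual element.
    """
    return soup.select(classes_to_selector(class_attr))


# Hypothetical usage with the soup_bal object from the question:
# cells = select_by_classes(soup_bal, "statement-field-data ng-scope")
# for cell in cells:
#     print(cell.get_text(strip=True))
```

Even with a well-formed selector, the list may contain many cells, since every row of the statement uses the same classes.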

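Note that if you use only BeautifulSoup and requests, the page source does not contain the data you want to scrape, because the page fills it in with JavaScript. It can be done with selenium together with BeautifulSoup. If you do not have selenium installed, install it first: pip install selenium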
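One way to check this before reaching for a browser is to look for the row label in the HTML that requests returns. A minimal sketch; label_in_html is a hypothetical helper, and the two HTML strings are made-up stand-ins rather than tiingo's actual markup:

```python
import html


def label_in_html(page_html, label):
    """Return True if `label` occurs in the raw HTML text.

    Entities are unescaped first, so "Plant &amp; Equipment" in the
    source still matches the label "Plant & Equipment".
    """
    return label in html.unescape(page_html)


# Made-up examples: a template before JavaScript runs, and the same row after rendering.
unrendered = '<div class="statement-field-name">{{ field.name }}</div>'
rendered = '<div class="statement-field-name">Property, Plant &amp; Equipment</div>'

print(label_in_html(unrendered, "Property, Plant & Equipment"))  # False
print(label_in_html(rendered, "Property, Plant & Equipment"))    # True
```

With the question's code this would be label_in_html(res_bal.text, "Property, Plant & Equipment"); a False result suggests the row is filled in by JavaScript after the page loads.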

Here is working code that uses selenium:

from selenium import webdriver
import bs4, time

driver = webdriver.Firefox()
driver.get("https://www.tiingo.com/f/b/aapl")
driver.maximize_window()
# Sleep so that the page's JavaScript has time to populate the data
time.sleep(10)
page_source = driver.page_source
soup = bs4.BeautifulSoup(page_source, "html.parser")
# Print the name of every row whose label contains 'Property'
properties = soup.find_all('div', {'class': 'col-xs-5 col-sm-4 col-md-4 col-lg-3 statement-field-name indent-2'})
for p in properties:
    if 'Property' in p.text.strip():
        print(p.text)
# The value is the text of the link whose ng-click attribute names this field
x = soup.find("a", {"ng-click": "toggleFundData('Property, Plant & Equipment',SDCol.restatedString==='restated',true)"})
print(x.text)

The output is:

Property, Plant & Equipment
25.45B
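The fixed time.sleep(10) above waits the full ten seconds even when the data arrives sooner, and may still be too short on a slow connection. One alternative is to poll until the content shows up. A minimal sketch using only the standard library; wait_until is a hypothetical helper, not a Selenium function (Selenium ships its own WebDriverWait class for the same purpose):

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(interval)


# Hypothetical usage with the driver from the code above:
# wait_until(lambda: "Property, Plant" in driver.page_source, timeout=30)
# soup = bs4.BeautifulSoup(driver.page_source, "html.parser")
```

The condition is checked once more after the last sleep, so a page that finishes loading right at the deadline is still picked up.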
