Stock index data from Bloomberg using requests and bs4



I am trying to scrape stock index data from the website 'https://www.bloomberg.com/markets/stocks' and save the values to a .csv file.

Here is my code so far:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('https://www.bloomberg.com/markets/stocks')
bs = BeautifulSoup(html,'html.parser')
for siblings in bs.find('tbody',{'class':'data-table-body'}).tr.next_siblings:
    print(siblings)

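This code gets me the data I need, but I want to clean up the HTML so that only the name of each index and its associated values are shown. The headers in the CSV file should be:

Name, Value, Net Change, % Change, 1 Month, 1 Year, Time (EDT)

Thanks in advance for your support.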

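For scraping, I suggest taking a look at the requests-html library (supports Python 3.6 only), since the BeautifulSoup API can be somewhat cumbersome and unintuitive. Requests-HTML uses BeautifulSoup under the hood but offers many convenience methods that can simplify your code. Here is how the task can be done with requests-html: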

from requests_html import HTMLSession
HEADERS = ("Name", "Value", "Net Change", "% Change", "1 Month", "1 Year", "Time (EDT)")
session = HTMLSession()
response = session.get('https://www.bloomberg.com/markets/stocks')
tables = response.html.find('tbody.data-table-body')
rows = []
for table in tables:
    for tr in table.find('tr'):
        row = []
        for header, td in zip(HEADERS, tr.find('td')):
            content = td.full_text.strip()
            row.append((header, content))
        rows.append(row)
for row in rows:
    print(row)
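
The snippet above only prints the rows. As a rough sketch of the remaining step, the (header, content) pairs could be written out as CSV with csv.DictWriter. The function name write_rows and the sample rows are made up for illustration; real values would come from the scraping loop.

```python
import csv
import io

HEADERS = ("Name", "Value", "Net Change", "% Change", "1 Month", "1 Year", "Time (EDT)")


def write_rows(rows, outfile):
    """Write rows, each a list of (header, content) pairs, as CSV to outfile."""
    writer = csv.DictWriter(outfile, fieldnames=HEADERS, restval="")
    writer.writeheader()
    for row in rows:
        # dict() turns [(header, content), ...] into {header: content}
        writer.writerow(dict(row))


# Placeholder values for illustration only, not real market data.
sample_rows = [
    [("Name", "EXAMPLE INDEX A"), ("Value", "100.00"), ("Net Change", "+1.00"),
     ("% Change", "+1.00%"), ("1 Month", "+2.00%"), ("1 Year", "+3.00%"),
     ("Time (EDT)", "12:00 PM")],
    # A short row: columns that are missing are filled with restval ("").
    [("Name", "EXAMPLE INDEX B"), ("Value", "200.00")],
]

# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
write_rows(sample_rows, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, something like `with open('data.csv', 'w', newline='') as f: write_rows(rows, f)` should work.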

Code:

import csv
from bs4 import BeautifulSoup
import requests
url = 'https://www.bloomberg.com/markets/stocks'
headers = ('Name', 'Value', 'Net Change', '% Change', '1 Month', '1 Year', 'Time (EDT)')
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
trs = soup.select('.data-table-body > tr')

with open('data.csv', 'w', newline='') as outcsv:  # newline='' avoids blank lines between rows on Windows
    writer = csv.writer(outcsv)
    writer.writerow(headers)
    for tr in trs:
        tds = tr.find_all('td')[:7]
        tds[0].select_one('[data-type="abbreviation"]').decompose()  # optional
        content = [td.text.strip() for td in tds]
        writer.writerow(content)

If you want to store both the abbreviation and the full name, e.g. INDU:IND DOW JONES INDUS. AVG, delete the line marked as optional.
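
To show what the optional line changes, here is a small sketch run against made-up markup. The SAMPLE HTML, the data-type="full" attribute and the helper first_cell_text are assumptions for illustration; the real page structure may differ. The sketch also guards against select_one returning None, which the one-liner above does not, and uses a space separator in get_text, which is my own choice.

```python
from bs4 import BeautifulSoup

# Made-up markup loosely modeled on the selector used above.
SAMPLE = """
<table><tbody class="data-table-body">
<tr>
  <td><div data-type="abbreviation">INDU:IND</div><div data-type="full">DOW JONES INDUS. AVG</div></td>
  <td>100.00</td>
</tr>
</tbody></table>
"""


def first_cell_text(html, drop_abbreviation):
    """Return the text of the first cell, optionally without the abbreviation."""
    soup = BeautifulSoup(html, "html.parser")
    cell = soup.select_one(".data-table-body > tr > td")
    if drop_abbreviation:
        abbr = cell.select_one('[data-type="abbreviation"]')
        if abbr is not None:  # select_one returns None when nothing matches
            abbr.decompose()  # removes the tag and its contents from the tree
    # Join the remaining text fragments with a single space.
    return cell.get_text(" ", strip=True)


with_abbr = first_cell_text(SAMPLE, drop_abbreviation=False)
without_abbr = first_cell_text(SAMPLE, drop_abbreviation=True)
```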

You can try fetching the data as shown below and writing it out accordingly.

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup
Headers = ["Name", "Value", "Net Change", "% Change", "1 Month", "1 Year", "Time (EDT)"]
res = urlopen('https://www.bloomberg.com/markets/stocks')
soup = BeautifulSoup(res.read(),'html.parser')
with open('bloomberg.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(Headers)
    for tr in soup.select(".data-table-body tr"):
        data = [item.get_text(strip=True) for item in tr.find_all("td")]
        writer.writerow(data)
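
One thing to watch with this loop: if a tr contains no td cells (for example a row of th cells), data is an empty list and writerow emits a blank line. Below is a minimal sketch of the same loop with a guard, run against made-up HTML so it can be tried offline. SAMPLE and table_to_csv are hypothetical names, and the markup is an assumption rather than the real page.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up markup with placeholder values, not real market data.
SAMPLE = """
<table>
<tbody class="data-table-body">
<tr><th>Name</th><th>Value</th></tr>
<tr><td> EXAMPLE INDEX A </td><td> 100.00 </td></tr>
<tr><td>EXAMPLE INDEX B</td><td>200.00</td></tr>
</tbody>
</table>
"""


def table_to_csv(html, headers):
    """Return the table body as CSV text, skipping rows without <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(headers)
    for tr in soup.select(".data-table-body tr"):
        data = [td.get_text(strip=True) for td in tr.find_all("td")]
        if data:  # skip rows that have no <td> cells at all
            writer.writerow(data)
    return out.getvalue()


csv_text = table_to_csv(SAMPLE, ["Name", "Value"])
```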
