I'm trying to scrape stock index data from 'https://www.bloomberg.com/markets/stocks' and save the values in a .CSV file.
Here is my code so far:
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('https://www.bloomberg.com/markets/stocks')
bs = BeautifulSoup(html,'html.parser')
for siblings in bs.find('tbody', {'class': 'data-table-body'}).tr.next_siblings:
    print(siblings)
This code gets me the data I need, but I'd like to clean up the HTML so that only the index names and their related values are shown. The headers in the CSV file should be:
Name Value Net Change % Change 1 Month 1 Year Time (EDT)
Thanks in advance for your help.
For scraping, I'd suggest taking a look at the requests-html library (Python 3.6+ only), since the BeautifulSoup API can be a bit cumbersome and unintuitive. Requests-HTML uses BeautifulSoup under the hood, but offers many convenience methods that simplify your code. Here's how to do the task with requests-html:
from requests_html import HTMLSession

HEADERS = ("Name", "Value", "Net Change", "% Change", "1 Month", "1 Year", "Time (EDT)")

session = HTMLSession()
response = session.get('https://www.bloomberg.com/markets/stocks')
tables = response.html.find('tbody.data-table-body')

rows = []
for table in tables:
    for tr in table.find('tr'):
        row = []
        for header, td in zip(HEADERS, tr.find('td')):
            content = td.full_text.strip()
            row.append((header, content))
        rows.append(row)

for row in rows:
    print(row)
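Since the end goal is a .CSV file, the rows collected above can then be written out with the standard csv module. A minimal sketch, assuming rows holds the (header, value) pairs built in the loop above (the sample row and the filename indices.csv are made up for illustration):

```python
import csv

HEADERS = ("Name", "Value", "Net Change", "% Change", "1 Month", "1 Year", "Time (EDT)")

# Example data in the same (header, value) shape the scraping loop produces.
rows = [[("Name", "DOW JONES INDUS. AVG"), ("Value", "25,339.99")]]

with open('indices.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(HEADERS)
    for row in rows:
        # Keep only the values; the headers are already in the first line.
        writer.writerow(value for _, value in row)
```

Passing newline='' to open() is required by the csv module to avoid blank lines between rows on Windows.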
Code:
import csv

import requests
from bs4 import BeautifulSoup

url = 'https://www.bloomberg.com/markets/stocks'
headers = ('Name', 'Value', 'Net Change', '% Change', '1 Month', '1 Year', 'Time (EDT)')

r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
trs = soup.select('.data-table-body > tr')

with open('data.csv', 'w', newline='') as outcsv:
    writer = csv.writer(outcsv)
    writer.writerow(headers)
    for tr in trs:
        tds = tr.find_all('td')[:7]
        tds[0].select_one('[data-type="abbreviation"]').decompose()  # optional
        content = [td.text.strip() for td in tds]
        writer.writerow(content)
If you want to store both the abbreviation and the full name, e.g. INDU:IND DOW JONES INDUS. AVG, remove the line marked optional.
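To illustrate what that optional line does, here is a small standalone sketch; the markup is hypothetical, modelled on what one "Name" cell of the table might look like:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for a single "Name" cell: ticker + full index name.
html = '<td><div data-type="abbreviation">INDU:IND</div>DOW JONES INDUS. AVG</td>'
td = BeautifulSoup(html, 'html.parser').td

# decompose() removes the abbreviation element from the tree entirely,
# so .text afterwards yields only the full name.
td.select_one('[data-type="abbreviation"]').decompose()
print(td.text.strip())
```

With the decompose() call, only DOW JONES INDUS. AVG remains; without it, the cell text also contains the INDU:IND ticker.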
You can try fetching the data like below and writing it out accordingly.
import csv
from urllib.request import urlopen

from bs4 import BeautifulSoup

HEADERS = ["Name", "Value", "Net Change", "% Change", "1 Month", "1 Year", "Time (EDT)"]

res = urlopen('https://www.bloomberg.com/markets/stocks')
soup = BeautifulSoup(res.read(), 'html.parser')

with open('bloomberg.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(HEADERS)
    for tr in soup.select(".data-table-body tr"):
        data = [item.get_text(strip=True) for item in tr.find_all("td")]
        writer.writerow(data)