Python 3 web scraping: for loop only returns one iteration



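Python 3 web scraping: I'm trying to extract a table from HTML data and store it in a new DataFrame. I need all of the "td" values, but when I iterate, the loop returns only a single row instead of all rows. Here is my code and output: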

!pip install yfinance
!pip install pandas
!pip install requests
!pip install bs4
!pip install plotly
import yfinance as yf
import pandas as pd
import requests
from bs4 import BeautifulSoup
import plotly.graph_objects as go
from plotly.subplots import make_subplots
def make_graph(stock_data, revenue_data, stock):
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=("Historical Share Price", "Historical Revenue"), vertical_spacing = .3)
    stock_data_specific = stock_data[stock_data.Date <= '2021-06-14']
    revenue_data_specific = revenue_data[revenue_data.Date <= '2021-04-30']
    fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data_specific.Date, infer_datetime_format=True), y=stock_data_specific.Close.astype("float"), name="Share Price"), row=1, col=1)
    fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data_specific.Date, infer_datetime_format=True), y=revenue_data_specific.Revenue.astype("float"), name="Revenue"), row=2, col=1)
    fig.update_xaxes(title_text="Date", row=1, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=1)
    fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
    fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)
    fig.update_layout(showlegend=False,
                      height=900,
                      title=stock,
                      xaxis_rangeslider_visible=True)
    fig.show()
tsla = yf.Ticker("TSLA")
tsla
tesla_data = tsla.history(period="max")
tesla_data

tesla_data.reset_index(inplace=True)
tesla_data.head()
url = "https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue"
html_data = requests.get(url).text

soup = BeautifulSoup(html_data, 'html.parser')
tesla_revenue = pd.DataFrame(columns=["Date", "Revenue"])
for row in soup.find("tbody").find_all('tr'):
    col = row.find_all("td")
    date = col[0].text
    revenue = col[1].text
tesla_revenue = tesla_revenue.append({"Date":date, "Revenue":revenue}, ignore_index=True)
tesla_revenue

   Date  Revenue
0  2008      $15

What happens

The scraping works fine, but you append the data outside the loop, so you only ever get the result of the last iteration.
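A minimal offline sketch of that behavior, using a hypothetical `rows` list (values borrowed from the output further down, not freshly scraped): after a `for` loop ends, the loop variables still hold the values from the final iteration, so a single append placed after the loop records only that last row.

```python
# Hypothetical rows standing in for the scraped <tr> elements.
rows = [("2020", "$31,536"), ("2019", "$24,578"), ("2018", "$21,461")]

outside = []
for date, revenue in rows:
    pass  # the loop body does nothing with the values
# Runs once, after the loop: date/revenue still hold the LAST iteration's values.
outside.append((date, revenue))

inside = []
for date, revenue in rows:
    # Runs once per iteration, so every row is kept.
    inside.append((date, revenue))

print(outside)      # [('2018', '$21,461')]
print(len(inside))  # 3
```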

How to fix it

Fix your indentation and move the append into your loop:

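tesla_revenue = pd.DataFrame(columns=["Date", "Revenue"])
for row in soup.find("tbody").find_all('tr'):
    col = row.find_all("td")
    date = col[0].text
    revenue = col[1].text
    # Indented: runs once per row instead of once after the loop.
    # Note: DataFrame.append was removed in pandas 2.0, so this needs pandas < 2.0.
    tesla_revenue = tesla_revenue.append({"Date":date, "Revenue":revenue}, ignore_index=True)
tesla_revenue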
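Because `DataFrame.append` was removed in pandas 2.0, the fixed loop above raises `AttributeError` on current pandas. A sketch of an alternative that works on both old and new versions: collect one dict per row inside the loop and build the DataFrame once at the end. The inline `html_data` sample and the `parse_revenue` helper are hypothetical, used only so the example runs offline.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical inline table standing in for the downloaded page.
html_data = """
<table><tbody>
<tr><td>2020</td><td>$31,536</td></tr>
<tr><td>2019</td><td>$24,578</td></tr>
</tbody></table>
"""

def parse_revenue(html):
    """Hypothetical helper: turn the first <tbody> into a Date/Revenue DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for row in soup.find("tbody").find_all("tr"):
        col = row.find_all("td")
        # One dict per row, appended INSIDE the loop.
        records.append({"Date": col[0].text, "Revenue": col[1].text})
    # Build the DataFrame once, after every row has been collected.
    return pd.DataFrame(records, columns=["Date", "Revenue"])

tesla_revenue = parse_revenue(html_data)
print(tesla_revenue)
```

Building from a list is also faster than growing a DataFrame row by row, since each `append` call copied the whole frame.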

Example

from bs4 import BeautifulSoup
import requests
import pandas as pd
url = "https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue"
html_data = requests.get(url).text
soup = BeautifulSoup(html_data, 'html.parser')
tesla_revenue = pd.DataFrame(columns=["Date", "Revenue"])
for row in soup.find("tbody").find_all('tr'):
    col = row.find_all("td")
    date = col[0].text
    revenue = col[1].text
    tesla_revenue = tesla_revenue.append({"Date":date, "Revenue":revenue}, ignore_index=True)
tesla_revenue

Output

   Date  Revenue
...
   2018  $21,461
...

Alternatively, find the main table using the appropriate tag and class:

import requests
from bs4 import BeautifulSoup

res = requests.get("https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue")
soup = BeautifulSoup(res.text, "html.parser")
table = soup.find("table", class_="historical_data_table table")
main_data = table.find_all("tr")
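A note on how that `class_` lookup behaves, as a small offline sketch with a hypothetical HTML snippet: passing a string containing a space matches the exact value of the `class` attribute, so the classes must appear in that same order. A CSS selector via `select_one` matches a table that has the class regardless of order or extra classes.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet mimicking a page with more than one table.
html = """
<table class="other_table"><tr><td>ignore me</td></tr></table>
<table class="historical_data_table table">
  <tr><th>Tesla Annual Revenue</th><th></th></tr>
  <tr><td>2020</td><td>$31,536</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact match on the full class attribute string (order matters).
table = soup.find("table", class_="historical_data_table table")

# Reversed order does not match the exact-string form.
missing = soup.find("table", class_="table historical_data_table")

# CSS selector: any <table> carrying this class, in any position.
same_table = soup.select_one("table.historical_data_table")

main_data = table.find_all("tr")
print(len(main_data))             # 2
print(missing)                    # None
print(same_table.get("class"))    # ['historical_data_table', 'table']
```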

Now collect each row's cell text into a list, building a list of lists that becomes the DataFrame's row data:

main_lst = []
for i in main_data[1:]:  # skip the first row (the table header)
    lst = [data.get_text(strip=True) for data in i.find_all("td")]
    main_lst.append(lst)

Now use that data to build and display the DataFrame:

import pandas as pd
df=pd.DataFrame(columns=["Date","Price"],data=main_lst)
df
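If this DataFrame is passed to `make_graph` from the question, keep in mind that it reads `revenue_data.Revenue` and calls `astype("float")`, which fails on strings such as `"$31,536"`. A sketch of one way to prepare the data, assuming the column is named `Revenue` (rather than `Price`) to match `make_graph`, and using hypothetical sample rows shaped like `main_lst`:

```python
import pandas as pd

# Hypothetical rows shaped like main_lst; the blank cell mimics a missing value.
main_lst = [["2020", "$31,536"], ["2019", "$24,578"], ["2018", ""]]

df = pd.DataFrame(columns=["Date", "Revenue"], data=main_lst)

# Strip "$" and thousands separators, then convert; unparsable/blank -> NaN.
df["Revenue"] = pd.to_numeric(
    df["Revenue"].str.replace(r"[$,]", "", regex=True), errors="coerce"
)
# Drop rows whose revenue could not be parsed.
df = df.dropna(subset=["Revenue"])
print(df)
```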

Output:

Date    Price
0   2020    $31,536
1   2019    $24,578
2   2018    $21,461
3   2017    $11,759
...

As a one-liner using pandas:

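# read_html returns a list of DataFrames, one per <table> (needs lxml, or bs4 + html5lib)
tables = pd.read_html("https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue")
print(len(tables))
print(tables[0])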

Output

6
Date    Price
0   2020    $31,536
1   2019    $24,578
2   2018    $21,461
3   2017    $11,759
