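Is it possible to extract the data points from the chart at this link?
https://ycharts.com/companies/AAPL/market_cap
The chart is located at //*[@id="dataChartCanvass1"]
I mean the chart itself, not the table below it.
I tried looking at the page source, but the only data points I can see there are the ones in the table.
Can this be done with Python and requests? Where should I start?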
You can simulate the site's Ajax call to get the chart's data points, for example:
import json

import pandas as pd
import requests

api_url = "https://ycharts.com/charts/fund_data.json"

params = {
    "securities": "id:AAPL,include:true,,",  # <-- ticker here
    "calcs": "id:market_cap,include:true,,",
    "correlations": "",
    "format": "real",
    "recessions": "false",
    "zoom": "5",
    "startDate": "",
    "endDate": "",
    "chartView": "",
    "splitType": "single",
    "scaleType": "linear",
    "note": "",
    "title": "",
    "source": "false",
    "units": "false",
    "quoteLegend": "true",
    "partner": "",
    "quotes": "",
    "legendOnChart": "true",
    "securitylistSecurityId": "",
    "displayTicker": "false",
    "ychartsLogo": "",
    "useEstimates": "false",
    "maxPoints": "918",
}

data = requests.get(api_url, params=params).json()

# uncomment to see all data:
# print(json.dumps(data, indent=4))

df = pd.DataFrame(
    data["chart_data"][0][0]["raw_data"], columns=["date", "value"]
)
# the timestamps are in milliseconds since the Unix epoch
df["date"] = pd.to_datetime(df["date"], unit="ms")
df["value"] = df["value"].astype(int)
print(df)
Prints:
          date   value
0   2016-08-29  575593
1   2016-09-06  580335
2   2016-09-09  555710
3   2016-09-16  619239
4   2016-09-23  607331
5   2016-09-30  603253
6   2016-10-07  608643
7   2016-10-14  627239
8   2016-10-21  621747
9   2016-10-28  606390
10  2016-11-04  580368
11  2016-11-11  578182
...and so on.
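
If you want to reuse this for other tickers, it may help to split the two parameters that change from the parsing step. Below is a rough sketch of that idea, not anything official: `build_params` and `chart_to_frame` are hypothetical helper names of my own, the payload layout is assumed to match the `chart_data[0][0]["raw_data"]` structure used above, and `sample` is a hand-made payload (two rows copied from the output above) so the parsing can be tried without hitting the site.

```python
import pandas as pd


def build_params(ticker, calc="market_cap"):
    """Hypothetical helper: the two query parameters that vary per request.

    Merge the result into the full params dict from the answer above.
    """
    return {
        "securities": f"id:{ticker},include:true,,",
        "calcs": f"id:{calc},include:true,,",
    }


def chart_to_frame(payload):
    """Hypothetical helper: turn the JSON payload into a (date, value) DataFrame.

    Assumes the layout payload["chart_data"][0][0]["raw_data"], a list of
    [timestamp_in_milliseconds, value] pairs.
    """
    try:
        raw = payload["chart_data"][0][0]["raw_data"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError("unexpected payload layout") from exc
    df = pd.DataFrame(raw, columns=["date", "value"])
    df["date"] = pd.to_datetime(df["date"], unit="ms")
    return df


# Hand-made sample payload (assumption: mirrors the real response shape).
sample = {
    "chart_data": [
        [{"raw_data": [[1472428800000, 575593.0], [1473120000000, 580335.0]]}]
    ]
}
frame = chart_to_frame(sample)
print(frame)
```

With the real endpoint you would pass `requests.get(api_url, params=params).json()` to `chart_to_frame` instead of `sample`. The `ValueError` is only there so that a changed or empty response fails with a readable message rather than a bare `KeyError`.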