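I am trying to scrape some basic fundamental data from Yahoo's site. I used to be able to do this with BeautifulSoup, but somehow it no longer works. Is it time to switch to Selenium?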
url = https://finance.yahoo.com/quote/2638.HK/key-statistics?p=2638.HK
How can I get "Trailing P/E", "Return on Equity (ttm)", or whatever other field?
You can still use BeautifulSoup:
from bs4 import BeautifulSoup
from requests import get

# Send a browser-like User-Agent with the request.
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
}
url = 'https://finance.yahoo.com/quote/2638.HK/key-statistics?p=2638.HK'
response = get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
data = soup.find_all("table")


def grab_by_argument(table_array, parameter):
    """Return the value next to the first label containing `parameter`, or None."""
    for table in table_array:
        for tr in table.find_all('tr'):
            tds = tr.find_all('td')
            # Skip rows that lack both a label cell and a value cell.
            if len(tds) >= 2 and parameter.lower() in tds[0].get_text().lower():
                return tds[1].get_text()
    return None


print(grab_by_argument(data, "Trailing P/E"))
If you have several fields to grab, put them in a list and loop over it:
args = ["Trailing P/E", "Return on Equity (ttm)"]
for arg in args:
    print(grab_by_argument(data, arg))
If you need all of the stock data:
def grab_all(stock_data):
    for table in stock_data:
        for tr in table.find_all('tr'):
            tds = tr.find_all('td')
            # Skip rows that lack both a label cell and a value cell.
            if len(tds) < 2:
                continue
            print("Measure: {}".format(tds[0].get_text()))
            print("Value: {}".format(tds[1].get_text()))
            print("")
Then call it with:
grab_all(data)
You could also return all of the information as a table instead of printing it.
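As a rough sketch of that idea, the rows could be collected into a dictionary keyed by measure name. The function name `collect_all` and the `SAMPLE_HTML` snippet are hypothetical and used only for illustration. The sample values are made-up placeholders, not real Yahoo data, and the markup only assumes the same two-cell row layout that the code above relies on.

```python
from bs4 import BeautifulSoup


def collect_all(tables):
    """Return {measure: value} for every row that has at least two <td> cells."""
    result = {}
    for table in tables:
        for tr in table.find_all("tr"):
            tds = tr.find_all("td")
            if len(tds) < 2:
                continue  # header rows or malformed rows carry no label/value pair
            result[tds[0].get_text(strip=True)] = tds[1].get_text(strip=True)
    return result


# Hypothetical sample markup standing in for the downloaded page;
# the numbers are placeholders, not real data.
SAMPLE_HTML = """
<table>
  <tr><th>Measure</th><th>Value</th></tr>
  <tr><td>Trailing P/E</td><td>12.34</td></tr>
  <tr><td>Return on Equity (ttm)</td><td>5.67%</td></tr>
</table>
"""

sample_tables = BeautifulSoup(SAMPLE_HTML, "html.parser").find_all("table")
stats = collect_all(sample_tables)
for measure, value in stats.items():
    print(f"{measure}: {value}")
```

With the live page you would pass `data` from the answer above instead of `sample_tables`. A dictionary makes single lookups such as `stats.get("Trailing P/E")` straightforward, though if two rows share the same label, the later one overwrites the earlier one.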