Parsing information from a span HTML tag with Python's BeautifulSoup library



I am writing a Python web scraper that grabs the price of a particular stock. At the end of my program there are some print statements that correctly parse the HTML data, so that I can reach the HTML span tag containing the stock's price. My question is: how do I extract the price itself? So far I have located the correct HTML span tag. I thought I could simply splice the string, but the stock's price changes constantly, so I don't think that approach fits this problem. I recently started using BeautifulSoup, so any advice would be appreciated.

import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
# webscraping reference http://altitudelabs.com/blog/web-scraping-with-python-and-beautiful-soup/
my_url = 'https://quotes.wsj.com/GRPS/options'
#opens up a web connection and "downloads" a copy of the desired webpage
uClient = uReq(my_url)
#dumps the information read from the webpage into a variable for later use/parsing
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "lxml")
#find the html location for the price of the stock
#<span id="quote_val">0.0008</span>
all_stock_info = page_soup.find("section",{"class":"sector cr_section_1"})
find_spans = all_stock_info.find("span",{"id":"quote_val"})
price = page_soup.find_all("span",{"id":"quote_val"})
#sanity checks to make sure the scraper is finding the correct info
print(all_stock_info)
print(len(all_stock_info))
print(len(price))
print(price)  #this gives me the right span; I just need to be able to parse
#out the price of the stock from here (in this case 0.0008), no
#matter what the price is
print(all_stock_info.span)
print(find_spans)


You can use .find() together with the .text attribute to get the value you need.

Ex:

from bs4 import BeautifulSoup
# page_html is the raw HTML downloaded in the question's code
page_soup = BeautifulSoup(page_html, "lxml")
price = page_soup.find("span", {"id": "quote_val"}).text
print(price)

Output:

0.0008
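
If you want to treat the price as a number rather than a string (for comparisons or calculations), you can convert the span's text to a float. Here is a minimal sketch; the HTML literal below is a hypothetical stand-in for the downloaded page, since the live quote changes:

from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page_html; only the span matters here.
page_html = '<section class="sector cr_section_1"><span id="quote_val">0.0008</span></section>'

page_soup = BeautifulSoup(page_html, "lxml")
tag = page_soup.find("span", {"id": "quote_val"})
if tag is not None:
    price = float(tag.text.strip())  # "0.0008" (str) -> 0.0008 (float)
    print(price)

The None check guards against the page layout changing and the span not being found, which would otherwise raise an AttributeError on .text.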
