我正在使用beautifulsoup来抓取网站,但由于我是python和beautifursoup的新手,因此需要帮助如何从以下内容中获得VET"[[VET]]";
这是我迄今为止的代码
import bs4 as bs
import urllib.request
import pandas as pd
#This is the Home page of the website
source = urllib.request.urlopen('file:///C:/Users/Aiden/Downloads/stocks/Stock%20Premarket%20Trading%20Activity%20_%20Biggest%20Movers%20Before%20the%20Market%20Opens.html').read().decode('utf-8')
soup = bs.BeautifulSoup(source,'lxml')
#find the Div and put all info into varTable
table = soup.find('table',{"id":"decliners_tbl"}).tbody
#find all Rows in table and puts into varTableRows
tableRows = table.find_all('tr')
print ("There is ",len(tableRows),"Rows in the Table")
print(tableRows)
columns = [tableRows[1].find_all('td')]
print(columns)
a = [tableRows[1].find_all("a")]
print(a)
So my output from print(a) is "[[<a class="mplink popup_link" href="https://marketchameleon.com/Overview/VET/">VET</a>]]"
and I want to extract VET out
广告
您可以使用.text或.get_text((.
如果你有多个元素,你需要列表理解这个功能
感谢您的回复,我使用以下代码解决了问题
source = urllib.request.urlopen('file:///C:/Users/Aiden/Downloads/stocks/Stock%20Premarket%20Trading%20Activity%20_%20Biggest%20Movers%20Before%20the%20Market%20Opens.html').read().decode('utf-8')
soup = bs.BeautifulSoup(source,'html.parser')
table = soup.find("table",id="decliners_tbl")
for decliners in table.find_all("tbody"):
rows = decliners.find_all("tr")
for row in rows:
ticker = row.find("a").text
volume = row.findAll("td", class_="rightcell")[3].text
print(ticker, volume)