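I'm new to bs4 and I'm trying to extract a price table.
The main problem is that the HTML page has no table element for it; the data sits in a div. I tried looking it up by class and by id, but I couldn't get the prices.
This is what I tried: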
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "http://www.valoreazioni.com/indici/ftse-mib_ftsemib_mi"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html5lib")
These are the filters I applied to get the price table; both failed:
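# First attempt: look the div up by tag and id
table = soup.find('div', {'id': 'maidMoneyTable'})
# Second attempt (same result): look it up by id only
# table = soup.find(id='maidMoneyTable')
route = pd.read_html(str(table), flavor='html5lib')
print(route)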
In both cases the result is "no tables were found".
Can anyone tell me how to get the table I need?
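pd.read_html only picks up <table> elements, and this page builds its price list from a div containing li items, which is why no tables are found. Instead, use BeautifulSoup to scrape the data from the page, store it temporarily in an sqlite3 table, and then use pandas' SQL support to load it from sqlite3 into a DataFrame.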
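>>> import sqlite3
>>> import requests
>>> import bs4
>>> import pandas as pd
>>> # Temporary staging table; the in-memory database and column types are assumptions.
>>> conn = sqlite3.connect(':memory:')
>>> c = conn.cursor()
>>> result = c.execute('''CREATE TABLE market (url TEXT, symbol TEXT, name TEXT, item_1 REAL, item_2 REAL, item_3 REAL, item_4 TEXT)''')
>>> page = requests.get('http://www.valoreazioni.com/indici/ftse-mib_ftsemib_mi').content
>>> soup = bs4.BeautifulSoup(page, 'lxml')
>>> # find() returns one element; find_all() returns a list, which has no findAll method.
>>> maidMoneyTable = soup.find(id='maidMoneyTable')
>>> table_rows = maidMoneyTable.findAll('li', attrs={'class': 'order'})
>>> for row in table_rows:
...     link = row.find('a')
...     # The link target first, then the text of each li nested inside the link.
...     data = [link.attrs['href']] + [_.text for _ in link.findAll('li')]
...     result = c.execute('''INSERT INTO market VALUES (?,?,?,?,?,?,?)''', data)
...
>>> df = pd.read_sql_query('SELECT * FROM market', conn)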
>>> df.head()
                                                 url   symbol
0      http://www.valoreazioni.com/titoli/a2a-a2a-mi   A2A.MI
1  http://www.valoreazioni.com/titoli/anima-holdi...  ANIM.MI
2  http://www.valoreazioni.com/titoli/atlantia-at...   ATL.MI
3  http://www.valoreazioni.com/titoli/azimut-hold...   AZM.MI
4  http://www.valoreazioni.com/titoli/banca-medio...  BMED.MI

                name  item_1  item_2  item_3   item_4
0            A2A SpA    1.50   1.503   0.003  +0.200%
1  ANIMA HOLDING SPA    6.26   6.210  -0.040   -0.64%
2           ATLANTIA   25.96  25.640  -0.240   -0.93%
3     AZIMUT HOLDING   17.94  17.930   0.060   +0.34%
4   BANCA MEDIOLANUM    7.43   7.290  -0.150   -2.02%
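
The staging step can be tried offline before pointing anything at the live site. The following is a minimal sketch using only the standard library: it inserts two rows copied from the output above (as the strings a scraper would produce) into an in-memory sqlite3 table and reads them back. The helper name `stage_rows`, the in-memory database, and the declared column types are assumptions made for illustration, not something the page or the answer prescribes.

```python
import sqlite3

# Two sample rows taken from the output above, kept as scraped strings.
# The second URL is truncated exactly as pandas displayed it.
rows = [
    ("http://www.valoreazioni.com/titoli/a2a-a2a-mi", "A2A.MI", "A2A SpA",
     "1.50", "1.503", "0.003", "+0.200%"),
    ("http://www.valoreazioni.com/titoli/anima-holdi...", "ANIM.MI",
     "ANIMA HOLDING SPA", "6.26", "6.210", "-0.040", "-0.64%"),
]


def stage_rows(rows):
    """Insert scraped rows into a temporary table and read a few columns back."""
    conn = sqlite3.connect(":memory:")  # assumption: nothing needs to persist
    # Column names follow the output above; the types are an assumption.
    # REAL affinity makes SQLite convert numeric-looking text to floats.
    conn.execute(
        "CREATE TABLE market (url TEXT, symbol TEXT, name TEXT, "
        "item_1 REAL, item_2 REAL, item_3 REAL, item_4 TEXT)"
    )
    conn.executemany("INSERT INTO market VALUES (?,?,?,?,?,?,?)", rows)
    result = conn.execute(
        "SELECT symbol, item_1, item_4 FROM market ORDER BY symbol"
    ).fetchall()
    conn.close()
    return result


staged = stage_rows(rows)
print(staged)
```

With typed columns, the numeric fields come back as floats while the percentage column stays text. Passing an open connection to `pd.read_sql_query`, as the answer does, would then give the same data as a DataFrame.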