How to build a DataFrame from two dicts in Python



I'm trying to build a DataFrame whose columns come from dicts. (I also tried doing this with pd.Series, but I kept running into problems there too.)

import requests
import pandas as pd
from bs4 import BeautifulSoup
# get link and parse
page = requests.get('https://www.finviz.com/screener.ashx?v=111&ft=4')
soup = BeautifulSoup(page.text, 'html.parser')
# return 'Title's for each filter
# to be used as columns in dataframe
titles = soup.find_all('span', attrs={'class': 'screener-combo-title'})
title_list = []
for t in titles:
    t = t.stripped_strings
    t = ' '.join(t)
    title_list.append(t)
title_list = {k: v for k, v in enumerate(title_list)}
# finding filters-cells tag id's
# to be used to build url
filters = soup.find_all('select', attrs={'data-filter': True})
filter_list = []
for f in filters:
    filter_list.append(f.get('data-filter'))
# finding selectable values per cell
# to be used as data in dataframe
final_list = []
for f in filters:
    options = f.find_all('option', attrs={'value': True})
    option_list = []    # list needs to stay inside the loop
    for option in options:
        if option['value'] != "":
            option_list.append(option['value'])
    final_list.append(option_list)
final_list = {k: v for k, v in enumerate(final_list)}

df = pd.DataFrame([final_list], columns=[title_list])
print(df)

This results in TypeError: unhashable type: 'dict'. The output I'm after looks like this (the first column is not the index):

Exchange    Index     ...
amex     s&p500     ...
nasd     djia
nyse
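A minimal sketch of why the error occurs and one way to get the layout above, assuming two small hypothetical stand-in dicts in place of the scraped `title_list` and `final_list` (both keyed by the integer position from `enumerate()`). Column labels must be hashable, and `columns=[title_list]` passes a dict as a label. Re-keying the data by title and wrapping each list in `pd.Series` lets columns of different lengths line up, since pandas aligns Series on their index and fills gaps with NaN.

```python
import pandas as pd

# Hypothetical stand-ins for the question's two dicts, both keyed by the
# integer position that enumerate() produced.
title_list = {0: 'Exchange', 1: 'Index'}
final_list = {0: ['amex', 'nasd', 'nyse'], 1: ['s&p500', 'djia']}

# Column labels have to be hashable; a dict is not, so columns=[title_list]
# cannot work.
try:
    hash(title_list)
except TypeError as exc:
    print('TypeError:', exc)  # TypeError: unhashable type: 'dict'

# Pair each title with the option list stored under the same position.
by_title = {title_list[k]: final_list[k] for k in title_list}

# Each pd.Series keeps its own 0..n-1 index; the DataFrame takes the union of
# those indexes, so shorter columns are padded with NaN instead of raising
# "All arrays must be of the same length".
df = pd.DataFrame({title: pd.Series(options) for title, options in by_title.items()})
print(df)
```

If blanks are preferred over NaN for display, `df.fillna('')` replaces them.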

Here is an attempt that builds a dict whose keys are the filter names and whose values are the lists of possible choices. Does it fit your needs?

import requests
import pandas as pd
from bs4 import BeautifulSoup
# get link and parse
page = requests.get('https://www.finviz.com/screener.ashx?v=111&ft=4')
soup = BeautifulSoup(page.text, 'html.parser')
all_dict = {}
# filters-cells alternate: title cell, options cell, title cell, ...
filters = soup.find_all('td', attrs={'class': 'filters-cells'})
for i in range(len(filters) // 2):
    i_title = 2 * i
    i_value = 2 * i + 1
    sct = filters[i_title].find_all('span', attrs={'class': 'screener-combo-title'})
    if len(sct) == 1:
        title = ' '.join(sct[0].stripped_strings)
        values = [v.text for v in filters[i_value].find_all('option', attrs={'value': True}) if v.text]
        all_dict[title] = values
# pad every list with '' to the same length so the DataFrame can be built
max_element = max([len(v) for v in all_dict.values()])
for k in all_dict:
    all_dict[k] = all_dict[k] + [''] * (max_element - len(all_dict[k]))
df = pd.DataFrame.from_dict(all_dict)
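The pairing and padding steps can be tried offline. This is a sketch under the assumption that the scraped cells have been reduced to a hypothetical flat list alternating title, options, title, options; the real code reads them from finviz with BeautifulSoup and also skips title cells that do not contain exactly one `screener-combo-title` span, which is omitted here. `zip(cells[::2], cells[1::2])` does the same job as the `2 * i` / `2 * i + 1` index arithmetic.

```python
import pandas as pd

# Hypothetical stand-in for the scraped filters-cells: titles at even
# positions, option lists at odd positions.
cells = [
    'Exchange', ['amex', 'nasd', 'nyse', ''],
    'Index', ['s&p500', 'djia'],
]

# Pair cell 0 with 1, 2 with 3, ...; zip stops at the shorter slice, so a
# trailing unpaired cell is dropped, just like range(len(filters) // 2).
all_dict = {}
for title, options in zip(cells[::2], cells[1::2]):
    all_dict[title] = [o for o in options if o]  # skip empty option texts

# Pad each list with '' up to the longest so from_dict sees equal lengths.
max_element = max(len(v) for v in all_dict.values())
for k in all_dict:
    all_dict[k] = all_dict[k] + [''] * (max_element - len(all_dict[k]))

df = pd.DataFrame.from_dict(all_dict)
print(df)
```

Padding with '' keeps every column as plain strings; padding with NaN (for example via `pd.Series`) would instead let `dropna()` strip the filler per column later.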
