将多个具有不同键的字典组合到一个数据帧



我正在尝试从具有不同键名的字典创建数据帧。

这是一个MWE:

# loads price data
from yahoofinancials import YahooFinancials  
yahoo_tickers  = ['SMT.L', 'MSFT', 'NIO']
yahoo_financials = YahooFinancials(yahoo_tickers)
data = yahoo_financials.get_historical_price_data(start_date='1970-01-30',
end_date='2022-12-30',
time_interval='daily')

这将打印一个包含三个键(这里是雅虎股票报价器的名称)的字典,其中包含有关每种股票的信息。这将作为字典中的另一个字典存储。

我试图通过循环来清理这个dict,但最终是一个相当缓慢的过程,特别是有更多的库存。

有人建议我可以快速转换datadictdataframe看起来像这样:

我的预期结果:

Out[11]: 
high     low    open  ...   adjclose  yahoo_ticker  instrumentType
Date                                ...                                         
1968-12-31   4.852   4.852   4.852  ...   1.762543         SMT.L          EQUITY
1969-01-01   4.852   4.852   4.852  ...   1.762543         SMT.L          EQUITY
1969-01-02   4.852   4.852   4.852  ...   1.762543         SMT.L          EQUITY
1969-01-03   4.852   4.852   4.852  ...   1.762543         SMT.L          EQUITY
1969-01-06   4.852   4.852   4.852  ...   1.762543         SMT.L          EQUITY
...     ...     ...  ...        ...           ...             ...
2022-12-23  11.220  10.690  11.220  ...  10.970000           NIO          EQUITY
2022-12-27  10.610   9.970  10.530  ...  10.060000           NIO          EQUITY
2022-12-28  10.250   9.610  10.010  ...   9.800000           NIO          EQUITY
2022-12-29  10.270   9.770   9.920  ...   9.990000           NIO          EQUITY
2022-12-30   9.980   9.520   9.830  ...   9.750000           NIO          EQUITY

我建议使用列表压缩来平整化嵌套字典,这将比使用pandas规范化方法更有效。

pd.DataFrame([
{
**p,
'yahooTicker': k, 
'instrumentType': v['instrumentType']
} for k, v in data.items() for p in v['prices']
]).drop(columns=['date']).set_index('formatted_date')

结果

high    low   open  close  volume  adjclose yahooTicker instrumentType
formatted_date                                                                         
1970-01-30      3.852  3.852  3.852  3.852       0  1.399282       SMT.L         EQUITY
1970-02-02      3.852  3.852  3.852  3.852       0  1.399282       SMT.L         EQUITY
1970-02-03      3.852  3.852  3.852  3.852       0  1.399282       SMT.L         EQUITY
1970-02-04      3.852  3.852  3.852  3.852       0  1.399282       SMT.L         EQUITY
1970-02-05      3.852  3.852  3.852  3.852       0  1.399282       SMT.L         EQUITY
...

使用json_normalize()。因为它们不只是一个主键,所以您需要使用列表比较和连接结果进行规范化。嵌套列"prices"将在record_path中使用,以平整化数据。使用方法链接来清理和格式化df.

import pandas as pd
from yahoofinancials import YahooFinancials

yahoo_tickers = ["SMT.L", "MSFT", "NIO"]
yahoo_financials = YahooFinancials(yahoo_tickers)
data = yahoo_financials.get_historical_price_data(
start_date="1970-01-30",
end_date="2022-12-30",
time_interval="daily"
)
df = (
pd
.concat([pd.json_normalize(data=data.get(x), record_path=["prices"], meta=["instrumentType"]) for x in data], keys=data.keys())
.droplevel(level=1)
.reset_index(names="yahoo_ticker")
.drop(columns="date")
.set_index("formatted_date")
.rename(columns={"formatted_date": "date"})
)
print(df)

输出:

yahoo_ticker    high     low  ...    volume   adjclose  instrumentType
formatted_date                               ...                                     
1970-01-30            SMT.L   3.852   3.852  ...         0   1.399282          EQUITY
1970-02-02            SMT.L   3.852   3.852  ...         0   1.399282          EQUITY
1970-02-03            SMT.L   3.852   3.852  ...         0   1.399282          EQUITY
1970-02-04            SMT.L   3.852   3.852  ...         0   1.399282          EQUITY
1970-02-05            SMT.L   3.852   3.852  ...         0   1.399282          EQUITY
...                     ...     ...     ...  ...       ...        ...             ...
2022-12-22              NIO  11.580  10.760  ...  32468900  11.290000          EQUITY
2022-12-23              NIO  11.220  10.690  ...  33610000  10.970000          EQUITY
2022-12-27              NIO  10.610   9.970  ...  54165700  10.060000          EQUITY
2022-12-28              NIO  10.250   9.610  ...  42225300   9.800000          EQUITY
2022-12-29              NIO  10.270   9.770  ...  49380200   9.990000          EQUITY

最新更新