尝试使用json_normalize从dataframe平坦嵌套字典时出错(错误:字符串索引必须是整数)



完全是新手,试图破解一个项目。

目标:为大学足球比赛创建一个由各种体育博彩给出的赔率数据表

问题:在调用API并转换为数据帧后,我发现概率列包含一个嵌套的字典列表。样本值:

[{'key': 'fanduel', 'title': 'FanDuel', 'last_update': '2022-12-09T06:41:42Z', 'markets': 
[{'key': 'spreads', 'outcomes': 
[{'name': 'Army Black Knights', 'price': -108, 'point': 2.5}, {'name': 'Navy Midshipmen', 'price': -112, 'point': -2.5}]}]}, 
{'key': 'williamhill_us', 'title': 'William Hill (US)', 'last_update': '2022-12-09T06:41:22Z', 'markets': 
[{'key': 'spreads', 'outcomes': 
[{'name': 'Army Black Knights', 'price': -110, 'point': 2.5}, {'name': 'Navy Midshipmen', 'price': -110, 'point': -2.5}]}]},

当我尝试使用json_normalize时,我得到了错误:&;字符串索引必须是整数&;

下面是我的代码:
import requests
import pandas as pd
odds = requests.get(" https://api.the-odds-api.com/v4/sports/americanfootball_ncaaf/odds/?apiKey=___(user key)__&regions=us&markets=spreads,spreads&oddsFormat=american")
odds_df = pd.read_json(odds.text)
odds_df.head()
#excluding non-needed columns in this example:
| id    | home_team | away_team | bookmakers
| 123   | Army      | Navy      | [{'key': 'fanduel', 'title': 'FanDuel', 'last_...
| 456   | UAB       | Miami (OH)| [{'key': 'barstool', 'title': 'Barstool Sports...

df = pd.json_normalize(odds_df,'bookmakers',['id', 'home_team', 'away_team'])

TypeError: string index必须是整数

Expected outcome:
| id    | home_team | away_team | sportsbook    | last_update         | odds_type | home_point
| 123   | Army      | Navy      | fanduel       | 2022-12-09T06:41:42Z| spread    | 2.5
| 123   | Army      | Navy      | williamhill_us| 2022-12-09T06:41:22Z| spread    | 2.5


你可以试试这个吗:

odds_df=odds_df.explode('bookmakers').reset_index(drop=True) #explode bookmarks col
#normalize bookmakers column and join odds_df. Because we need id, home_team eg. title is your sportsbook column here. if it si wrong change to key.
df = odds_df.join(pd.json_normalize(odds_df['bookmakers'],record_path=['markets'],meta=['title','last_update'],meta_prefix='_')).drop('bookmakers',axis=1).explode('outcomes').reset_index(drop=True)
df =df.join(pd.json_normalize(df.pop('outcomes'))).drop_duplicates()
'''
|    |   id | home_team   | away_team   | key     | _title            | _last_update         | name               |   price |   point |
|---:|-----:|:------------|:------------|:--------|:------------------|:---------------------|:-------------------|--------:|--------:|
|  0 |  123 | Army        | Navy        | spreads | FanDuel           | 2022-12-09T06:41:42Z | Army Black Knights |    -108 |     2.5 |
|  1 |  123 | Army        | Navy        | spreads | FanDuel           | 2022-12-09T06:41:42Z | Navy Midshipmen    |    -112 |    -2.5 |
|  2 |  123 | Army        | Navy        | spreads | William Hill (US) | 2022-12-09T06:41:22Z | Army Black Knights |    -110 |     2.5 |
|  3 |  123 | Army        | Navy        | spreads | William Hill (US) | 2022-12-09T06:41:22Z | Navy Midshipmen    |    -110 |    -2.5 |
'''

注意:你的输出不清楚。这就是为什么我添加了完整的输出