我有一个带有"Innings"列的df,其中包含以下数据:
Innings
[{'InningID': 297503, 'GameID': 47547, 'InningNumber': 1, 'AwayTeamRuns': 0, 'HomeTeamRuns': 3}, {'InningID': 297504, 'GameID': 47547, 'InningNumber': 2, 'AwayTeamRuns': 2, 'HomeTeamRuns': 2}, {'InningID': 297505, 'GameID': 47547, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 2}, {'InningID': 297506, 'GameID': 47547, 'InningNumber': 4, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297507, 'GameID': 47547, 'InningNumber': 5, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297508, 'GameID': 47547, 'InningNumber': 6, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297509, 'GameID': 47547, 'InningNumber': 7, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297510, 'GameID': 47547, 'InningNumber': 8, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297512, 'GameID': 47547, 'InningNumber': 9, 'AwayTeamRuns': 1, 'HomeTeamRuns': None}]
[{'InningID': 297511, 'GameID': 47546, 'InningNumber': 1, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297513, 'GameID': 47546, 'InningNumber': 2, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}, {'InningID': 297514, 'GameID': 47546, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297515, 'GameID': 47546, 'InningNumber': 4, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297516, 'GameID': 47546, 'InningNumber': 5, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}, {'InningID': 297517, 'GameID': 47546, 'InningNumber': 6, 'AwayTeamRuns': 1, 'HomeTeamRuns': 3}, {'InningID': 297518, 'GameID': 47546, 'InningNumber': 7, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}, {'InningID': 297519, 'GameID': 47546, 'InningNumber': 8, 'AwayTeamRuns': 0, 'HomeTeamRuns': 1}, {'InningID': 297520, 'GameID': 47546, 'InningNumber': 9, 'AwayTeamRuns': 1, 'HomeTeamRuns': 2}]
[{'InningID': 297521, 'GameID': 50022, 'InningNumber': 1, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297522, 'GameID': 50022, 'InningNumber': 2, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297523, 'GameID': 50022, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 1}, {'InningID': 297524, 'GameID': 50022, 'InningNumber': 4, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297525, 'GameID': 50022, 'InningNumber': 5, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297526, 'GameID': 50022, 'InningNumber': 6, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297527, 'GameID': 50022, 'InningNumber': 7, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297528, 'GameID': 50022, 'InningNumber': 8, 'AwayTeamRuns': 0, 'HomeTeamRuns': 2}, {'InningID': 297529, 'GameID': 50022, 'InningNumber': 9, 'AwayTeamRuns': 3, 'HomeTeamRuns': 1}]
[{'InningID': 297530, 'GameID': 47556, 'InningNumber': 1, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297532, 'GameID': 47556, 'InningNumber': 2, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297534, 'GameID': 47556, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297536, 'GameID': 47556, 'InningNumber': 4, 'AwayTeamRuns': 2, 'HomeTeamRuns': 0}, {'InningID': 297542, 'GameID': 47556, 'InningNumber': 5, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297545, 'GameID': 47556, 'InningNumber': 6, 'AwayTeamRuns': 0, 'HomeTeamRuns': 1}, {'InningID': 297549, 'GameID': 47556, 'InningNumber': 7, 'AwayTeamRuns': 0, 'HomeTeamRuns': 2}, {'InningID': 297554, 'GameID': 47556, 'InningNumber': 8, 'AwayTeamRuns': 0, 'HomeTeamRuns': 1}, {'InningID': 297558, 'GameID': 47556, 'InningNumber': 9, 'AwayTeamRuns': 0, 'HomeTeamRuns': None}]
[{'InningID': 297531, 'GameID': 47557, 'InningNumber': 1, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297533, 'GameID': 47557, 'InningNumber': 2, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297535, 'GameID': 47557, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297537, 'GameID': 47557, 'InningNumber': 4, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297543, 'GameID': 47557, 'InningNumber': 5, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297546, 'GameID': 47557, 'InningNumber': 6, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297551, 'GameID': 47557, 'InningNumber': 7, 'AwayTeamRuns': 0, 'HomeTeamRuns': 6}, {'InningID': 297560, 'GameID': 47557, 'InningNumber': 8, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297566, 'GameID': 47557, 'InningNumber': 9, 'AwayTeamRuns': 0, 'HomeTeamRuns': None}]
[{'InningID': 297539, 'GameID': 47549, 'InningNumber': 1, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297540, 'GameID': 47549, 'InningNumber': 2, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297544, 'GameID': 47549, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297548, 'GameID': 47549, 'InningNumber': 4, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297553, 'GameID': 47549, 'InningNumber': 5, 'AwayTeamRuns': 0, 'HomeTeamRuns': 5}, {'InningID': 297557, 'GameID': 47549, 'InningNumber': 6, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297561, 'GameID': 47549, 'InningNumber': 7, 'AwayTeamRuns': 3, 'HomeTeamRuns': 0}, {'InningID': 297573, 'GameID': 47549, 'InningNumber': 8, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297579, 'GameID': 47549, 'InningNumber': 9, 'AwayTeamRuns': 0, 'HomeTeamRuns': None}]
[{'InningID': 297538, 'GameID': 47558, 'InningNumber': 1, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297541, 'GameID': 47558, 'InningNumber': 2, 'AwayTeamRuns': 2, 'HomeTeamRuns': 0}, {'InningID': 297547, 'GameID': 47558, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297552, 'GameID': 47558, 'InningNumber': 4, 'AwayTeamRuns': 2, 'HomeTeamRuns': 0}, {'InningID': 297556, 'GameID': 47558, 'InningNumber': 5, 'AwayTeamRuns': 0, 'HomeTeamRuns': 5}, {'InningID': 297562, 'GameID': 47558, 'InningNumber': 6, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297568, 'GameID': 47558, 'InningNumber': 7, 'AwayTeamRuns': 2, 'HomeTeamRuns': 0}, {'InningID': 297577, 'GameID': 47558, 'InningNumber': 8, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297584, 'GameID': 47558, 'InningNumber': 9, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}]
[{'InningID': 297550, 'GameID': 47548, 'InningNumber': 1, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297555, 'GameID': 47548, 'InningNumber': 2, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297559, 'GameID': 47548, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 2}, {'InningID': 297567, 'GameID': 47548, 'InningNumber': 4, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297569, 'GameID': 47548, 'InningNumber': 5, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}, {'InningID': 297578, 'GameID': 47548, 'InningNumber': 6, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}, {'InningID': 297583, 'GameID': 47548, 'InningNumber': 7, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297587, 'GameID': 47548, 'InningNumber': 8, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297591, 'GameID': 47548, 'InningNumber': 9, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297594, 'GameID': 47548, 'InningNumber': 10, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297597, 'GameID': 47548, 'InningNumber': 11, 'AwayTeamRuns': 0, 'HomeTeamRuns': 1}]
[]
[{'InningID': 297564, 'GameID': 47553, 'InningNumber': 1, 'AwayTeamRuns': 2, 'HomeTeamRuns': 0}, {'InningID': 297572, 'GameID': 47553, 'InningNumber': 2, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}, {'InningID': 297574, 'GameID': 47553, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 1}, {'InningID': 297580, 'GameID': 47553, 'InningNumber': 4, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297586, 'GameID': 47553, 'InningNumber': 5, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297590, 'GameID': 47553, 'InningNumber': 6, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}, {'InningID': 297596, 'GameID': 47553, 'InningNumber': 7, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297599, 'GameID': 47553, 'InningNumber': 8, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297603, 'GameID': 47553, 'InningNumber': 9, 'AwayTeamRuns': 0, 'HomeTeamRuns': 2}]
[{'InningID': 297563, 'GameID': 47555, 'InningNumber': 1, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}, {'InningID': 297570, 'GameID': 47555, 'InningNumber': 2, 'AwayTeamRuns': 0, 'HomeTeamRuns': 1}, {'InningID': 297575, 'GameID': 47555, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 5}, {'InningID': 297582, 'GameID': 47555, 'InningNumber': 4, 'AwayTeamRuns': 0, 'HomeTeamRuns': 3}, {'InningID': 297588, 'GameID': 47555, 'InningNumber': 5, 'AwayTeamRuns': 0, 'HomeTeamRuns': 3}, {'InningID': 297593, 'GameID': 47555, 'InningNumber': 6, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297595, 'GameID': 47555, 'InningNumber': 7, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}, {'InningID': 297598, 'GameID': 47555, 'InningNumber': 8, 'AwayTeamRuns': 1, 'HomeTeamRuns': 2}, {'InningID': 297601, 'GameID': 47555, 'InningNumber': 9, 'AwayTeamRuns': 0, 'HomeTeamRuns': None}]
我需要将局数列分成许多(数字可以一行不同...... 我已经尝试了爆炸函数,但不起作用,或者因为字段的随机数而拆分。
最好的方法是创建新的列,例如:
InningID1 : xxx, InningNumber1: x, AwayTeamsRuns1: x, HomeTeamRuns1: x, InningID2 : xxx, InningNumber2: x, AwayTeamsRuns2: x, HomeTeamRuns2: x etc...
我不关心 GameId,因为我已经在 df 的同一行中有这些信息。
有什么建议吗?
非常感谢
杰弗里
你可以试试这个,先清理数据,将'替换为",将None转换为"None",然后遍历DataFrame,将json字符串转换为字典,使用json_normalize
将字典转换为DataFrame,最后concat
它们:
df['Innings'] = df['Innings'].str.replace("'", '"')
df['Innings'] = df['Innings'].str.replace("None", '"None"')
dfs = []
for i in range(0, df.shape[0]):
j_str = df.loc[i]['Innings']
x = json.loads(j_str)
dfs.append(pd.json_normalize(x))
df = pd.concat(dfs).reset_index(drop=['index'])
print(df)
输出:
InningID GameID InningNumber AwayTeamRuns HomeTeamRuns
0 297503 47547 1 0 3
1 297504 47547 2 2 2
2 297505 47547 3 0 2
3 297506 47547 4 0 0
4 297507 47547 5 0 0
.. ... ... ... ... ...
87 297588 47555 5 0 3
88 297593 47555 6 0 0
89 297595 47555 7 1 0
90 297598 47555 8 1 2
91 297601 47555 9 0 None
这看起来是一个很好的机会来使用DictVectorizer
import pandas as pd
from sklearn.feature_extraction import DictVectorizer
vectorizer = DictVectorizer()
dicts = df.explode("Innings")["Innings"]
data = vectorizer.fit_transform(dicts).todense()
innings_df = pd.DataFrame(data, columns=vectorizer.feature_names_)
缺少的键将自动填充零,请确保这是您想要的行为。
现在,如果您愿意,可以将innings_df
与原始数据帧(df
(连接/联接