Compare values in two columns and extract the values of a third column in a dataframe
df =
Location  teams  goals
1         A      5
1         B      6
2         A      7
2         B      5
2         C      6
3         B      7
Example code
import pandas as pd

data = {'Location': {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5: 3},
        'teams': {0: 'A', 1: 'B', 2: 'A', 3: 'B', 4: 'C', 5: 'B'},
        'goals': {0: 5, 1: 6, 2: 7, 3: 5, 4: 6, 5: 7}}
df = pd.DataFrame(data)
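First, aggregate with groupby
# 'sum' is passed as a string; recent pandas versions warn when given the builtin sum
(df.groupby(['Location', 'teams'])['goals'].agg(['count', 'sum'])
 .unstack().swaplevel(0, 1, axis=1).sort_index(axis=1))
Output: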
teams        A          B          C
         count  sum count  sum count  sum
Location
1          1.0  5.0   1.0  6.0   NaN  NaN
2          1.0  7.0   1.0  5.0   1.0  6.0
3          NaN  NaN   1.0  7.0   NaN  NaN
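The chained expression above does three things at once, so it may help to see it split into steps. The following is a minimal sketch on the same data; the intermediate names `agg` and `wide` are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Location': [1, 1, 2, 2, 2, 3],
                   'teams': ['A', 'B', 'A', 'B', 'C', 'B'],
                   'goals': [5, 6, 7, 5, 6, 7]})

# Step 1: one row per (Location, teams) pair with its row count and goal total.
agg = df.groupby(['Location', 'teams'])['goals'].agg(['count', 'sum'])

# Step 2: move 'teams' from the row index into the columns.
# Missing (Location, teams) pairs become NaN, which is why the values turn into floats.
wide = agg.unstack()

# Step 3: put the team level on top, then sort so each team's
# 'count' and 'sum' columns sit next to each other.
wide = wide.swaplevel(0, 1, axis=1).sort_index(axis=1)

print(wide)
```

After step 3 the columns are a two-level MultiIndex of (team, aggregate) pairs, which is what the next step flattens into single labels.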
Second, create idx to rename the columns
idx = pd.MultiIndex.from_product([df['teams'].unique(), ['Team', 'Team Goal']]).map(lambda x: ' '.join(x))
idx
Index(['A Team', 'A Team Goal', 'B Team', 'B Team Goal', 'C Team', 'C Team Goal'], dtype='object')
Then change the columns and call reset_index (this includes the first code)
(df.groupby(['Location', 'teams'])['goals'].agg(['count', 'sum'])
 .unstack().swaplevel(0, 1, axis=1).sort_index(axis=1)
 .set_axis(idx, axis=1).reset_index())
   Location  A Team  A Team Goal  B Team  B Team Goal  C Team  C Team Goal
0         1     1.0          5.0     1.0          6.0     NaN          NaN
1         2     1.0          7.0     1.0          5.0     1.0          6.0
2         3     NaN          NaN     1.0          7.0     NaN          NaN
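One caveat: idx is built from df['teams'].unique(), which returns teams in order of first appearance, while sort_index orders the columns alphabetically. The two agree for this data, but they could diverge if, say, team C appeared before team A. A possible variant, sketched below, derives each label from the column's own (team, aggregate) pair instead. The `suffix` mapping and the names `wide`, `flat` and `result` are hypothetical helpers, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Location': [1, 1, 2, 2, 2, 3],
                   'teams': ['A', 'B', 'A', 'B', 'C', 'B'],
                   'goals': [5, 6, 7, 5, 6, 7]})

wide = (df.groupby(['Location', 'teams'])['goals'].agg(['count', 'sum'])
          .unstack().swaplevel(0, 1, axis=1).sort_index(axis=1))

# Hypothetical mapping from aggregate name to the label suffix we want.
suffix = {'count': 'Team', 'sum': 'Team Goal'}

# Build each flat label from the column's own (team, aggregate) tuple,
# so a label cannot end up attached to the wrong team's data.
flat = [f'{team} {suffix[stat]}' for team, stat in wide.columns]

result = wide.set_axis(flat, axis=1).reset_index()
print(result.columns.tolist())
```

For the sample data this yields the same labels as idx; the difference only matters when appearance order and sorted order disagree.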
Use pivot to reshape the dataframe and get the goals. Check for non-null values in goals to identify the teams, then join the two to get the result
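# keyword arguments are used because pandas 2.x no longer accepts positional arguments to pivot
goals = df.pivot(index='Location', columns='teams', values='goals')
# 1 where the team has a goals entry for that Location, 0 otherwise
teams = goals.notna().astype(int)
teams.add_suffix(' Team').join(goals.add_suffix(' Team Goals'))
Result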
teams     A Team  B Team  C Team  A Team Goals  B Team Goals  C Team Goals
Location
1              1       1       0           5.0           6.0           NaN
2              1       1       1           7.0           5.0           6.0
3              0       1       0           NaN           7.0           NaN
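Note that pivot requires each (Location, teams) pair to be unique and raises a ValueError otherwise. If the real data can contain repeated pairs, one option is pivot_table with an aggregation function. The sketch below assumes goals should be summed in that case, and the extra row is hypothetical, added only to show the behaviour.

```python
import pandas as pd

# Same data as above plus one hypothetical extra row:
# team A has a second entry at Location 1.
df = pd.DataFrame({'Location': [1, 1, 2, 2, 2, 3, 1],
                   'teams': ['A', 'B', 'A', 'B', 'C', 'B', 'A'],
                   'goals': [5, 6, 7, 5, 6, 7, 2]})

# pivot_table aggregates duplicate (Location, teams) pairs instead of raising.
goals = df.pivot_table(index='Location', columns='teams',
                       values='goals', aggfunc='sum')

# 1 where the team has at least one entry for the Location, 0 otherwise.
teams = goals.notna().astype(int)

result = teams.add_suffix(' Team').join(goals.add_suffix(' Team Goals'))
print(result)
```

With this variant the Team columns still mark presence rather than the number of rows; use the groupby count from the first approach if the row count is what you need.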