I have a dataset like the one below. How can I get the total number of matches India has played against each opponent?
        date      Team_1     Team_2     Winner     Margin    Ground
0  04-Jan-10  Bangladesh  Sri Lanka  Sri Lanka  7 wickets     Dhaka
1  05-Jan-10       India  Sri Lanka  Sri Lanka  5 wickets     Dhaka
2  07-Jan-10  Bangladesh      India      India  6 wickets     Dhaka
3  08-Jan-10  Bangladesh  Sri Lanka  Sri Lanka  9 wickets     Dhaka
4  10-Jan-10       India  Sri Lanka      India  8 wickets     Dhaka
5  11-Jan-10  Bangladesh      India      India  6 wickets     Dhaka
6  13-Jan-10       India  Sri Lanka  Sri Lanka  4 wickets     Dhaka
7  22-Jan-10   Australia   Pakistan  Australia  5 wickets  Brisbane
Can anyone help?
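To run the answers below, the sample can be rebuilt as a DataFrame. This is a minimal sketch that assumes the frame is named df and that every column is kept as a plain string (the date is not parsed):

```python
import pandas as pd

# The eight sample rows from the question; all values stored as strings
df = pd.DataFrame({
    "date": ["04-Jan-10", "05-Jan-10", "07-Jan-10", "08-Jan-10",
             "10-Jan-10", "11-Jan-10", "13-Jan-10", "22-Jan-10"],
    "Team_1": ["Bangladesh", "India", "Bangladesh", "Bangladesh",
               "India", "Bangladesh", "India", "Australia"],
    "Team_2": ["Sri Lanka", "Sri Lanka", "India", "Sri Lanka",
               "Sri Lanka", "India", "Sri Lanka", "Pakistan"],
    "Winner": ["Sri Lanka", "Sri Lanka", "India", "Sri Lanka",
               "India", "India", "Sri Lanka", "Australia"],
    "Margin": ["7 wickets", "5 wickets", "6 wickets", "9 wickets",
               "8 wickets", "6 wickets", "4 wickets", "5 wickets"],
    # First seven matches were in Dhaka, the last one in Brisbane
    "Ground": ["Dhaka"] * 7 + ["Brisbane"],
})

print(df)
```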
You can try stack and value_counts, then filter India out of the result with .loc[]:
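# keep only the two team columns
m = df[['Team_1', 'Team_2']]
# rows India played in -> one long Series of team names -> count per team, then drop India itself
d = m[m.eq('India').any(axis=1)].stack().value_counts().loc[lambda x: x.index != 'India'].to_dict()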
Or as a single expression:
# same steps chained; any(axis=1) must be passed by keyword in recent pandas
d1 = (df.loc[df.eq('India').any(axis=1), ['Team_1', 'Team_2']]
      .stack().value_counts().loc[lambda x: x.index != 'India'].to_dict())
print(d)
#output -> {'Sri Lanka': 3, 'Bangladesh': 2}
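An alternative that avoids stack (a different technique from the answer above): keep the rows India played in, pick the opponent column row by row with numpy.where, then count. A minimal sketch, assuming the same Team_1/Team_2 layout as the question; only those two columns are built here for brevity:

```python
import numpy as np
import pandas as pd

# Only the team columns from the question's sample are needed for this sketch
df = pd.DataFrame({
    "Team_1": ["Bangladesh", "India", "Bangladesh", "Bangladesh",
               "India", "Bangladesh", "India", "Australia"],
    "Team_2": ["Sri Lanka", "Sri Lanka", "India", "Sri Lanka",
               "Sri Lanka", "India", "Sri Lanka", "Pakistan"],
})

team = "India"

# Keep only the matches in which the team appeared on either side
played = df[(df["Team_1"] == team) | (df["Team_2"] == team)]

# The opponent is whichever column does not hold the team
opponent = np.where(played["Team_1"] == team, played["Team_2"], played["Team_1"])

# Count matches per opponent, most frequent first
counts = pd.Series(opponent).value_counts().to_dict()
print(counts)  # {'Sri Lanka': 3, 'Bangladesh': 2}
```

Unlike the stack version, this never puts India into the counted values, so no post-filter is needed.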
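The structure you want needs duplicated entries (permutations rather than combinations), so first double the DataFrame by appending a copy with the two team columns swapped. Then group by the two team columns and count each pair: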
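import pandas as pd

# scores is the match DataFrame (df in the question); the outer parentheses let the chain span lines
(pd.concat([scores, scores.rename(columns={"Team_1": "Team_2", "Team_2": "Team_1"})])
   [['Team_1', 'Team_2', 'date']]
   .groupby(["Team_1", "Team_2"])
   .count())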
                       date
Team_1     Team_2
Australia  Pakistan       1
Bangladesh India          2
           Sri Lanka      2
India      Bangladesh     2
           Sri Lanka      3
Pakistan   Australia      1
Sri Lanka  Bangladesh     2
           India          3
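To get just India's opponents out of the doubled table, select the India block of the MultiIndex. A sketch assuming the frame is named scores as in the answer; it uses groupby(...).size() instead of .count(), which counts rows per pair without needing the date column:

```python
import pandas as pd

# Team columns from the question's sample (other columns are not needed here)
scores = pd.DataFrame({
    "Team_1": ["Bangladesh", "India", "Bangladesh", "Bangladesh",
               "India", "Bangladesh", "India", "Australia"],
    "Team_2": ["Sri Lanka", "Sri Lanka", "India", "Sri Lanka",
               "Sri Lanka", "India", "Sri Lanka", "Pakistan"],
})

# Mirror every match so each team shows up in Team_1 once per game it played
both_ways = pd.concat([
    scores,
    scores.rename(columns={"Team_1": "Team_2", "Team_2": "Team_1"}),
])

# Number of rows for each (Team_1, Team_2) pair
pair_counts = both_ways.groupby(["Team_1", "Team_2"]).size()

# The first index level is the team; .loc returns one entry per opponent
india = pair_counts.loc["India"]
print(india.to_dict())  # {'Bangladesh': 2, 'Sri Lanka': 3}
```

If you want the full head-to-head matrix instead, pd.crosstab(both_ways["Team_1"], both_ways["Team_2"]) gives the same counts as a square table, with 0 for pairs that never met.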