     origin_destination_country  average_delay_mins
0                       ALBANIA                0.00
1                       ALBANIA               13.68
2                       ALBANIA                0.00
3                       ALBANIA                0.00
4                       ALBANIA               79.50
...                         ...                 ...
6273                        USA                0.00
6274                 UZBEKISTAN               27.32
6275                     ZAMBIA               16.08
6276                   ZIMBABWE             1165.00
6277                   ZIMBABWE              102.97
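How can I calculate the mean of average_delay_mins for each country? My idea is to average the values that share the same origin_destination_country name and store the results in another list that has no duplicate country names.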
Try this code and let me know if it works.
import pandas as pd

df = pd.DataFrame({
    'origin_destination_country': ['ALBANIA', 'ALBANIA', 'ALBANIA', 'USA', 'ZIMBABWE', 'ZIMBABWE'],
    'average_delay_mins': [0.00, 13.68, 0.00, 0.00, 1165.00, 102.97]
})

# get the unique country names
list_of_countries = df['origin_destination_country'].unique()

res = []
for country in list_of_countries:
    # collect the delay values of all rows for this country
    delays = df[df['origin_destination_country'] == country]['average_delay_mins'].tolist()
    # mean delay for this country
    res.append(sum(delays) / len(delays))

print(res)
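Note that res holds only the means, in the order the countries first appear, so the country names are lost. A minimal sketch of one way to keep them together, assuming the same small sample data as above (means_by_country is just an illustrative name, not something from the question):

```python
import pandas as pd

# Same small sample as above (an assumption, not the full data set).
df = pd.DataFrame({
    'origin_destination_country': ['ALBANIA', 'ALBANIA', 'ALBANIA', 'USA', 'ZIMBABWE', 'ZIMBABWE'],
    'average_delay_mins': [0.00, 13.68, 0.00, 0.00, 1165.00, 102.97],
})

# Map each unique country name to the mean of its delay values.
means_by_country = {}
for country in df['origin_destination_country'].unique():
    delays = df.loc[df['origin_destination_country'] == country, 'average_delay_mins']
    means_by_country[country] = delays.mean()

print(means_by_country)
```

Each key appears once, so the dictionary doubles as the duplicate-free list of countries.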
Thanks to Naufal_Hilmiaji and Code_Difference, I managed to find a solution. It turned out like this:
import pandas as pd
df = pd.DataFrame({
'origin_destination_country': ['ALBANIA', 'ALBANIA','ALBANIA', 'USA', ...,'ZIMBABWE', 'ZIMBABWE'],
'average_delay_mins': [0.00, 13.68,0.00,0.00,...,1165.00,102.97]
})
data = df.groupby('origin_destination_country').mean()
print(data)
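If the real DataFrame has other columns besides these two, it may help to select the delay column explicitly before averaging, and to call reset_index() so the country names come back as an ordinary column. This is only a sketch on the small sample from the earlier answer, not the full data set; mean_delay is an illustrative name:

```python
import pandas as pd

# Small sample data (an assumption; the full data set is not shown in the thread).
df = pd.DataFrame({
    'origin_destination_country': ['ALBANIA', 'ALBANIA', 'ALBANIA', 'USA', 'ZIMBABWE', 'ZIMBABWE'],
    'average_delay_mins': [0.00, 13.68, 0.00, 0.00, 1165.00, 102.97],
})

# Group by country, average only the delay column,
# then turn the group labels back into a regular column.
mean_delay = (
    df.groupby('origin_destination_country')['average_delay_mins']
      .mean()
      .reset_index()
)

print(mean_delay)
```

groupby sorts the group keys by default, so the result has one row per country in alphabetical order.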