How to apply unique and mean functions after grouping a DataFrame



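I am working with GPS trajectory data.

I am trying to find the mean speed of vehicles that belong to three different classes. I need the mean for each individual vehicle, not just for each class.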

"Vehicle ID","Frame ID","Total Frames","Global Time","Local X","Local Y","Global X","Global Y","V_Len","V_Width","V_Class","V_Vel","V_Acc","Lane_ID","Pre_Veh","Fol_Veh","Spacing","Headway"
3033,9064,633,1118847885300,42.016,377.256,6451360.093,1873080.530,19.5,8.5,2,27.90,4.29,4,3022,0,93.16,3.34
3033,9065,633,1118847885400,42.060,380.052,6451362.114,1873078.608,19.5,8.5,2,28.43,6.63,4,3022,0,93.87,3.30
3033,9066,633,1118847885500,42.122,382.924,6451364.187,1873076.613,19.5,8.5,2,29.07,6.89,4,3022,0,94.49,3.25
3033,9067,633,1118847885600,42.200,385.882,6451366.307,1873074.553,19.5,8.5,2,29.62,4.41,4,3022,0,95.04,3.21
3033,9068,633,1118847885700,42.265,388.885,6451368.490,1873072.453,19.5,8.5,2,29.93,1.57,4,3022,0,95.57,3.19

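import pandas as pd

# df is assumed to hold the CSV data shown above
# Sort by timestamp (sort_values returns a new DataFrame, so assign it back)
df = df.sort_values(by=["Global Time"])
# Convert the GPS millisecond timestamp to US Pacific local time
df["US Time"] = pd.to_datetime(df["Global Time"], unit="ms").dt.tz_localize("UTC").dt.tz_convert("America/Los_Angeles")
grouped = df.groupby("V_Class")
# Mean and standard deviation of speed over all vehicles in each class
print(grouped["V_Vel"].agg(["mean", "std"]))
# Print the class of each row's vehicle
for index, row in df.iterrows():
    print(row["Vehicle ID"], row["V_Class"])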

Actual output

V_Class     mean        std       
1        40.487673  14.647576
2        37.376317  14.940034
3        40.953483  11.214995

Expected output

Vehicle ID V_Class     mean        std  
3033           2           32.4       12.4
125            1           41.3       9.2
...
and likewise for the remaining vehicles

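If you want the mean for each vehicle, just group by vehicle. Including V_Class in the grouping keys keeps the class in the result: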

 df.groupby(['Vehicle ID', 'V_Class'])['V_Vel'].agg(['mean', 'std'])

With your sample data, it should give:

                     mean       std
Vehicle ID V_Class                 
3033       2        28.99  0.834955
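
To get the flat layout shown in the expected output, the grouped result can probably be passed through reset_index(). The following is a minimal sketch, assuming only the three relevant columns from the sample rows are loaded; the names csv_text and per_vehicle are illustrative and do not appear in the question.

```python
import io
import pandas as pd

# Sample rows from the question, reduced to the three columns used here
# (an assumption made to keep the example short).
csv_text = """Vehicle ID,V_Class,V_Vel
3033,2,27.90
3033,2,28.43
3033,2,29.07
3033,2,29.62
3033,2,29.93
"""
df = pd.read_csv(io.StringIO(csv_text))

# Group by vehicle and class, compute mean and sample standard deviation
# of the speed, then turn the group keys back into ordinary columns.
per_vehicle = (
    df.groupby(["Vehicle ID", "V_Class"])["V_Vel"]
      .agg(["mean", "std"])
      .reset_index()
)
print(per_vehicle)
```

Passing the aggregations as strings ('mean', 'std') rather than NumPy functions avoids the deprecation warning that recent pandas versions emit for np.mean and np.std inside agg.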
