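I have a DataFrame like the one shown below. I understand that df.groupby("degree").mean() gives me the mean of each column, grouped by degree. I want to take these means and find the distance between each data point and each of them. In this case, for every data point I want 3 distances, one from each of the means (4, 40), (2, 80), and (4, 94) in the output of df.groupby("degree").mean(), stored in 3 new columns. The distances should be computed with these formulas:
BCA_mean = (name-4)^3 + (score-40)^3
M.Tech_mean = (name-2)^3 + (score-80)^3
MBA_mean = (name-4)^3 + (score-94)^3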
import pandas as pd
# dictionary of lists (named `data` so the built-in `dict` is not shadowed)
data = {'name': [5, 4, 2, 3],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
# creating a dataframe from a dictionary
df = pd.DataFrame(data)
print(df)
   name  degree  score
0     5     MBA     90
1     4     BCA     40
2     2  M.Tech     80
3     3     MBA     98
df.groupby("degree").mean()
degree  name  score
BCA        4     40
M.Tech     2     80
MBA        4     94
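One possible way to build the three columns from the formulas above is to loop over the rows of the group means and subtract each one from the value columns. This is only a minimal sketch: the names `means`, `value_cols`, and `diff` are placeholders introduced here, and the new columns are named after the `BCA_mean` / `M.Tech_mean` / `MBA_mean` pattern used in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "name": [5, 4, 2, 3],
    "degree": ["MBA", "BCA", "M.Tech", "MBA"],
    "score": [90, 40, 80, 98],
})

# One row of means per degree; the remaining columns are the value columns.
means = df.groupby("degree").mean()
value_cols = list(means.columns)  # ["name", "score"]

for degree, mean_row in means.iterrows():
    # DataFrame - Series aligns the Series index with the column names.
    diff = df[value_cols] - mean_row
    # Cube every difference and sum across columns, as in the formulas above.
    df[f"{degree}_mean"] = (diff ** 3).sum(axis=1)

print(df)
```

Because the differences are cubed, the results can be negative, so these values are not distances in the usual sense; they simply follow the formulas as written.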
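Update 1
My real dataset has more than 100 columns, so I would prefer something that scales to that. The logic stays the same: for each group mean, subtract the mean from each column, cube each cell, and sum the results across columns.
I found something like the following, but I am not sure whether there is a more efficient way.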
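import numpy as np

y = df.groupby("degree").mean()
print(y)
# Subtract the first group's means (BCA) from every row, then sum across columns.
# Note: np.square squares the differences; the formulas above use cubes (e.g. np.power(..., 3)).
(np.square(df[['name', 'score']].subtract(y.iloc[0, :], axis=1))).sum(axis=1)
df["mean0"] = (np.square(df[['name', 'score']].subtract(y.iloc[0, :], axis=1))).sum(axis=1)
df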
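For a wide dataset, one alternative worth trying is to compute all groups at once with NumPy broadcasting instead of handling one group mean at a time. The sketch below assumes every column other than `degree` is numeric; `value_cols`, `X`, `M`, `dist_df`, and `result` are names made up for this illustration, and it uses cubes as stated in the formulas rather than `np.square`.

```python
import pandas as pd

df = pd.DataFrame({
    "name": [5, 4, 2, 3],
    "degree": ["MBA", "BCA", "M.Tech", "MBA"],
    "score": [90, 40, 80, 98],
})

# Assumption: every column except the group key is numeric.
value_cols = [c for c in df.columns if c != "degree"]
means = df.groupby("degree")[value_cols].mean()  # (n_groups, n_cols)

X = df[value_cols].to_numpy(dtype=float)  # (n_rows, n_cols)
M = means.to_numpy(dtype=float)           # (n_groups, n_cols)

# Broadcast (n_rows, 1, n_cols) against (1, n_groups, n_cols),
# cube each difference, then sum over the column axis.
dist = ((X[:, None, :] - M[None, :, :]) ** 3).sum(axis=2)  # (n_rows, n_groups)

dist_df = pd.DataFrame(
    dist,
    index=df.index,
    columns=[f"{g}_mean" for g in means.index],
)
result = df.join(dist_df)
print(result)
```

The intermediate array has shape rows x groups x columns, so with many rows and 100+ columns it may be large; looping over the groups trades some speed for lower memory use.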
import pandas as pd
# dictionary of lists (named `data` so the built-in `dict` is not shadowed)
data = {'degree': ["MBA", "BCA", "M.Tech", "MBA", "BCA"],
        'name': [5, 4, 2, 3, 2],
        'score': [90, 40, 80, 98, 60],
        'game': [100, 200, 300, 100, 400],
        'money': [100, 200, 300, 100, 400],
        'loan': [100, 200, 300, 100, 400],
        'rent': [100, 200, 300, 100, 400],
        'location': [100, 200, 300, 100, 400]}
# creating a dataframe from a dictionary
df = pd.DataFrame(data)
print(df)
# mean of every numeric column, per degree
dfx = df.groupby("degree").mean()
print(dfx)

def fun(x):
    # x is one row of df; subtract the mean row of the matching degree
    # from every column after 'degree'
    if x['degree'] == 'BCA':
        return x.iloc[1:] - dfx.iloc[0, :].tolist()
    if x['degree'] == 'M.Tech':
        return x.iloc[1:] - dfx.iloc[1, :].tolist()
    if x['degree'] == 'MBA':
        return x.iloc[1:] - dfx.iloc[2, :].tolist()

df_added = df.apply(fun, axis=1)
df_added
Result
   degree  name  score  game  money  loan  rent  location
0     MBA     5     90   100    100   100   100       100
1     BCA     4     40   200    200   200   200       200
2  M.Tech     2     80   300    300   300   300       300
3     MBA     3     98   100    100   100   100       100
4     BCA     2     60   400    400   400   400       400
Means (dfx):
        name  score  game  money  loan  rent  location
degree
BCA        3     50   300    300   300   300       300
M.Tech     2     80   300    300   300   300       300
MBA        4     94   100    100   100   100       100
df_added (difference of each element from the mean of its column within its degree group):
   name  score  game  money  loan  rent  location
0     1     -4     0      0     0     0         0
1     1    -10  -100   -100  -100  -100      -100
2     0      0     0      0     0     0         0
3    -1      4     0      0     0     0         0
4    -1     10   100    100   100   100       100
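The same per-row differences can probably be obtained without a row-wise `apply` by using `groupby(...).transform("mean")`, which returns each row's group mean in the original row order. This is a sketch on a cut-down version of the data above (only three value columns, to keep it short); `value_cols` and `group_means` are names introduced for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "degree": ["MBA", "BCA", "M.Tech", "MBA", "BCA"],
    "name": [5, 4, 2, 3, 2],
    "score": [90, 40, 80, 98, 60],
    "game": [100, 200, 300, 100, 400],
})

# Assumption: every column except the group key is numeric.
value_cols = [c for c in df.columns if c != "degree"]

# transform("mean") yields a frame shaped like df[value_cols], where each
# row holds the means of that row's own degree group.
group_means = df.groupby("degree")[value_cols].transform("mean")

# Element-wise difference of each value from its group mean.
df_added = df[value_cols] - group_means
print(df_added)
```

This avoids hard-coding the degree names and the row positions in `dfx`, so it should keep working if more degrees or columns are added.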