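I have a DataFrame like the one shown below. I understand that df.groupby("degree").mean() gives me the mean of each column per degree. I want to take those means and find the distance between every data point and each of them. In this case, for each data point I want three distances, one from each of the means (4, 40), (2, 80) and (4, 94) returned by df.groupby("degree").mean(), stored in three new columns. The distances should be calculated with the formulas BCA_mean = (name-4)^3 + (score-40)^3, M.Tech_mean = (name-2)^3 + (score-80)^3, MBA_mean = (name-4)^3 + (score-94)^3.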

import pandas as pd

# dictionary of lists (named `data` so the built-in dict is not shadowed)
data = {'name': [5, 4, 2, 3],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# create a DataFrame from the dictionary
df = pd.DataFrame(data)
print(df)
   name  degree  score
0     5     MBA     90
1     4     BCA     40
2     2  M.Tech     80
3     3     MBA     98

df.groupby("degree").mean()    
        name  score
degree
BCA        4     40
M.Tech     2     80
MBA        4     94

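Update 1

My real dataset has more than 100 columns, so I would prefer something that scales to that. The logic stays the same: for each group mean, subtract the mean from each column, cube every cell, and sum across the row.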
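The logic described above could be sketched roughly as follows. This is only one possible approach, not a confirmed answer: it loops over the rows of the group means and builds one column per group. The names `means`, `value_cols` and the `<degree>_mean` column pattern are assumptions chosen to match the formulas in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "name": [5, 4, 2, 3],
    "degree": ["MBA", "BCA", "M.Tech", "MBA"],
    "score": [90, 40, 80, 98],
})

# Per-group means of every numeric column.
means = df.groupby("degree").mean(numeric_only=True)
value_cols = list(means.columns)  # fixed before any new columns are added

# One new column per group: sum over columns of (value - group mean) ** 3.
for degree, mean_row in means.iterrows():
    df[f"{degree}_mean"] = ((df[value_cols] - mean_row) ** 3).sum(axis=1)

print(df)
```

Because `value_cols` is taken from the grouped result, the same loop should work unchanged with 100 or more numeric columns. Note that a sum of cubes can be negative, so these values are not distances in the usual sense; they simply follow the formula as stated.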

I found something like the code below, but I am not sure whether there is a more efficient way.

import numpy as np

y = df.groupby("degree").mean()
print(y)

# distance of every row from the first group mean (BCA)
# note: np.square squares the differences, whereas the formula above cubes them
df["mean0"] = np.square(df[['name', 'score']].subtract(y.iloc[0, :], axis=1)).sum(axis=1)
df
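The snippet above handles only the first group mean and uses np.square, while the formula in the question uses cubes. As a hedged alternative, NumPy broadcasting can compute the value for every row against every group mean in one step. The names `diff`, `dist`, `dist_df` and `result` are illustrative, not from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": [5, 4, 2, 3],
    "degree": ["MBA", "BCA", "M.Tech", "MBA"],
    "score": [90, 40, 80, 98],
})

y = df.groupby("degree").mean(numeric_only=True)
cols = list(y.columns)

# (n_rows, 1, n_cols) minus (1, n_groups, n_cols)
# broadcasts to (n_rows, n_groups, n_cols).
diff = df[cols].to_numpy()[:, None, :] - y.to_numpy()[None, :, :]

# Cube each difference and sum over the column axis -> (n_rows, n_groups).
dist = (diff ** 3).sum(axis=2)

dist_df = pd.DataFrame(dist, index=df.index,
                       columns=[f"{g}_mean" for g in y.index])
result = df.join(dist_df)
print(result)
```

The trade-off is memory: the intermediate array holds rows × groups × columns values, so for a very large frame the per-group loop may be the safer choice.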

import pandas as pd

# dictionary of lists (named `data` so the built-in dict is not shadowed)
data = {'degree': ["MBA", "BCA", "M.Tech", "MBA", "BCA"],
        'name': [5, 4, 2, 3, 2],
        'score': [90, 40, 80, 98, 60],
        'game': [100, 200, 300, 100, 400],
        'money': [100, 200, 300, 100, 400],
        'loan': [100, 200, 300, 100, 400],
        'rent': [100, 200, 300, 100, 400],
        'location': [100, 200, 300, 100, 400]}

# create a DataFrame from the dictionary
df = pd.DataFrame(data)
print(df)

# per-degree mean of every numeric column
dfx = df.groupby("degree").mean()
print(dfx)

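def fun(x):
    # x is one row of df; position 0 holds the degree, the rest are numeric.
    # Subtract the mean row of the matching degree from the numeric values.
    if x.iloc[0] == 'BCA':
        return x.iloc[1:] - dfx.iloc[0, :].tolist()
    if x.iloc[0] == 'M.Tech':
        return x.iloc[1:] - dfx.iloc[1, :].tolist()
    if x.iloc[0] == 'MBA':
        return x.iloc[1:] - dfx.iloc[2, :].tolist()

df_added = df.apply(fun, axis=1)
df_added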

Result

   degree  name  score  game  money  loan  rent  location
0     MBA     5     90   100    100   100   100       100
1     BCA     4     40   200    200   200   200       200
2  M.Tech     2     80   300    300   300   300       300
3     MBA     3     98   100    100   100   100       100
4     BCA     2     60   400    400   400   400       400

dfx (the group means):

        name  score  game  money  loan  rent  location
degree
BCA        3     50   300    300   300   300       300
M.Tech     2     80   300    300   300   300       300
MBA        4     94   100    100   100   100       100

df_added (difference of each element from its group's column mean):

   name  score  game  money  loan  rent  location
0     1     -4     0      0     0     0         0
1     1    -10  -100   -100  -100  -100      -100
2     0      0     0      0     0     0         0
3    -1      4     0      0     0     0         0
4    -1     10   100    100   100   100       100
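
The same per-group differences can probably be obtained without hard-coding each degree in an apply function. A minimal sketch, assuming groupby(...).transform("mean") is acceptable here; the frame is shortened to three numeric columns for brevity, and `value_cols` and `group_means` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "degree": ["MBA", "BCA", "M.Tech", "MBA", "BCA"],
    "name": [5, 4, 2, 3, 2],
    "score": [90, 40, 80, 98, 60],
    "game": [100, 200, 300, 100, 400],
})

# Every column except the grouping key.
value_cols = [c for c in df.columns if c != "degree"]

# transform("mean") returns a frame shaped like df[value_cols], where each
# row holds the column means of that row's own group.
group_means = df.groupby("degree")[value_cols].transform("mean")

# Element-wise difference of each value from its group's mean.
df_added = df[value_cols] - group_means
print(df_added)
```

This avoids the row-wise apply and does not need to know the group labels in advance, which may matter with 100+ columns or many degrees.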
