巨蟒熊猫在列中找到一组的百分位数



我想找到每列的百分比,并添加到df数据帧中,同时标记

如果列的值是

前20%(值>第80百分位(然后是"强">

低于20%(值>80%(,则"弱">

其他平均

下面是我的数据帧

  df=pd.DataFrame({'month':['1','1','1','1','1','2','2','2','2','2','2','2'],'X1': 
  [30,42,25,32,12,10,4,6,5,10,24,21],'X2':[10,76,100,23,65,94,67,24,67,54,87,81],'X3': 
  [23,78,95,52,60,76,68,92,34,76,34,12]})
  df

下面是我尝试的

  df['X1_percentile'] = df.X1.rank(pct = True)
  df['X1_segment'] = np.where(df['X1_percentile']>0.8, 'Strong',np.where(df['X1_percentile'] 
  <0.20,'Weak', 'Average')) 

但我想为每个月和每个专栏做这件事。如果可能的话,这可以由用于任何col编号的函数自动执行,并且还可以键入colname+"colname"_每个";和colname+"_段";对于每列?感谢

我们可以使用groupby+rank和可选参数pct=True来计算表示为百分位等级的排名,然后使用np.select bin/将百分位值分类为离散标签。

p = df.groupby('month').rank(pct=True)
df[p.columns + '_per'] = p
df[p.columns + '_seg'] = np.select([p.gt(.8), p.lt(.2)], ['strong', 'weak'], 'average')

   month  X1   X2  X3    X1_per    X2_per    X3_per   X1_seg   X2_seg   X3_seg
0      1  30   10  23  0.600000  0.200000  0.200000  average  average  average
1      1  42   76  78  1.000000  0.800000  0.800000   strong  average  average
2      1  25  100  95  0.400000  1.000000  1.000000  average   strong   strong
3      1  32   23  52  0.800000  0.400000  0.400000  average  average  average
4      1  12   65  60  0.200000  0.600000  0.600000  average  average  average
5      2  10   94  76  0.642857  1.000000  0.785714  average   strong  average
6      2   4   67  68  0.142857  0.500000  0.571429     weak  average  average
7      2   6   24  92  0.428571  0.142857  1.000000  average     weak   strong
8      2   5   67  34  0.285714  0.500000  0.357143  average  average  average
9      2  10   54  76  0.642857  0.285714  0.785714  average  average  average
10     2  24   87  34  1.000000  0.857143  0.357143   strong   strong  average
11     2  21   81  12  0.857143  0.714286  0.142857   strong  average     weak

最新更新