How to group by quantiles (not just compute the quantiles)

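For example, I have a column [1,2,3,4,5,6,7,8,9,10] and quantiles [0, 0.33, 0.66, 1] (the length is not fixed), so the df should be split into 3 groups (the group names don't matter).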

Is a for loop the only way to do this?

You can combine Series.quantile, pd.cut, and groupby to do what you're looking for.

In [1]: import pandas as pd, numpy as np
In [2]: s = pd.Series([1,2,3,4,5,6,7,8,9,10])

Use the quantiles to find the cut points:

In [3]: qs = s.quantile([0, 0.33, 0.66, 1])

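Now you can use cut to assign each element to a bin, using the quantiles as the bin edges. Passing include_lowest=True makes the first interval include the minimum value, which would otherwise fall outside the left-open bin: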

In [8]: pd.cut(s, bins=qs, include_lowest=True)
Out[8]:
0    (0.999, 3.97]
1    (0.999, 3.97]
2    (0.999, 3.97]
3     (3.97, 6.94]
4     (3.97, 6.94]
5     (3.97, 6.94]
6     (6.94, 10.0]
7     (6.94, 10.0]
8     (6.94, 10.0]
9     (6.94, 10.0]
dtype: category
Categories (3, interval[float64, right]): [(0.999, 3.97] < (3.97, 6.94] < (6.94, 10.0]]
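Since the group names don't matter, the interval labels can be replaced with plain integer bin indices. The following is a minimal sketch of that variant, assuming the same s and qs as above; the variable name group_ids is only illustrative and is not part of the original answer.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
qs = s.quantile([0, 0.33, 0.66, 1])

# labels=False makes cut return the integer index of each bin
# instead of an Interval label.
group_ids = pd.cut(s, bins=qs.to_numpy(), include_lowest=True, labels=False)

print(group_ids.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
```

Integer ids like these can be passed to groupby in the same way as the interval categories.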

The result of cut can be used directly in a groupby operation, for example groupby.mean:

In [9]: s.groupby(pd.cut(s, bins=qs, include_lowest=True)).mean()
Out[9]:
(0.999, 3.97]    2.0
(3.97, 6.94]     5.0
(6.94, 10.0]     8.5
dtype: float64
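As an alternative to the approach above, pd.qcut computes the quantile edges and assigns the bins in a single call. The sketch below applies it to a DataFrame, since the question mentions a df; the column names "value" and "group" are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical DataFrame; the column name "value" is an assumption.
df = pd.DataFrame({"value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# qcut takes the quantile levels directly and bins by them;
# labels=False yields integer group ids rather than interval labels.
df["group"] = pd.qcut(df["value"], q=[0, 0.33, 0.66, 1], labels=False)

# Group on the new column; no explicit loop is needed.
means = df.groupby("group")["value"].mean()
print(means)
```

The list passed as q can have any length, so this should also cover the case where the number of quantiles is not fixed.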
