Calculating percentiles from binned input



Hi, I am currently trying to write a function that computes percentiles from binned input. The data I have comes from a histogram.

given:  
hist = [10, 15, 4]   
edges = [0.5, 6, 12, 25]  
perc = 5

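I want to return the percentiles at the evenly spaced percentage points implied by perc, so the return value would look like this: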

perc = 5
return percentile(data,0),
percentile(data,25),
percentile(data,50),
percentile(data,75),
percentile(data,100)

Output: [0.5, 4.4875, 7.8, 10.7, 25]

I tried using pandas.qcut(data, perc), but it does not seem to cut the data correctly.

If I understand correctly, this should work:

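import numpy as np

def percentile_binning(hist, edges, percentages):
    # Cumulative counts, and the same values as fractions of the total.
    hist_cumsum = np.cumsum(hist)
    hist_sum = hist_cumsum[-1]
    hist_cumsum_norm = hist_cumsum / hist_sum
    # Index of the bin containing each requested fraction (len(hist) for 1.0).
    indxs = np.digitize(percentages, hist_cumsum_norm)
    # Counts accumulated before the start of that bin.
    bins_reduction = np.append(np.array([0]), hist_cumsum)[indxs]
    # Counts still needed inside the bin.
    vals_between_edged = percentages * hist_sum - bins_reduction
    # Bin widths, padded with 0 so the index for fraction 1.0 stays valid.
    edged_diff = edges[1:] - edges[:-1]
    edged_diff = np.append(edged_diff, 0)
    percentage_diff = edged_diff[indxs]
    percentage_edge_value = edges[indxs]
    # Bin counts, padded for the same reason (the padding avoids 0/0).
    percentage_bins_sizes = np.append(hist, hist_sum)[indxs]
    # Linear interpolation inside the bin.
    result = percentage_diff * vals_between_edged / percentage_bins_sizes + percentage_edge_value
    return result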
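
As a reading aid, the function appears to perform linear interpolation inside the bin that contains the requested fraction, which amounts to assuming the data are spread uniformly within each bin. The symbols below ($N$, $C_k$, $e_k$, $h_k$, $t$) are notation introduced here for illustration and do not appear in the code.

```latex
% h_k : count in bin k (k = 1..n),  e_{k-1}, e_k : left and right edges of bin k
% N = \sum_k h_k,  C_k = \sum_{i \le k} h_i,  C_0 = 0
\[
  t = pN, \qquad C_{k-1} \le t < C_k, \qquad
  Q(p) = e_{k-1} + (e_k - e_{k-1})\,\frac{t - C_{k-1}}{h_k}
\]
% Worked example for p = 0.25 with hist = [10, 15, 4], edges = [0.5, 6, 12, 25]:
\[
  N = 29, \quad t = 7.25, \quad k = 1, \quad
  Q(0.25) = 0.5 + 5.5 \cdot \frac{7.25}{10} = 4.4875
\]
```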

Input:

hist = np.array([10, 15, 4])
edges = np.array([0.5, 6, 12, 25])
percentages = np.array([0, 0.25, 0.5, 0.75, 1])
print(percentile_binning(hist, edges, percentages))

Output:

[ 0.5     4.4875  7.8    10.7    25.    ]
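
For comparison, the same interpolation can probably be written more compactly with np.interp, by treating the bin edges as the points where the cumulative fraction is known. This is only a sketch under assumptions: the name percentile_from_hist is hypothetical, the data are assumed to be uniformly spread within each bin, and no bin is assumed to be empty (an empty bin would produce repeated cumulative values, which makes the interpolation ambiguous).

```python
import numpy as np

def percentile_from_hist(hist, edges, fractions):
    """Estimate percentiles from histogram counts (hypothetical helper).

    fractions are in the range 0..1, as in the function above.
    """
    hist = np.asarray(hist, dtype=float)
    edges = np.asarray(edges, dtype=float)
    # Cumulative fraction at each edge: 0 at the first edge, 1 at the last.
    cdf = np.concatenate(([0.0], np.cumsum(hist) / hist.sum()))
    # Invert the piecewise-linear CDF: look up the edge value for each fraction.
    return np.interp(fractions, cdf, edges)

hist = [10, 15, 4]
edges = [0.5, 6, 12, 25]
# perc = 5 evenly spaced points from 0 to 1, as in the question.
fractions = np.linspace(0, 1, 5)
print(percentile_from_hist(hist, edges, fractions))
```

For this input the printed values should agree with the output listed above.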