How can I interpolate so that I can "stretch" a 20-item array into a 30-item array while keeping the total and the percentiles the same?

If I have this numpy array:

x = np.array([10,20])

and I want to "stretch" it by doubling its size, I can do that easily with

y = np.repeat(x,2)/2

and get

[5,5,10,10]

But what if I want to stretch it by adding not 2 but an arbitrary number of periods? It is as if I wanted to repeat by a non-integer in the formula above.

The background is that I have an array that measures something over time; for example, each element of the array is the distance covered in one period.

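I need to "stretch" the array, i.e. compute a new array in which the same distance is covered over, say, 30 periods instead of 20. I need the percentiles to stay the same, so that the total distance covered is the same, the sum of the first 10 elements of the first array = the sum of the first 15 elements of the new array, and so on.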

Linear interpolation is fine.
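The requirement above can be sketched with plain np.interp applied to the running total. This is only an illustration under stated assumptions: the function name stretch_cumulative and the random test data are made up here, not taken from the post.

```python
import numpy as np

def stretch_cumulative(values, new_len):
    """Resample per-period amounts onto new_len equal periods (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    # Running total at the end of each original period, starting from 0.
    cum = np.concatenate(([0.0], np.cumsum(values)))
    # Fractional position (0..1) of each original and each new period boundary.
    old_pos = np.linspace(0.0, 1.0, values.size + 1)
    new_pos = np.linspace(0.0, 1.0, new_len + 1)
    # Linearly interpolate the running total, then difference it back.
    return np.diff(np.interp(new_pos, old_pos, cum))

rng = np.random.default_rng(0)
distances = rng.uniform(1.0, 10.0, size=20)  # made-up data: 20 periods
stretched = stretch_cumulative(distances, 30)

# Total distance is preserved.
print(np.isclose(stretched.sum(), distances.sum()))
# First 10 original periods cover the same distance as the first 15 new ones.
print(np.isclose(stretched[:15].sum(), distances[:10].sum()))
# For a factor of exactly 2 this should agree with np.repeat(x, 2) / 2.
print(stretch_cumulative([10, 20], 4))
```

Working on the running total rather than on the values themselves is what keeps the partial sums aligned; the target length does not have to be an integer multiple of the original length.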

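I have cobbled something together with scipy.interpolate, but it seems a bit convoluted, and I wonder whether there is a better way. The steps are:

I start with my array y and set x to the corresponding % position, so the first item is 1/length and the last item = 100%
I prepend a 0 to both x and y (to avoid a below-the-interpolation-range error)
I interpolate the cumulative sum
I use the interpolation function to compute the "stretched" array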
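The steps above can be written compactly. The symbols are introduced here for illustration and do not appear in the post: n original items y_i, m new items y'_j, and C the linearly interpolated cumulative sum.

```latex
C(0) = 0, \qquad
C\!\left(\tfrac{k}{n}\right) = \sum_{i=1}^{k} y_i \quad (k = 1, \dots, n), \qquad
C \text{ linear between these points}

y'_j = C\!\left(\tfrac{j}{m}\right) - C\!\left(\tfrac{j-1}{m}\right), \qquad j = 1, \dots, m

\sum_{j=1}^{J} y'_j = C\!\left(\tfrac{J}{m}\right)
\quad\Longrightarrow\quad
\sum_{j=1}^{m} y'_j = C(1) = \sum_{i=1}^{n} y_i
```

The sum telescopes, so whenever J/m = k/n the first J new items add up to the first k original items. With n = 20, m = 30, k = 10 this gives J = 15.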

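The code is below. EDIT: I have looked into @eliadl's answer. It is very close to mine, but not 100% the same. It is not clear to me what causes the difference; any insight is welcome!

I put the code below together to show the difference. My code does what I had in mind: if the original array has 4 items and the new one has 10, and the CDF (cumulative distribution function) at the 2nd item = 40%, then the CDF at the 5th item of the new array must = 40%, and so on.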

import numpy as np
import scipy.interpolate
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style='darkgrid')


def my_stretch(inp, s):
    # Stretch inp by s extra items by interpolating its cumulative sum.
    y = inp
    x = np.arange(1, len(y) + 1) / len(y)
    # Prepend 0 so the interpolation range starts at position 0.
    y_2 = np.hstack([0, y])
    x_2 = np.hstack([0, x])
    f_int = scipy.interpolate.interp1d(x_2, np.cumsum(y_2))
    x_new = np.arange(0, len(y) + s + 1) / (len(y) + s)
    y_new_cum = f_int(x_new)
    y_new = np.diff(y_new_cum)
    return y_new


def your_stretch(inp, s):
    # @eliadl's approach: interpolate the values, then rescale to the original sum.
    y = inp  # use the argument rather than the global y
    x = np.arange(y.size)
    x_stretch = np.linspace(x[0], x[-1], num=x.size + s)
    y_stretch = np.interp(x_stretch, x, y)
    y_stretch *= y.sum() / y_stretch.sum()
    return y_stretch


def cdf(x):
    return np.cumsum(x) / x.sum()


y = np.array([20, 10, 8, 6, 4, 2])
s = 3
my_s = my_stretch(y, s)
your_s = your_stretch(y, s)
cdf_orig = cdf(y)
cdf_my = cdf(my_s)
cdf_your = cdf(your_s)

fig, ax = plt.subplots(2, 1)
# Top panel: CDFs against fractional position; bottom panel: stretched values.
sns.lineplot(x=np.arange(1, len(my_s) + 1) / len(my_s), y=cdf_my, label='mine', marker='o', ax=ax[0])
sns.lineplot(x=np.arange(1, len(your_s) + 1) / len(your_s), y=cdf_your, label='yours', marker='o', ax=ax[0])
sns.lineplot(x=np.arange(1, len(y) + 1) / len(y), y=cdf_orig, label='original', ax=ax[0])
ax[1].plot(my_s, label='mine', marker='o')
ax[1].plot(your_s, label='yours', marker='o')
ax[0].set_xlabel('% position (the last item in the array = 1)')
ax[0].set_ylabel('cumulative distribution function')
ax[1].set_xlabel('item in the array')
ax[1].set_ylabel('value')
ax[1].legend()
plt.show()
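The CDF property described before the listing can also be checked numerically instead of read off a plot. The helper below is a hypothetical sketch (the name cdf_matches is not from the post): it compares the two CDFs only at positions where an item boundary of the original coincides with one of the stretched array.

```python
from math import gcd

import numpy as np

def cdf(a):
    a = np.asarray(a, dtype=float)
    return np.cumsum(a) / a.sum()

def cdf_matches(original, stretched):
    """True if both CDFs agree wherever item boundaries coincide (hypothetical helper)."""
    n, m = len(original), len(stretched)
    g = gcd(n, m)
    # Boundaries coincide at fractions i/g: item i*n/g of the original
    # lines up with item i*m/g of the stretched array (1-based counts).
    k = np.arange(1, g + 1) * (n // g) - 1  # 0-based indices into original
    j = np.arange(1, g + 1) * (m // g) - 1  # 0-based indices into stretched
    return bool(np.allclose(cdf(original)[k], cdf(stretched)[j]))

y = np.array([20, 10, 8, 6])
# The doubling trick from the question keeps the CDF aligned.
print(cdf_matches(y, np.repeat(y, 2) / 2))
# A flat array with the same total does not.
print(cdf_matches(y, np.full(8, y.sum() / 8)))
```

For 4 original items and 10 new ones, gcd(4, 10) = 2, so the comparison happens at item 2 versus item 5 and at item 4 versus item 10, which is the pairing described in the question.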

Using np.interp and np.linspace:

import numpy as np

y = np.array([20, 10, 8, 6, 4, 2])
stretch_by = 1.5
x = np.arange(y.size)  # [0, 1, 2, 3, 4, 5]
x_stretch = np.linspace(
    start=x[0], stop=x[-1], num=int(round(x.size * stretch_by)),  # num must be an integer
)  # [0, 0.625, 1.25, 1.875, 2.5, 3.125, 3.75, 4.375, 5]
y_stretch = np.interp(x_stretch, x, y)  # [20, 13.75, 9.5, 8.25, 7, 5.75, 4.5, 3.25, 2]
y_stretch *= y.sum() / y_stretch.sum()  # normalize y_stretch.sum() to y.sum()
print(f"{y}'s sum is {y.sum()}\n")
print(f"{y_stretch}'s sum is {y_stretch.sum()}")

Output:

[20 10  8  6  4  2]'s sum is 50

[13.51351351  9.29054054  6.41891892  5.57432432  4.72972973  3.88513514
  3.04054054  2.19594595  1.35135135]'s sum is 50.0
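One likely reading of why this differs from the cumulative-sum approach in the question: np.interp here resamples the values themselves, treating each element as a point sample, and the final rescaling multiplies every element by the same factor. That fixes the total but not the partial sums along the way. The sketch below compares the two on the same data; the function names stretch_values and stretch_cumsum are made up for this illustration.

```python
import numpy as np

def stretch_values(y, new_len):
    """Interpolate the values, then rescale so the total matches."""
    x = np.arange(y.size)
    out = np.interp(np.linspace(x[0], x[-1], new_len), x, y)
    return out * y.sum() / out.sum()

def stretch_cumsum(y, new_len):
    """Interpolate the cumulative sum, then difference it back."""
    cum = np.concatenate(([0.0], np.cumsum(y)))
    old_pos = np.linspace(0.0, 1.0, y.size + 1)
    new_pos = np.linspace(0.0, 1.0, new_len + 1)
    return np.diff(np.interp(new_pos, old_pos, cum))

y = np.array([20, 10, 8, 6, 4, 2], dtype=float)
by_values = stretch_values(y, 12)
by_cumsum = stretch_cumsum(y, 12)

# Both versions keep the total.
print(by_values.sum(), by_cumsum.sum())
# Share of the total covered by the first half of each array.
print(y[:3].sum() / y.sum())
print(by_values[:6].sum() / by_values.sum())
print(by_cumsum[:6].sum() / by_cumsum.sum())
```

If the elements are amounts accumulated per period, as in the question, the cumulative version matches the stated requirement; if they are instantaneous readings, resampling the values may be the more natural choice.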
