How to return evenly spaced values between two indices in a pandas DataFrame

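How can you efficiently fill NaNs with values that are evenly spaced between the surrounding indices? I have done this manually by slicing, but with thousands of calls it becomes very inefficient.

The problem is probably easier to understand from the desired input and output than from a description.

Below is an example df with NaNs scattered randomly throughout: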

import random
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 2),
                  index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                  columns=['one', 'two'])
# Blank out roughly half of the cells
df = df.mask(np.random.random(df.shape) < .5)
# Set a further 10% of the cell positions to NaN
ix = [(row, col) for row in range(df.shape[0]) for col in range(df.shape[1])]
for row, col in random.sample(ix, int(round(.1*len(ix)))):
    df.iat[row, col] = np.nan

The result is random, but it will look something like this:

         one       two
1        NaN       NaN
2   0.823711 -1.581639
3        NaN -1.632728
4   2.267315 -1.213950
5        NaN -0.779525
6        NaN       NaN
7        NaN -1.817710
8   0.190799       NaN
9        NaN       NaN
10       NaN       NaN

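Considering column one, I want to fill in values at rows 3, 5, 6 and 7 (rows 9 and 10 have no later value to interpolate towards, so they stay NaN). I can do this manually by slicing rows. For index 3, I add 0.823711 and 2.267315 and divide by 2, since it is just the mean of the two neighbours; that gives 1.545513. However, some NaNs span several consecutive indices, such as 5, 6 and 7. To fill those, I take the difference between 0.190799 and 2.267315, divide it by 4 to get the step, and add that step cumulatively starting from 2.267315.

So the expected output would be: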
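The manual arithmetic can be sketched in a few lines. This is only an illustration using the values of column one shown above; `np.linspace` is brought in here for comparison and is not part of the original question.

```python
import numpy as np

# Rows 2 and 4 of column "one" are known; row 3 is a single gap,
# so its value is the midpoint of the two neighbours.
row3 = (0.823711 + 2.267315) / 2

# Rows 4 and 8 are known; rows 5, 6 and 7 are missing.
# Three missing rows mean four equal steps between the known endpoints.
start, stop = 2.267315, 0.190799
step = (stop - start) / 4
rows_5_to_7 = [start + step * k for k in (1, 2, 3)]

# np.linspace gives the same evenly spaced values, endpoints included.
spaced = np.linspace(start, stop, num=5)

print(round(row3, 6))
print([round(v, 6) for v in rows_5_to_7])
```

The interior entries of `spaced` match `rows_5_to_7`, which is the pattern that linear interpolation automates.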

             one       two
    1        NaN       NaN
    2   0.823711 -1.581639
    3   1.545513 -1.632728
    4   2.267315 -1.213950
    5   1.748186 -0.779525
    6   1.229057 -1.298617
    7   0.709928 -1.817710
    8   0.190799       NaN
    9        NaN       NaN
    10       NaN       NaN

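I started doing this manually by slicing between each relevant pair of rows. Beyond that I considered a loop, but every calculation would be different because the NaNs are randomly distributed throughout the dataset. The values also fluctuate, so each one may be larger or smaller than the one before it.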

Use interpolate together with mask, so that the trailing NaNs are left as they are:

df = df.mask(df.bfill().notnull(), df.interpolate())
print(df)
         one       two
1        NaN       NaN
2   0.823711 -1.581639
3   1.545513 -1.632728
4   2.267315 -1.213950
5   1.748186 -0.779525
6   1.229057 -1.298617
7   0.709928 -1.817710
8   0.190799       NaN
9        NaN       NaN
10       NaN       NaN
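Because the question's frame is random, here is a reproducible sketch that rebuilds it from the printed values and applies the same mask step by step. The names `has_later_value` and `filled` are chosen for illustration only.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Fixed copy of the example frame printed in the question
df = pd.DataFrame(
    {
        "one": [nan, 0.823711, nan, 2.267315, nan, nan, nan, 0.190799, nan, nan],
        "two": [nan, -1.581639, -1.632728, -1.213950, -0.779525,
                nan, -1.817710, nan, nan, nan],
    },
    index=range(1, 11),
)

# True wherever the cell itself or some later row in the column is non-NaN
has_later_value = df.bfill().notnull()

# Take the interpolated value at those positions; leave the rest untouched
filled = df.mask(has_later_value, df.interpolate())
print(filled)
```

Leading NaNs stay NaN because linear interpolation has nothing to start from there, and trailing NaNs stay NaN because `has_later_value` is False for them.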

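Details: on its own, interpolate also fills the trailing NaNs with the last valid value, which is why the mask is needed:

print(df.interpolate())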
         one       two
1        NaN       NaN
2   0.823711 -1.581639
3   1.545513 -1.632728
4   2.267315 -1.213950
5   1.748186 -0.779525
6   1.229057 -1.298617
7   0.709928 -1.817710
8   0.190799 -1.817710
9   0.190799 -1.817710
10  0.190799 -1.817710
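As a possible alternative, and assuming pandas 0.23 or later where the `limit_area` parameter is available, `interpolate(limit_area="inside")` restricts filling to NaNs surrounded by valid values. The small Series below is made up for illustration; it is a sketch comparing the two approaches, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one leading NaN, one interior gap, two trailing NaNs
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan, np.nan], index=range(1, 7))

# Default behaviour: interior gap filled, trailing NaNs padded with 3.0
plain = s.interpolate()

# Only NaNs that sit between two valid values are filled
inside = s.interpolate(limit_area="inside")

# The mask-based approach from the answer, applied to the same Series
masked = s.mask(s.bfill().notnull(), s.interpolate())

print(pd.DataFrame({"plain": plain, "inside": inside, "masked": masked}))
```

On this input `inside` and `masked` agree, so either form can be used when only the interior gaps should be filled.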
