How to apply numpy.where() or fillna() row by row so that it uses values from newly filled rows



I am trying to fill NaN rows based on the previous row and a different column. I have the following code:

import pandas as pd
import numpy as np
data = {'value': [55, 58, 60, 62, 64, np.nan, np.nan],
        'growth_rate': [np.nan, 1.0545, 1.034483, 1.033333, 1.032258, 1.02, 1.03]}
df = pd.DataFrame(data)
print(df)

which gives the following DataFrame:

   value  growth_rate
0   55.0          NaN
1   58.0     1.054500
2   60.0     1.034483
3   62.0     1.033333
4   64.0     1.032258
5    NaN     1.020000
6    NaN     1.030000

I have the growth rates needed to fill the gaps in rows 5 and 6. I tried the following code:

df['value'] = np.where(df['value'].isnull(), df['value'].shift(1) * df['growth_rate'], df['value'])
print(df) 

The output is:

   value  growth_rate
0  55.00          NaN
1  58.00     1.054500
2  60.00     1.034483
3  62.00     1.033333
4  64.00     1.032258
5  65.28     1.020000
6    NaN     1.030000

As you can see, np.where() only fills row 5. I have to run the same line a second time to get the expected result:

     value  growth_rate
0  55.0000          NaN
1  58.0000     1.054500
2  60.0000     1.034483
3  62.0000     1.033333
4  64.0000     1.032258
5  65.2800     1.020000
6  67.2384     1.030000
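A small sketch of why one pass fills only one row: the whole right-hand side, including `df['value'].shift(1)`, is computed from the column as it was before the assignment, so row 6 is multiplied by the still-missing row 5. Each pass therefore extends the fill by one row, and a run of k NaNs needs k passes. `one_pass` is a hypothetical helper that just wraps the np.where() line from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value': [55, 58, 60, 62, 64, np.nan, np.nan],
    'growth_rate': [np.nan, 1.0545, 1.034483, 1.033333, 1.032258, 1.02, 1.03],
})

def one_pass(s, rate):
    # Hypothetical helper: the np.where() line from the question.
    # shift(1) is evaluated on the ORIGINAL s before anything is written,
    # so a NaN at row i-1 still yields NaN at row i.
    return pd.Series(np.where(s.isnull(), s.shift(1) * rate, s), index=s.index)

first = one_pass(df['value'], df['growth_rate'])   # fills row 5 only
second = one_pass(first, df['growth_rate'])        # now row 6 sees the filled row 5
print(first.tolist())
print(second.tolist())
```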

However, this is inefficient. There must be a way to do this in a single line! I also tried fillna(), but got the same result:

df['value'] = df['value'].fillna(df['value'].shift(1) * df['growth_rate'])
print(df) 
   value  growth_rate
0  55.00          NaN
1  58.00     1.054500
2  60.00     1.034483
3  62.00     1.033333
4  64.00     1.032258
5  65.28     1.020000
6    NaN     1.030000

I was hoping to find something like ffill() or np.where() that fills the gaps based on the newly filled rows and another column (growth_rate), all in one step.
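For reference, a plain row-by-row loop produces the expected output in a single pass because it writes each filled value back before moving on to the next row. It iterates in Python, so it will be slower than a vectorized approach on large frames, but it is handy as a correctness check. This sketch assumes the row just before each run of NaNs has a valid value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value': [55, 58, 60, 62, 64, np.nan, np.nan],
    'growth_rate': [np.nan, 1.0545, 1.034483, 1.033333, 1.032258, 1.02, 1.03],
})

# Plain float arrays; copy=True so the loop does not write into df's buffer.
values = df['value'].to_numpy(dtype=float, copy=True)
rates = df['growth_rate'].to_numpy(dtype=float)

# values[i - 1] may itself have been filled on the previous iteration,
# so a whole run of consecutive NaNs is handled in one pass.
for i in range(1, len(values)):
    if np.isnan(values[i]):
        values[i] = values[i - 1] * rates[i]

df['value'] = values
print(df)
```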

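Assuming all the missing values are in a single group, we can ffill the value column to get the last valid value, then multiply it by the cumulative product (cumprod) of growth_rate over the rows where value isna: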

m = df['value'].isna()
df.loc[m, 'value'] = df['value'].ffill() * df.loc[m, 'growth_rate'].cumprod()

df:

     value  growth_rate
0  55.0000          NaN
1  58.0000     1.054500
2  60.0000     1.034483
3  62.0000     1.033333
4  64.0000     1.032258
5  65.2800     1.020000
6  67.2384     1.030000
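In formula form (notation introduced here for illustration: v_i is `value`, g_i is `growth_rate`, and j is the last row with a valid value before the run of NaNs), each missing row k in that run is:

```latex
v_k = v_j \prod_{i=j+1}^{k} g_i
```

`ffill()` supplies v_j and `cumprod()` supplies the product, so v_5 = 64 × 1.02 = 65.28 and v_6 = 64 × 1.02 × 1.03 = 67.2384.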

Setup and imports:

import numpy as np
import pandas as pd
df = pd.DataFrame({
    'value': [55.0, 58.0, 60.0, 62.0, 64.0, np.nan, np.nan],
    'growth_rate': [np.nan, 1.0545, 1.034483, 1.033333, 1.032258, 1.02, 1.03]
})
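A quick check of the single-group assumption: on data with two separate runs of NaN (row 2 and rows 5-6), the plain cumprod runs across both runs, so row 2's growth rate leaks into rows 5 and 6. This sketch only inspects the intermediate product and does not modify df.

```python
import numpy as np
import pandas as pd

# Two separate runs of NaN: row 2, and rows 5-6.
df = pd.DataFrame({
    'value': [55.0, 58.0, np.nan, 62.0, 64.0, np.nan, np.nan],
    'growth_rate': [np.nan, 1.0545, 1.034483, 1.033333, 1.032258, 1.02, 1.03],
})

m = df['value'].isna()
# Same expression as the single-group answer; cumprod spans rows 2, 5 and 6.
single_group = df['value'].ffill() * df.loc[m, 'growth_rate'].cumprod()
# Rows outside m come out as NaN after index alignment; show only the filled rows.
print(single_group[m])
```

Row 5 comes out as 64 × 1.034483 × 1.02 (about 67.53) rather than 64 × 1.02 = 65.28.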

If each separate group of NaNs should be computed independently, we can create group labels with cumsum and use groupby cumprod instead:

m = df['value'].isna()
df.loc[m, 'value'] = (
    df['value'].ffill() *
    df.loc[m, 'growth_rate'].groupby((~m).cumsum()).cumprod()
)

Resulting df (using the modified setup):

       value  growth_rate
0  55.000000          NaN
1  58.000000     1.054500
2  60.000014     1.034483  # (group 1) cumprod 
3  62.000000     1.033333
4  64.000000     1.032258
5  65.280000     1.020000  # (group 2) values same as without groupby
6  67.238400     1.030000  # since these are in a group together

Modified setup and imports:

import numpy as np
import pandas as pd
df = pd.DataFrame({
    'value': [55.0, 58.0, np.nan, 62.0, 64.0, np.nan, np.nan],
    'growth_rate': [np.nan, 1.0545, 1.034483, 1.033333, 1.032258, 1.02, 1.03]
})

Modified df:

   value  growth_rate
0   55.0          NaN
1   58.0     1.054500
2    NaN     1.034483
3   62.0     1.033333
4   64.0     1.032258
5    NaN     1.020000
6    NaN     1.030000
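To see how the groupby version separates the runs, it helps to print the group keys. `(~m).cumsum()` increments at every non-missing row, so each NaN row gets the same key as the last valid row before it, and `cumprod` restarts for every new key. The sketch below uses the modified data; pandas aligns the full-length key Series to the masked rows by index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value': [55.0, 58.0, np.nan, 62.0, 64.0, np.nan, np.nan],
    'growth_rate': [np.nan, 1.0545, 1.034483, 1.033333, 1.032258, 1.02, 1.03],
})

m = df['value'].isna()
keys = (~m).cumsum()   # counter that only increases on non-NaN rows
print(keys.tolist())

# Per-run cumulative product: row 2 is its own run; rows 5 and 6 share a key.
factor = df.loc[m, 'growth_rate'].groupby(keys).cumprod()
print(factor)

# Assignment aligns on index, so only rows 2, 5 and 6 are written.
df.loc[m, 'value'] = df['value'].ffill() * factor
print(df)
```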
