Original DataFrame
a | b | yyyymm | price |
---|---|---|---|
1 | a | 200101 | 3000 |
1 | a | 200102 | np.nan |
1 | a | 200103 | np.nan |
1 | a | 200104 | 6000 |
1 | b | 200101 | np.nan |
1 | b | 200102 | np.nan |
1 | b | 200103 | np.nan |
1 | b | 200104 | 3000 |
2 | a | 200101 | 3000 |
2 | a | 200102 | np.nan |
2 | a | 200103 | np.nan |
2 | a | 200104 | np.nan |
This works as expected:
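import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1,1,1,1,1,1,1,1,2,2,2,2],
                   'b': list('aaaabbbbaaaa'),
                   'yyyymm': [200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104,
                              200101, 200102, 200103, 200104],
                   'price': [3000,np.nan,np.nan,6000,np.nan,np.nan,np.nan,3000,3000,np.nan,np.nan,np.nan]
                   })
# Interpolate within each (a, b) group; limit_area='inside' fills only NaNs bounded by valid values.
# group_keys=False keeps the original row index on pandas >= 2.0.
df.groupby(['a', 'b'], group_keys=False)['price'].apply(
    lambda group: group.interpolate(method='linear', limit=2, limit_area='inside')
)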
Output:
0     3000.0
1     4000.0
2     5000.0
3     6000.0
4        NaN
5        NaN
6        NaN
7     3000.0
8     3000.0
9        NaN
10       NaN
11       NaN
Name: price, dtype: float64
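
If the interpolated values need to go back into the frame as a column, one possible variant is to use `transform` instead of `apply`, since `transform` returns a result aligned to the original row index. This is only a sketch under that assumption; the column name `price_filled` and the variable `filled` are made up for illustration and are not part of the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    'b': list('aaaabbbbaaaa'),
    'yyyymm': [200101, 200102, 200103, 200104] * 3,
    'price': [3000, np.nan, np.nan, 6000,
              np.nan, np.nan, np.nan, 3000,
              3000, np.nan, np.nan, np.nan],
})

# transform returns a Series indexed like df, so it can be
# assigned straight back as a new column (hypothetical name).
df['price_filled'] = df.groupby(['a', 'b'])['price'].transform(
    lambda group: group.interpolate(method='linear', limit=2, limit_area='inside')
)

# Plain list of the filled values, in original row order.
filled = df['price_filled'].tolist()
```

Only the gap in group `(1, 'a')` is filled, because it is the only run of NaNs with a valid value on both sides; the leading NaNs in `(1, 'b')` and the trailing NaNs in `(2, 'a')` are left untouched by `limit_area='inside'`.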