将时间序列数据的负计数替换为正计数的策略



数据列包含新增客户的值。然而,它包含负值,这意味着在特定的一周内没有留下客户。

用正值替换负值的最佳方法是什么?

例如,以前值的平均值

这是的样本

Week    New customer added
Week 01 1,881 
Week 02 7 
Week 03 10 
Week 04 10 
Week 05 (22)
Week 06 (18)
Week 07 22 
Week 08 25 
Week 09 3 
Week 10 (36)
Week 11 25 
Week 12 20 
Week 13 (7)
Week 14 (24)
Week 15 10 
Week 16 29 
Week 17 4 
Week 18 (55)
Week 19 (3)
Week 20 17 
Week 21 (1,875)
Week 22 1,867 
Week 23 (13)
Week 24 47 
Week 25 40 
Week 26 (1,916)
Week 27 1,891 
Week 28 0 
Week 29 22 

目前的策略是

df['New customer added'].mask(df['New customer added'].lt(0)).bfill().fillna(0) 

我正在考虑第二种策略,即用之前2-3个正值的平均值代替负数。

第二种策略的预期输出是

Week    New customer added
Week 01 1,881 
Week 02 7 
Week 03 10 
Week 04 10 
Week 05 9  # Negative no. replaced by mean of prev 3(i.e. 7,10,10) positive no. i.e. 9
Week 06 9  # Negative no. replaced by mean of prev 3(i.e. 7,10,10) positive no. i.e. 9

使用:

#is sample data are ,, so replace to emoty string
df['New customer added'] = df['New customer added'].astype(str).str.replace(',','')
#negatives are in sample with (
m = df['New customer added'].astype(str).str.contains('(')

解决方案只过滤正行,并使用mean滚动,最后通过原始索引使用DataFrame.reindex,并向前填充缺失值(对于负行(

new = (df.loc[~m, 'New customer added'].astype(float)
.rolling(3, min_periods=1).mean()
.reindex(df.index, method='ffill'))

最后只替换底片:

df['New customer added'] = df['New customer added'].mask(m, new)

print (df)
Week New customer added
0   Week 01               1881
1   Week 02                  7
2   Week 03                 10
3   Week 04                 10
4   Week 05                9.0
5   Week 06                9.0
6   Week 07                 22
7   Week 08                 25
8   Week 09                  3
9   Week 10          16.666667
10  Week 11                 25
11  Week 12                 20
12  Week 13               16.0
13  Week 14               16.0
14  Week 15                 10
15  Week 16                 29
16  Week 17                  4
17  Week 18          14.333333
18  Week 19          14.333333
19  Week 20                 17
20  Week 21          16.666667
21  Week 22               1867
22  Week 23         629.333333
23  Week 24                 47
24  Week 25                 40
25  Week 26         651.333333
26  Week 27               1891
27  Week 28                  0
28  Week 29                 22

最新更新