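I have the DataFrame df below. Each person has an actual quantity recorded for one or more years.
For each person, starting from the most recent year and working backwards, once a value of 0 is encountered I want the values for all earlier years replaced with 0, as shown in the desired quantity column.
How can I do this?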
import pandas as pd
d = {'person': ['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd', 'e'],
     'year': [2000, 2000, 2001, 2000, 2001, 2002, 2000, 2001, 2002, 2003, 2003],
     'actual quantity': [2, 0, 1, 3, 0, 1, 4, 3, 0, 1, 1],
     'desired quantity': [2, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1]}
df = pd.DataFrame(data=d)
df
+--------+------+-----------------+------------------+
| person | year | actual quantity | desired quantity |
+--------+------+-----------------+------------------+
| a | 2000 | 2 | 2 |
| b | 2000 | 0 | 0 |
| b | 2001 | 1 | 1 |
| c | 2000 | 3 | 0 |
| c | 2001 | 0 | 0 |
| c | 2002 | 1 | 1 |
| d | 2000 | 4 | 0 |
| d | 2001 | 3 | 0 |
| d | 2002 | 0 | 0 |
| d | 2003 | 1 | 1 |
| e | 2003 | 1 | 1 |
+--------+------+-----------------+------------------+
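I think you first need to sort by both columns with DataFrame.sort_values, ascending by person and descending by year, and then apply Series.mask with a custom mask. To build the mask, compare the values with 0, convert the booleans to integers with Series.astype (Series.view also works on older pandas, but it is deprecated in recent releases), take a running total per group with GroupBy.cumsum, and finally test for not equal to 0:
df = df.sort_values(['person','year'], ascending=[True, False])
# m is True from each person's most recent zero back through all earlier years
m = (df['actual quantity'].eq(0)
       .astype(int)
       .groupby(df['person'])
       .cumsum()
       .ne(0))
df['desired quantity1'] = df['actual quantity'].mask(m, 0)
# restore the original row order
df = df.sort_index()
print(df)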
person year actual quantity desired quantity desired quantity1
0 a 2000 2 2 2
1 b 2000 0 0 0
2 b 2001 1 1 1
3 c 2000 3 0 0
4 c 2001 0 0 0
5 c 2002 1 1 1
6 d 2000 4 0 0
7 d 2001 3 0 0
8 d 2002 0 0 0
9 d 2003 1 1 1
10 e 2003 1 1 1
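The same steps can also be wrapped in a small helper, which makes the logic easier to reuse and test. The following is only a sketch: the function name zero_before_latest_zero and its parameters are hypothetical, it assumes the DataFrame has a unique index, and it swaps GroupBy.cumsum for GroupBy.cummax, which flags "a zero has been seen" directly instead of counting zeros.

```python
import pandas as pd


def zero_before_latest_zero(df, group_col="person", year_col="year",
                            value_col="actual quantity"):
    """Return value_col with each group's most recent zero and all
    earlier years set to 0. Hypothetical helper; assumes a unique index."""
    # Newest year first within each group, so the scan runs backwards in time.
    ordered = df.sort_values([group_col, year_col], ascending=[True, False])
    # 1 where the value is zero; the running maximum stays 1 from that row on.
    seen_zero = (ordered[value_col].eq(0)
                 .astype(int)
                 .groupby(ordered[group_col])
                 .cummax()
                 .astype(bool))
    # Write 0 wherever a zero has been seen, then restore the caller's row order.
    return ordered[value_col].mask(seen_zero, 0).reindex(df.index)


d = {'person': ['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd', 'e'],
     'year': [2000, 2000, 2001, 2000, 2001, 2002, 2000, 2001, 2002, 2003, 2003],
     'actual quantity': [2, 0, 1, 3, 0, 1, 4, 3, 0, 1, 1]}
df = pd.DataFrame(data=d)
df['desired quantity1'] = zero_before_latest_zero(df)
print(df['desired quantity1'].tolist())
# [2, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1]
```

Because the helper returns a Series aligned to the original index, the input DataFrame is never reordered and no final sort_index call is needed.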