I have been struggling for days to get my program to the next level.
I want to group and number all of my groups.
date | pump | group
---|---|---
2018-01-06 12:01:00 | false | 0
2018-01-06 12:01:30 | true | 1
2018-01-06 12:02:00 | true | 1
2018-01-06 12:02:30 | false | 0
2018-01-06 12:03:00 | true | 2
2018-01-06 12:03:30 | true | 2
2018-01-06 12:04:00 | true | 2
2018-01-06 12:04:30 | false | 0
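You can mark the first True of each run by chaining the mask with its shifted, inverted copy (Series.shift), number the runs with a cumulative sum (Series.cumsum), and finally set the rows where the value is False to 0 with Series.where: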
df['group1'] = ((df['pump'] & ~df['pump'].shift(fill_value=False)).cumsum()
                .where(df['pump'], 0))
print (df)
date pump group group1
0 2018-01-06 12:01:00 False 0 0
1 2018-01-06 12:01:30 True 1 1
2 2018-01-06 12:02:00 True 1 1
3 2018-01-06 12:02:30 False 0 0
4 2018-01-06 12:03:00 True 2 2
5 2018-01-06 12:03:30 True 2 2
6 2018-01-06 12:04:00 True 2 2
7 2018-01-06 12:04:30 False 0 0
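To see what each part of that one-liner contributes, here is a minimal sketch that rebuilds the sample data and keeps every intermediate step as its own column. The names `prev`, `starts` and `run_id` are only illustrative, and the DataFrame construction is an assumption based on the table in the question.

```python
import pandas as pd

# Sample data rebuilt from the question's table (construction is an assumption).
df = pd.DataFrame({
    "date": pd.date_range("2018-01-06 12:01:00", periods=8,
                          freq=pd.Timedelta(seconds=30)),
    "pump": [False, True, True, False, True, True, True, False],
})

pump = df["pump"]

# Value of the previous row; the first row has no predecessor, so use False.
prev = pump.shift(fill_value=False)

# True only on the row where a run of True values begins.
starts = pump & ~prev

# Running count of run starts: every row carries the number of the latest run.
run_id = starts.cumsum()

# Keep the run number on True rows and reset False rows to 0.
df["group1"] = run_id.where(pump, 0)

# Show the intermediate columns next to the result.
print(df.assign(prev=prev, starts=starts, run_id=run_id))
```

Because only the starts of True runs are counted, the numbering does not depend on how many False rows sit between the runs.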
Why the first solution was wrong (it only seemed to work on the sample):
df['group1'] = (~df['pump']).cumsum().where(df['pump'], 0)
print (df)
date pump group group1
0 2018-01-06 12:01:00 False 0 0
1 2018-01-06 12:01:30 True 1 1
2 2018-01-06 12:02:00 True 1 1
3 2018-01-06 12:02:30 False 0 0
4 2018-01-06 12:03:00 True 2 2
5 2018-01-06 12:03:30 True 2 2
6 2018-01-06 12:04:00 True 2 2
7 2018-01-06 12:04:30 False 0 0
But if the data changes:
df['group1'] = (~df['pump']).cumsum().where(df['pump'], 0)
print (df)
date pump group group1
0 2018-01-06 12:01:00 False 0 0
1 2018-01-06 12:01:30 False 0 0 <- changed to False
2 2018-01-06 12:02:00 True 1 2 <- numbering starts at 2
3 2018-01-06 12:02:30 False 0 0
4 2018-01-06 12:03:00 True 2 3
5 2018-01-06 12:03:30 True 2 3
6 2018-01-06 12:04:00 True 2 3
7 2018-01-06 12:04:30 False 0 0
Or if the data starts with True:
df['group1'] = (~df['pump']).cumsum().where(df['pump'], 0)
print (df)
date pump group group1
0 2018-01-06 12:01:00 True 1 0 <- changed to True, so the first True group is numbered 0
1 2018-01-06 12:01:30 True 1 0
2 2018-01-06 12:02:00 True 1 0
3 2018-01-06 12:02:30 False 0 0
4 2018-01-06 12:03:00 True 2 1
5 2018-01-06 12:03:30 True 2 1
6 2018-01-06 12:04:00 True 2 1
7 2018-01-06 12:04:30 False 0 0
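For comparison, the sketch below runs both expressions on the two modified inputs. The helper names `number_runs` and `number_runs_naive` are hypothetical and exist only for this illustration; the bodies are the shift-based expression and the `(~pump).cumsum()` expression from this answer.

```python
import pandas as pd

def number_runs(pump: pd.Series) -> pd.Series:
    """Number consecutive True runs 1, 2, ...; rows that are False get 0."""
    starts = pump & ~pump.shift(fill_value=False)
    return starts.cumsum().where(pump, 0)

def number_runs_naive(pump: pd.Series) -> pd.Series:
    """First attempt: counts False rows, so the numbering depends on them."""
    return (~pump).cumsum().where(pump, 0)

# The two modified inputs: two leading False rows, and a leading True run.
two_leading_false = pd.Series([False, False, True, False, True, True, True, False])
leading_true = pd.Series([True, True, True, False, True, True, True, False])

for name, pump in [("two leading False", two_leading_false),
                   ("leading True", leading_true)]:
    print(name)
    print(pd.DataFrame({
        "pump": pump,
        "naive": number_runs_naive(pump),
        "shift_based": number_runs(pump),
    }))
```

The naive version counts the False rows seen so far, which is why an extra False row shifts the numbering up and a missing leading False row makes it start at 0. The shift-based version counts run starts, so it is unaffected by either change.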