How to remove all rows for a specific customer from a DataFrame when their dates are not continuous

date      consumption  customer_id
2018-01-01     12             111
2018-01-02     12             111
*2018-01-03*   14             111   
*2018-01-05*   12             111
2018-01-06     45             111
2018-01-07     34             111 
2018-01-01     23             112 
2018-01-02     23             112
2018-01-03     45             112
2018-01-04     34             112
2018-01-05     23             112
2018-01-06     34             112
2018-01-01     23             113
2018-01-02     34             113
2018-01-03     45             113
2018-01-04     34             113

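The values for customer 111 are not continuous: the row for 2018-01-04 is missing. I therefore want to drop all rows for customer 111 from my pandas DataFrame.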

date      consumption  customer_id
2018-01-01     23             112 
2018-01-02     23             112
2018-01-03     45             112
2018-01-04     34             112
2018-01-05     23             112
2018-01-06     34             112
2018-01-01     23             113
2018-01-02     34             113
2018-01-03     45             113
2018-01-04     34             113

This is the result I want. How can this be done in pandas?

You can compute the successive deltas per customer and check whether any of them is greater than 1 day:

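import pandas as pd

# True for each customer that has at least one step longer than one day
drop = (pd.to_datetime(df['date'])
          .groupby(df['customer_id'])
          .apply(lambda s: s.diff().gt(pd.Timedelta('1d')).any())
       )
# keep only the customers that were not flagged
out = df[df['customer_id'].isin(drop[~drop].index)]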
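To try the check above end to end, here is a minimal sketch that rebuilds the question's sample data by hand and prints the per-customer flags. The way the frame is constructed is my own assumption, since the question only shows the printed table.

```python
import pandas as pd

# Sample data mirroring the question: customer 111 has no row for 2018-01-04.
df = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-02", "2018-01-03", "2018-01-05",
             "2018-01-06", "2018-01-07",
             "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04",
             "2018-01-05", "2018-01-06",
             "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04"],
    "consumption": [12, 12, 14, 12, 45, 34,
                    23, 23, 45, 34, 23, 34,
                    23, 34, 45, 34],
    "customer_id": [111] * 6 + [112] * 6 + [113] * 4,
})

# One boolean per customer: does any consecutive step exceed one day?
drop = (pd.to_datetime(df["date"])
          .groupby(df["customer_id"])
          .apply(lambda s: s.diff().gt(pd.Timedelta("1D")).any()))
print(drop)

# Keep only the rows of customers that were not flagged.
out = df[df["customer_id"].isin(drop[~drop].index)]
print(out)
```

Only customer 111 is flagged here, because the step from 2018-01-03 to 2018-01-05 is two days.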

Or with groupby.filter:

df['date'] = pd.to_datetime(df['date'])
# keep a group only if none of its steps is longer than one day
out = (df.groupby(df['customer_id'])
         .filter(lambda d: not d['date'].diff().gt(pd.Timedelta('1d')).any())
      )

Output:

          date  consumption  customer_id
6   2018-01-01           23          112
7   2018-01-02           23          112
8   2018-01-03           45          112
9   2018-01-04           34          112
10  2018-01-05           23          112
11  2018-01-06           34          112
12  2018-01-01           23          113
13  2018-01-02           34          113
14  2018-01-03           45          113
15  2018-01-04           34          113
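The same gap check can probably be written without a per-group lambda, which may matter on large frames. The sketch below is a variant of the approach above, not part of the original answer: it takes the per-customer difference with `groupby(...)['date'].diff()`, collects the ids that have a step longer than one day, and filters them out. The tiny frame is invented for illustration.

```python
import pandas as pd

# Small illustrative frame (hypothetical data): customer 111 skips 2018-01-02.
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-01-03",
                            "2018-01-01", "2018-01-02"]),
    "consumption": [12, 14, 23, 23],
    "customer_id": [111, 111, 112, 112],
})

# Difference to the previous row of the same customer (NaT on first rows).
step = df.groupby("customer_id")["date"].diff()

# Ids of customers with at least one step longer than one day.
gapped_ids = df.loc[step > pd.Timedelta("1D"), "customer_id"].unique()

# Drop every row belonging to those customers.
out = df[~df["customer_id"].isin(gapped_ids)]
print(out)
```

Comparisons against `NaT` evaluate to False, so the first row of each customer never counts as a gap.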

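If your dates are not necessarily increasing, also check that they never go back in time by requiring every step to be exactly 1 day: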

df['date'] = pd.to_datetime(df['date'])
# every step after the first row must be exactly one day
out = (df.groupby(df['customer_id'])
         .filter(lambda d: d['date'].diff().iloc[1:].eq(pd.Timedelta('1d')).all())
      )
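To see why the stricter check matters, the following sketch compares both filters on invented data in which customer 2 has dates that go backwards. The helper names `no_large_gap` and `strictly_daily` are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical data: customer 1 is strictly daily, customer 2 steps backwards.
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03",
                            "2018-01-02", "2018-01-01", "2018-01-02"]),
    "consumption": [1, 2, 3, 4, 5, 6],
    "customer_id": [1, 1, 1, 2, 2, 2],
})
one_day = pd.Timedelta("1D")

def no_large_gap(d):
    # Passes as long as no step is longer than one day (negative steps pass too).
    return not d["date"].diff().gt(one_day).any()

def strictly_daily(d):
    # Passes only if every step after the first row is exactly one day.
    return bool(d["date"].diff().iloc[1:].eq(one_day).all())

loose = df.groupby("customer_id").filter(no_large_gap)
strict = df.groupby("customer_id").filter(strictly_daily)

print(loose["customer_id"].unique())   # customers kept by the looser check
print(strict["customer_id"].unique())  # customers kept by the stricter check
```

The looser check keeps customer 2 because a step of minus one day is not greater than one day, while the stricter check rejects it. Note that a customer with a single row passes the stricter check, since there are no steps to compare.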
