Incremental counter for a dataframe based on unique values in a column



Existing dataframe:

Unique_Id          sms_sent_date         
A                    11-01-2022                   
A                    11-01-2022                   
A                    12-01-2022                   
A                    13-01-2022                   
B                    02-02-2022                   
B                    03-02-2022                   
B                    06-02-2022                   
B                    06-02-2022                   

Expected dataframe:

Unique_Id          sms_sent_date         unique_sms_counter
A                    11-01-2022                   1
A                    11-01-2022                   1
A                    12-01-2022                   2
A                    13-01-2022                   3
B                    02-02-2022                   1
B                    03-02-2022                   2
B                    06-02-2022                   3
B                    06-02-2022                   3

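I am trying to add unique_sms_counter, a column that counts the unique SMS dates sent to each Unique_Id. I tried df.groupby(['Unique_Id','sms_sent_date']).cumcount(), but I am stuck on applying it to the whole dataframe.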

You can use:

df['unique_sms_counter'] = (
# get non duplicates
(~df.duplicated(['Unique_Id', 'sms_sent_date']))
# increment per group
.groupby(df['Unique_Id']).cumsum()
)

Output:

  Unique_Id sms_sent_date  unique_sms_counter
0         A    11-01-2022                   1
1         A    11-01-2022                   1
2         A    12-01-2022                   2
3         A    13-01-2022                   3
4         B    02-02-2022                   1
5         B    03-02-2022                   2
6         B    06-02-2022                   3
7         B    06-02-2022                   3
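
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach on the question's sample data. The intermediate name is_first is only illustrative and is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Unique_Id": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "sms_sent_date": [
        "11-01-2022", "11-01-2022", "12-01-2022", "13-01-2022",
        "02-02-2022", "03-02-2022", "06-02-2022", "06-02-2022",
    ],
})

# True on the first occurrence of each (Unique_Id, sms_sent_date) pair,
# False on every repeat of that pair.
is_first = ~df.duplicated(["Unique_Id", "sms_sent_date"])

# Running total of first occurrences, restarted for each Unique_Id.
df["unique_sms_counter"] = is_first.groupby(df["Unique_Id"]).cumsum()

print(df)
```

Note that this relies on repeated dates sitting next to each other within a Unique_Id, as they do in the sample. If a date reappears later in the group, that row receives the current running total rather than the number assigned to the date's first occurrence.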

You can use factorize within a groupby:

df['unique_sms_counter'] = df.groupby(['Unique_Id'])['sms_sent_date'].transform(lambda x: x.factorize()[0]+1)
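
To see why the +1 is there, it may help to look at what factorize returns for a single group. The sketch below uses a small hypothetical Series, then applies the one-liner above to the question's data. Codes start at 0 and are assigned in order of first appearance, not in sorted order.

```python
import pandas as pd

# Hypothetical single group, deliberately not sorted by date.
dates = pd.Series(["12-01-2022", "11-01-2022", "12-01-2022", "13-01-2022"])

# factorize returns (codes, uniques); codes are 0-based and follow
# the order in which each distinct value first appears.
codes, uniques = dates.factorize()
print(codes.tolist())    # [0, 1, 0, 2]
print(uniques.tolist())  # ['12-01-2022', '11-01-2022', '13-01-2022']

# The same idea applied per Unique_Id to the question's sample data.
df = pd.DataFrame({
    "Unique_Id": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "sms_sent_date": [
        "11-01-2022", "11-01-2022", "12-01-2022", "13-01-2022",
        "02-02-2022", "03-02-2022", "06-02-2022", "06-02-2022",
    ],
})
df["unique_sms_counter"] = (
    df.groupby("Unique_Id")["sms_sent_date"]
    .transform(lambda x: x.factorize()[0] + 1)
)
```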

Alternatively, if your dates are of an actual datetime type, you can use rank:

df['sms_sent_date'] = pd.to_datetime(df['sms_sent_date'], dayfirst=True)
df['unique_sms_counter'] = df.groupby('Unique_Id')['sms_sent_date'].rank(method='dense').astype(int)

Output:

  Unique_Id sms_sent_date  unique_sms_counter
0         A    11-01-2022                   1
1         A    11-01-2022                   1
2         A    12-01-2022                   2
3         A    13-01-2022                   3
4         B    02-02-2022                   1
5         B    03-02-2022                   2
6         B    06-02-2022                   3
7         B    06-02-2022                   3
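
The factorize and rank variants agree on the question's data because the rows are already in date order within each Unique_Id. On unsorted data they can differ. The sketch below uses hypothetical rows to illustrate this, and parses the dates with an explicit format string instead of dayfirst=True, purely to keep the example unambiguous.

```python
import pandas as pd

# Hypothetical data that is NOT sorted by date within each Unique_Id.
df = pd.DataFrame({
    "Unique_Id": ["A", "A", "A", "B", "B"],
    "sms_sent_date": [
        "12-01-2022", "11-01-2022", "12-01-2022",
        "03-02-2022", "02-02-2022",
    ],
})

# factorize numbers each distinct date by order of first appearance.
by_appearance = (
    df.groupby("Unique_Id")["sms_sent_date"]
    .transform(lambda s: s.factorize()[0] + 1)
)

# Dense rank numbers each distinct date by chronological order,
# so the column has to be parsed as datetimes first.
parsed = pd.to_datetime(df["sms_sent_date"], format="%d-%m-%Y")
by_date = parsed.groupby(df["Unique_Id"]).rank(method="dense").astype(int)

print(by_appearance.tolist())  # [1, 2, 1, 1, 2]
print(by_date.tolist())        # [2, 1, 2, 2, 1]
```

Which one is right depends on whether the counter should follow row order or calendar order. Sorting by Unique_Id and date beforehand makes the two agree.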
