Add columns to a DataFrame using groupby on the "Date" column



Existing DataFrame:

Sr.No        Date         Tag    score
01         10-02-2022    pass     10
02         10-02-2022    fail     5
03         10-02-2022    pass     10
04         11-02-2022    grace    3
05         11-02-2022    pass     15
06         11-02-2022    pass     15

Expected DataFrame:

Sr.No   Date       Tag  score no_of_records pass_count fail_count grace_count pass_score_total 
01    10-02-2022  pass   10         3           2          1          0             20
02    10-02-2022  fail   5          
03    10-02-2022  pass   10
04    11-02-2022  grace  3          3           2          0          1             30
05    11-02-2022  pass   15
06    11-02-2022  pass   15

no_of_records is the number of records for a particular date. How can I group by Date to get the required output?

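First aggregate to get the sum of the pass scores into the Series s, then get the per-date record counts with Series.value_counts, and concatenate both with the pd.crosstab result: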

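import pandas as pd

# Total score of the 'pass' rows per date (other rows count as 0)
s = (df['score'].where(df['Tag'].eq('pass'), 0).groupby(df['Date']).sum()
     .rename('pass_score_total'))
# Number of records per date
s1 = df['Date'].value_counts().rename('no_of_records')
# Per-date count of each Tag, combined with the two Series above
df1 = pd.concat([s1, pd.crosstab(df['Date'], df['Tag']).add_suffix('_count'), s], axis=1)
df = df.join(df1, on='Date')
print(df)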
Sr.No        Date    Tag  score  no_of_records  fail_count  grace_count  
0      1  10-02-2022   pass     10              3           1            0   
1      2  10-02-2022   fail      5              3           1            0   
2      3  10-02-2022   pass     10              3           1            0   
3      4  11-02-2022  grace      3              3           0            1   
4      5  11-02-2022   pass     15              3           0            1   
5      6  11-02-2022   pass     15              3           0            1   
pass_count  pass_score_total  
0           2                20  
1           2                20  
2           2                20  
3           2                30  
4           2                30  
5           2                30  
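
For comparison, the same per-date columns can be built with groupby(...).transform, which returns a result already aligned to the original rows, so no join is needed. This is only a minimal sketch and not the approach used in the answer above. The sample frame is re-created from the question, the names out and by_date are illustrative, and the code assumes the set of tags (pass, fail, grace) is known in advance.

```python
import pandas as pd

# Sample data re-created from the question
df = pd.DataFrame({
    "Sr.No": [1, 2, 3, 4, 5, 6],
    "Date": ["10-02-2022"] * 3 + ["11-02-2022"] * 3,
    "Tag": ["pass", "fail", "pass", "grace", "pass", "pass"],
    "score": [10, 5, 10, 3, 15, 15],
})

out = df.copy()
by_date = df["Date"]  # grouping key (illustrative name)

# Rows per date, broadcast back to every row of that date
out["no_of_records"] = df.groupby("Date")["score"].transform("size")

# Count of each tag per date (assumes the tag list is known up front)
for tag in ["pass", "fail", "grace"]:
    out[f"{tag}_count"] = df["Tag"].eq(tag).groupby(by_date).transform("sum")

# Sum of scores over 'pass' rows only; other rows contribute 0
out["pass_score_total"] = (df["score"].where(df["Tag"].eq("pass"), 0)
                           .groupby(by_date).transform("sum"))

print(out)
```

The crosstab version discovers the tag values on its own, so it is probably the better fit when the tags are not fixed.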

If you need empty strings on the repeated rows of each date:

s = df['score'].where(df['Tag'].eq('pass'), 0).groupby(df['Date']).sum().rename('pass_score_total')
s1 = df['Date'].value_counts().rename('no_of_records')
df1 = pd.concat([s1, pd.crosstab(df['Date'], df['Tag']).add_suffix('_count'), s], axis=1)
df = df.join(df1, on='Date')
# Blank the statistics on every row except the first one of each date
df[df1.columns] = df[df1.columns].mask(df['Date'].duplicated(), '', axis=1)
print(df)
Sr.No        Date    Tag  score no_of_records fail_count grace_count  
0      1  10-02-2022   pass     10             3          1           0   
1      2  10-02-2022   fail      5                                        
2      3  10-02-2022   pass     10                                        
3      4  11-02-2022  grace      3             3          0           1   
4      5  11-02-2022   pass     15                                        
5      6  11-02-2022   pass     15                                        
pass_count pass_score_total  
0          2               20  
1                              
2                              
3          2               30  
4                              
5                              
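
Filling with empty strings turns the affected columns into object dtype, so they can no longer be summed or compared as numbers. If that matters, one possible variation (an assumption on my part, not something the question asks for) is to blank the repeated rows with missing values in a nullable integer column instead. The small frame stats and the names repeat, cols and blanked below are hypothetical and stand in for the joined result.

```python
import pandas as pd

# Hypothetical stand-in for the joined result, reduced to two statistic columns
stats = pd.DataFrame({
    "Date": ["10-02-2022"] * 3 + ["11-02-2022"] * 3,
    "no_of_records": [3] * 6,
    "pass_score_total": [20, 20, 20, 30, 30, 30],
})

# True for every row after the first one of each date
repeat = stats["Date"].duplicated()

cols = ["no_of_records", "pass_score_total"]
blanked = stats.copy()
for col in cols:
    # Nullable Int64 keeps integers while allowing missing values (<NA>)
    blanked[col] = stats[col].astype("Int64").mask(repeat)

print(blanked)
```

The missing values can still be rendered as blanks at display or export time, for example with the na_rep argument of DataFrame.to_csv.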

You can use pd.crosstab:

>>> (pd.crosstab(df['Date'], df['Tag']).add_suffix('_count')
...  .assign(no_of_records=lambda x: x.sum(axis=1)).reset_index()
...  .rename_axis(columns=None))
Date  fail_count  grace_count  pass_count  no_of_records
0  10-02-2022           1            0           2              3
1  11-02-2022           0            1           2              3
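
The crosstab result has one row per date. To attach those counts to every original row, as in the expected output, it can be merged back on Date. The following is a minimal sketch under the assumption that df holds the sample data from the question; counts and out are illustrative names, and pass_score_total is not covered here.

```python
import pandas as pd

# Sample data re-created from the question
df = pd.DataFrame({
    "Sr.No": [1, 2, 3, 4, 5, 6],
    "Date": ["10-02-2022"] * 3 + ["11-02-2022"] * 3,
    "Tag": ["pass", "fail", "pass", "grace", "pass", "pass"],
    "score": [10, 5, 10, 3, 15, 15],
})

# One row per date: a count column per tag plus the row total
counts = (pd.crosstab(df["Date"], df["Tag"]).add_suffix("_count")
          .assign(no_of_records=lambda x: x.sum(axis=1))
          .reset_index()
          .rename_axis(columns=None))

# A left merge keeps every original row and repeats the per-date counts
out = df.merge(counts, on="Date", how="left")
print(out)
```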
