Add a DataFrame column that counts rows per group based on a column condition



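I need to add a column to my DataFrame (loaded from an Excel file) that counts the rows where HK == 'HIT' or 'HITTOP'. The DataFrame has two columns, Model and HK, and the HK column contains either the string HIT or HITTOP. Below is my code. I made a counter, but it only counts when a model has a non-empty string in the HK column. The DataFrame holds models from many files, so models are duplicated, which is why I need a per-model count under a specific condition.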

import pandas as pd
df = pd.read_excel(r'C:\Users\user\Desktop\test\output.xlsx')
df['count'] = df.groupby('Model')['HK'].transform('count')  # count non-null HK values per Model (empty Excel cells are read as NaN)
df.to_excel(r'C:\Users\user\Desktop\test\output3.xlsx')  # save the output
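Note that `transform('count')` counts non-null values, not non-empty strings. `read_excel` normally reads empty cells as NaN, so those are skipped, but blanks that arrive as empty strings (as in the sample data) are counted. A minimal sketch, assuming blanks may be either NaN or `''` and that HK only ever holds HIT, HITTOP or a blank, normalizes `''` to NaN before counting:

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; blanks are empty strings here
df = pd.DataFrame({
    'Model': ['model1', 'model2', 'model1', 'model1', 'model2'],
    'HK': ['HITTOP', 'HIT', 'HITTOP', '', ''],
})

# transform('count') skips NaN but counts '', so turn empty strings into NaN first
hk = df['HK'].replace('', np.nan)

# Count non-null HK values per Model and broadcast the result to every row
df['Count'] = hk.groupby(df['Model']).transform('count')
print(df)
```

If HK can hold other non-empty values, this counts those too; filtering on the exact strings is safer in that case.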

Sample data:

d = {'Model': ['model1', 'model2', 'model1', 'model1', 'model2'], 'HK': ['HITTOP', 'HIT', 'HITTOP', '', '']}
df = pd.DataFrame(data=d)
df
Model   HK
0   model1  HITTOP
1   model2  HIT
2   model1  HITTOP
3   model1  
4   model2  

Desired output:

f = {'Model': ['model1', 'model2', 'model1', 'model1', 'model2'], 'HK': ['HITTOP', 'HIT', 'HITTOP', '', ''],
'Count': [2, 1, 2, 2, 1]}
df = pd.DataFrame(data=f)
df
Model   HK    Count
0   model1  HITTOP  2
1   model2  HIT     1
2   model1  HITTOP  2
3   model1          2
4   model2          1
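To count only rows whose HK is exactly 'HIT' or 'HITTOP', one option is to build a 0/1 flag with `Series.isin` and sum it per Model with `groupby(...).transform('sum')`. Because `transform` returns a result aligned to the original rows, every row of a model, including rows with a blank HK, receives that model's total. This is a sketch on the sample data; stripping whitespace from Model is an assumption, added so that a value such as ' model1' would group with 'model1'.

```python
import pandas as pd

df = pd.DataFrame({
    'Model': ['model1', 'model2', 'model1', 'model1', 'model2'],
    'HK': ['HITTOP', 'HIT', 'HITTOP', '', ''],
})

# 1 where HK is one of the target strings, 0 otherwise (blanks and NaN give 0)
is_hit = df['HK'].isin(['HIT', 'HITTOP']).astype(int)

# Assumption: surrounding whitespace in Model is not meaningful
model_key = df['Model'].str.strip()

# Sum the 0/1 flags per model and broadcast the total back to every row of that model
df['Count'] = is_hit.groupby(model_key).transform('sum')
print(df)
```

In the Excel workflow from the question, the flag and `transform` lines would sit between `read_excel` and `to_excel`.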
df = df.fillna('')
df2 = df.groupby('HK').apply(lambda x: x.shape[0]).rename('Count').reset_index()
df = df.merge(df2, how='left')
Model      HK  Count
0  model1  HITTOP      2
1  model2     HIT      1
2  model1  HITTOP      2
3  model1              2
4  model2              2
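The groupby/merge snippet groups by HK, so every row with a blank HK shares a single count (2 here), which is why the last model2 row shows 2 instead of 1. A sketch of the same merge pattern keyed on Model instead, counting only HIT/HITTOP rows (the helper column name `is_hit` is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    'Model': ['model1', 'model2', 'model1', 'model1', 'model2'],
    'HK': ['HITTOP', 'HIT', 'HITTOP', '', ''],
})

# Per-Model count of rows whose HK is HIT or HITTOP
counts = (
    df.assign(is_hit=df['HK'].isin(['HIT', 'HITTOP']).astype(int))
      .groupby('Model', as_index=False)['is_hit']
      .sum()
      .rename(columns={'is_hit': 'Count'})
)

# Merge on Model (not HK) so each row receives its own model's total;
# how='left' keeps the original row order
df = df.merge(counts, on='Model', how='left')
print(df)
```

The `transform` approach gives the same result without a separate table and merge; the merge form is useful if the per-model counts are also needed on their own.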
