I have a DataFrame like this one (but much larger):
year city_code total_tax
id_inf
9 2002 NaN NaN
9 2003 a 417.0
9 2004 a 950.0
9 2005 NaN NaN
9 2006 NaN NaN
54 2002 b 801.0
54 2003 NaN NaN
54 2004 b 218.0
54 2005 b 886.0
54 2006 b 855.0
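I need to fill "city_code" with the value shared by the rows that have the same "id_inf", and replace NaN with zero in the "total_tax" column.
The second task is fairly simple: df_balanced['total_tax'] = df_balanced['total_tax'].fillna(0)
For the first task, someone suggested using something like "df_balanced['city_code'] = df_balanced.groupby(level=0)['city_code'].transform(max)". However, when I use this solution, I get this error: TypeError: '>=' not supported between instances of 'float' and 'str'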
The output I need looks like this:
year city_code total_tax
id_inf
9 2002 a 0.0
9 2003 a 417.0
9 2004 a 950.0
9 2005 a 0.0
9 2006 a 0.0
54 2002 b 801.0
54 2003 b 0.0
54 2004 b 218.0
54 2005 b 886.0
54 2006 b 855.0
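A likely reason for the TypeError in the question: within each id_inf group, city_code mixes NaN (which is a float) with strings, and a max-style reduction has to compare them. The sketch below reproduces only that comparison in plain Python. The list `values` is a hypothetical stand-in for one group, and the exact code path inside pandas may differ by version.

```python
# One id_inf group as it appears in the question: NaN is a float, the codes are str.
values = [float("nan"), "a", "a"]

try:
    # Comparing a float with a str is not defined in Python 3.
    values[0] >= values[1]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

An aggregation that skips missing values instead of comparing them avoids the problem.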
Use first, which takes the first non-null value in each group:
df['city_code'] = df.groupby('id_inf')['city_code'].transform('first')
#df.groupby('id_inf').city_code.transform('first')
Out[278]:
id_inf
9 a
9 a
9 a
9 a
9 a
54 b
54 b
54 b
54 b
54 b
Name: city_code, dtype: object
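For completeness, here is a minimal end-to-end sketch that combines both steps on the sample data. The way `df` is constructed is my assumption (the question only shows the printed frame), and it groups by the index level name "id_inf", which should be equivalent to groupby('id_inf') here.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question; id_inf is the (non-unique) index.
df = pd.DataFrame(
    {
        "year": [2002, 2003, 2004, 2005, 2006] * 2,
        "city_code": [np.nan, "a", "a", np.nan, np.nan,
                      "b", np.nan, "b", "b", "b"],
        "total_tax": [np.nan, 417.0, 950.0, np.nan, np.nan,
                      801.0, np.nan, 218.0, 886.0, 855.0],
    },
    index=pd.Index([9] * 5 + [54] * 5, name="id_inf"),
)

# 'first' picks the first non-null city_code in each id_inf group,
# and transform broadcasts it back to every row of that group.
df["city_code"] = df.groupby(level="id_inf")["city_code"].transform("first")

# Missing tax amounts become zero.
df["total_tax"] = df["total_tax"].fillna(0)

print(df)
```

This assumes each id_inf has a single city_code. If a group could contain several different codes, 'first' would silently overwrite the others, so it may be worth checking that before applying it.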