Subtracting DataFrame columns and assigning the result returns NA



Suppose I have a dataset (df_data) that looks like this:

Time    Geography                Population
2016    England and Wales        58381200
2017    England and Wales        58744600
2016    Northern Ireland         1862100
2017    Northern Ireland         1870800
2016    Scotland                 5404700
2017    Scotland                 5424800
2016    Wales                    3113200
2017    Wales                    3125200

If I do the following:

df_nireland = df_data[df_data['Geography']=='Northern Ireland']
df_wales = df_data[df_data['Geography']=='Wales']
df_scotland = df_data[df_data['Geography']=='Scotland']
df_engl_n_wales = df_data[df_data['Geography']=='England and Wales']
df_england = df_engl_n_wales
df_england['Population'] = df_engl_n_wales['Population'] - df_wales['Population']

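df_england ends up with NA values in the Population column.

How can I fix this?

By the way, I have read the related posts, but none of the suggestions there worked for me (.loc, .copy, etc.).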

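This is really a data-organization problem. If you pivot the data, the subtraction becomes straightforward and is guaranteed to align on Time: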

df_pop = df_data.pivot(index='Time', columns='Geography', values='Population')
df_pop['England'] = df_pop['England and Wales'] - df_pop['Wales']

Output of df_pop:

Geography  England and Wales  Northern Ireland  Scotland    Wales   England
Time                                                                       
2016                58381200           1862100   5404700  3113200  55268000
2017                58744600           1870800   5424800  3125200  55619400
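
For context on why alignment matters here: pandas matches Series on their index labels before doing arithmetic, and the filtered frames in the question keep their original row labels. The sketch below rebuilds the sample data (assuming the default integer index 0 to 7, which the question does not show explicitly) and prints the labels on each side of the subtraction.

```python
import pandas as pd

# Rebuild the sample data from the question; the default RangeIndex 0..7 is an assumption.
df_data = pd.DataFrame({
    "Time": [2016, 2017] * 4,
    "Geography": ["England and Wales"] * 2 + ["Northern Ireland"] * 2
                 + ["Scotland"] * 2 + ["Wales"] * 2,
    "Population": [58381200, 58744600, 1862100, 1870800,
                   5404700, 5424800, 3113200, 3125200],
})

# Boolean filtering keeps the original row labels.
engl_n_wales = df_data.loc[df_data["Geography"] == "England and Wales", "Population"]
wales = df_data.loc[df_data["Geography"] == "Wales", "Population"]

print(list(engl_n_wales.index))  # [0, 1]
print(list(wales.index))         # [6, 7]

# No label appears on both sides, so every element of the result is NaN.
diff = engl_n_wales - wales
print(diff)
```

After pivoting, both columns share the same Time index, so the same subtraction lines up row by row.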

If you need to get back to the original format, you can do the following:

df_pop.stack().to_frame('Population').reset_index()
#   Time          Geography  Population
#0  2016  England and Wales    58381200
#1  2016   Northern Ireland     1862100
#2  2016           Scotland     5404700
#3  2016              Wales     3113200
#4  2016            England    55268000
#5  2017  England and Wales    58744600
#6  2017   Northern Ireland     1870800
#7  2017           Scotland     5424800
#8  2017              Wales     3125200
#9  2017            England    55619400

I just needed to do the following:

df_nireland = df_data[df_data['Geography']=='Northern Ireland'].reset_index(drop=True)
df_wales = df_data[df_data['Geography']=='Wales'].reset_index(drop=True)
df_scotland = df_data[df_data['Geography']=='Scotland'].reset_index(drop=True)
df_engl_n_wales = df_data[df_data['Geography']=='England and Wales'].reset_index(drop=True)
df_england = df_engl_n_wales
df_england['Population'] = df_engl_n_wales['Population'] - df_wales['Population']

Or, arguably better in principle because it keeps the index of the original DataFrame, the following:

df_nireland = df_data[df_data['Geography']=='Northern Ireland']
df_wales = df_data[df_data['Geography']=='Wales']
df_scotland = df_data[df_data['Geography']=='Scotland']
df_engl_n_wales = df_data[df_data['Geography']=='England and Wales']
df_england = df_engl_n_wales
df_england['Population'] = df_engl_n_wales['Population'] - df_wales['Population'].values
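
A self-contained sketch of the positional approach follows. It makes two additions that are not in the answer above: a `.copy()` so the assignment works on an independent frame rather than on a slice of df_data, and a check that the Time values match row by row, since subtracting raw values silently relies on both subsets being in the same order. `.to_numpy()` is used in place of `.values`; the two behave the same here.

```python
import pandas as pd

# Sample data from the question; the default RangeIndex 0..7 is an assumption.
df_data = pd.DataFrame({
    "Time": [2016, 2017] * 4,
    "Geography": ["England and Wales"] * 2 + ["Northern Ireland"] * 2
                 + ["Scotland"] * 2 + ["Wales"] * 2,
    "Population": [58381200, 58744600, 1862100, 1870800,
                   5404700, 5424800, 3113200, 3125200],
})

df_wales = df_data[df_data["Geography"] == "Wales"]
# .copy() gives an independent frame, so the assignment below leaves df_data untouched.
df_england = df_data[df_data["Geography"] == "England and Wales"].copy()

# Positional subtraction is only meaningful if the years line up row by row.
assert df_england["Time"].tolist() == df_wales["Time"].tolist()

# A NumPy array carries no index, so pandas subtracts element by element
# and the result keeps the row labels of df_england.
df_england["Population"] = df_england["Population"] - df_wales["Population"].to_numpy()
print(df_england)
```

The resulting populations should agree with the England column from the pivot-based answer.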
