Suppose I have a dataset (df_data) that looks like this:
Time  Geography          Population
2016  England and Wales  58381200
2017  England and Wales  58744600
2016  Northern Ireland   1862100
2017  Northern Ireland   1870800
2016  Scotland           5404700
2017  Scotland           5424800
2016  Wales              3113200
2017  Wales              3125200
If I do the following:
df_nireland = df_data[df_data['Geography']=='Northern Ireland']
df_wales = df_data[df_data['Geography']=='Wales']
df_scotland = df_data[df_data['Geography']=='Scotland']
df_engl_n_wales = df_data[df_data['Geography']=='England and Wales']
df_england = df_engl_n_wales
df_england['Population'] = df_engl_n_wales['Population'] - df_wales['Population']
then df_england has NA values in the Population column.
How can I fix this?
By the way, I have read the related posts, but none of the suggestions there worked for me (.loc, .copy, etc.).
This is really a data organization problem. If you pivot the data, the subtraction becomes easy and is guaranteed to align on Time:
df_pop = df_data.pivot(index='Time', columns='Geography', values='Population')
df_pop['England'] = df_pop['England and Wales'] - df_pop['Wales']
Output of df_pop:
Geography  England and Wales  Northern Ireland  Scotland    Wales   England
Time
2016                58381200           1862100   5404700  3113200  55268000
2017                58744600           1870800   5424800  3125200  55619400
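To make the cause of the NA values concrete, here is a minimal, self-contained sketch. It rebuilds the question's table by hand (the construction of df_data and the variable names misaligned and england are assumptions for illustration, not part of the original post). pandas aligns arithmetic on index labels, and the two filtered subsets keep their original row labels (0, 1 versus 6, 7), so no labels match and every result is NaN. After pivoting, both columns share the Time index and the subtraction lines up.

```python
import pandas as pd

# Hand-built copy of the table from the question (assumed construction).
df_data = pd.DataFrame({
    "Time": [2016, 2017] * 4,
    "Geography": (["England and Wales"] * 2 + ["Northern Ireland"] * 2
                  + ["Scotland"] * 2 + ["Wales"] * 2),
    "Population": [58381200, 58744600, 1862100, 1870800,
                   5404700, 5424800, 3113200, 3125200],
})

engl_n_wales = df_data.loc[df_data["Geography"] == "England and Wales", "Population"]
wales = df_data.loc[df_data["Geography"] == "Wales", "Population"]

# The subsets keep row labels 0, 1 and 6, 7 respectively. Subtraction aligns
# on those labels, finds no overlap, and fills every position with NaN.
misaligned = engl_n_wales - wales
print(misaligned)

# After the pivot, every column is indexed by Time, so the labels match.
df_pop = df_data.pivot(index="Time", columns="Geography", values="Population")
england = df_pop["England and Wales"] - df_pop["Wales"]
print(england)
```

The first print shows four NaN rows (labels 0, 1, 6, 7); the second shows the England figures per year.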
If you need to get back to the original format, you can do the following:
df_pop.stack().to_frame('Population').reset_index()
# Time Geography Population
#0 2016 England and Wales 58381200
#1 2016 Northern Ireland 1862100
#2 2016 Scotland 5404700
#3 2016 Wales 3113200
#4 2016 England 55268000
#5 2017 England and Wales 58744600
#6 2017 Northern Ireland 1870800
#7 2017 Scotland 5424800
#8 2017 Wales 3125200
#9 2017 England 55619400
I just needed to do the following:
df_nireland = df_data[df_data['Geography']=='Northern Ireland'].reset_index(drop=True)
df_wales = df_data[df_data['Geography']=='Wales'].reset_index(drop=True)
df_scotland = df_data[df_data['Geography']=='Scotland'].reset_index(drop=True)
df_engl_n_wales = df_data[df_data['Geography']=='England and Wales'].reset_index(drop=True)
df_england = df_engl_n_wales
df_england['Population'] = df_engl_n_wales['Population'] - df_wales['Population']
Or, better in principle because it keeps the index of the initial DataFrame, the following:
df_nireland = df_data[df_data['Geography']=='Northern Ireland']
df_wales = df_data[df_data['Geography']=='Wales']
df_scotland = df_data[df_data['Geography']=='Scotland']
df_engl_n_wales = df_data[df_data['Geography']=='England and Wales']
df_england = df_engl_n_wales
df_england['Population'] = df_engl_n_wales['Population'] - df_wales['Population'].values
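Both variants above subtract by position, so they quietly assume that the two subsets list the years in the same order. A possible alternative, sketched here under the assumption that Time is unique within each Geography, is to index each subset by Time so the subtraction aligns on the year itself. The df_data construction and the name wales_reversed are illustrative and not from the original post.

```python
import pandas as pd

# Hand-built copy of the table from the question (assumed construction).
df_data = pd.DataFrame({
    "Time": [2016, 2017] * 4,
    "Geography": (["England and Wales"] * 2 + ["Northern Ireland"] * 2
                  + ["Scotland"] * 2 + ["Wales"] * 2),
    "Population": [58381200, 58744600, 1862100, 1870800,
                   5404700, 5424800, 3113200, 3125200],
})

# Index each subset by Time so arithmetic aligns on the year, not on row position.
engl_n_wales = (df_data[df_data["Geography"] == "England and Wales"]
                .set_index("Time")["Population"])
wales = df_data[df_data["Geography"] == "Wales"].set_index("Time")["Population"]

# Build a fresh frame for England instead of writing into a filtered slice.
df_england = (engl_n_wales - wales).rename("Population").reset_index()
df_england.insert(1, "Geography", "England")
print(df_england)

# Row order no longer matters: reversing one side still matches on Time.
wales_reversed = wales.iloc[::-1]
check = engl_n_wales - wales_reversed
```

Building df_england as a new object also sidesteps assigning into a filtered slice of df_data, which is the situation the .loc and .copy suggestions mentioned in the question are aimed at.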