How to merge (join) two rows in pandas that have different values in each column



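I am trying to merge four rows into two based on the name in 'Country'. The DataFrame looks like this (sorry for the poor formatting; if there is a better way to display it, please let me know :)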

(Index),Country,SPI_Score,WHR_Score
...............................
190,Congo Republic of,48.45, NaN
191,Congo Democratic Republic of,42.25, NaN
................................
198,Congo (Brazzaville), NaN ,5.194
199,Congo (Kinshasa), NaN ,4.311

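My problem is that the countries had different names in the two sources when I did the outer join. I tried replacing the country names like this: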

for i in range(len(df['Country'])):
    if df.iloc[i]['Country'] in ['Congo Republic of', 'Congo (Brazzaville)']:
        df.iloc[i]['Country'] = 'Republic of the Congo'
    elif df[i]['Country'] in ['Congo Democratic Republic of', 'Congo (Kinshasa)']:
        df.iloc[i]['Country'] = 'Democratic Republic of the Congo'
    else:
        continue

However, this did not work and left me with the original df. My desired output is:

(Index),Country,SPI_Score,WHR_Score
...............................
190,Republic of the Congo,48.45, 5.194
191,Democratic Republic of the Congo,42.25, 4.311

You can put the name mapping in a dictionary and map the old names to the new ones. Setup:

name_mapper = {
    'Congo Republic of': 'Republic of the Congo',
    'Congo (Brazzaville)': 'Republic of the Congo',
    'Congo Democratic Republic of': 'Democratic Republic of the Congo',
    'Congo (Kinshasa)': 'Democratic Republic of the Congo',
}

The simplest way to map the column is something like

df['Country'].map(name_mapper)

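However, any value in 'Country' that is not one of the dict's keys comes back as NaN. So here is a more robust version, which falls back to the original name when there is no match: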

df['C'] = df['Country'].apply(lambda v: name_mapper.get(v, v))
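To make the difference concrete, here is a minimal sketch on a small made-up Series. The 'Chad' entry is a hypothetical value added only to show what happens to a name that is not in the mapping, and the dict is shortened for brevity. It also shows `Series.replace`, which is another option that leaves unmatched values untouched.

```python
import pandas as pd

# Shortened version of the mapping, for illustration only
name_mapper = {
    'Congo Republic of': 'Republic of the Congo',
    'Congo (Brazzaville)': 'Republic of the Congo',
}

# 'Chad' is a made-up extra value with no entry in name_mapper
countries = pd.Series(['Congo Republic of', 'Congo (Brazzaville)', 'Chad'])

# map: values missing from the dict become NaN
mapped = countries.map(name_mapper)

# dict.get with a default: values missing from the dict are kept as-is
with_fallback = countries.apply(lambda v: name_mapper.get(v, v))

# replace: also keeps values that are missing from the dict
replaced = countries.replace(name_mapper)

print(mapped.tolist())
print(with_fallback.tolist())
print(replaced.tolist())
```

If every name in the column is guaranteed to be in the dict, plain `map` is fine; otherwise one of the two fallback variants is probably safer.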

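Now we can group on 'C' and sum the score columns. Since NaN values are skipped, each group keeps its one available score per column: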

df.groupby('C', as_index=False)[['SPI_Score', 'WHR_Score']].sum()

to get


C                                   SPI_Score   WHR_Score
0   Democratic Republic of the Congo    42.25   4.311
1   Republic of the Congo               48.45   5.194
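Putting the steps together, the following is a self-contained sketch that rebuilds only the four rows shown in the question (the real DataFrame has more rows, which are omitted here). It passes `min_count=1` to `sum`, which is an addition to the approach above: with it, a group whose scores are all missing stays NaN instead of becoming 0.

```python
import numpy as np
import pandas as pd

# Only the four rows shown in the question; the full data is not reproduced
df = pd.DataFrame(
    {
        'Country': ['Congo Republic of', 'Congo Democratic Republic of',
                    'Congo (Brazzaville)', 'Congo (Kinshasa)'],
        'SPI_Score': [48.45, 42.25, np.nan, np.nan],
        'WHR_Score': [np.nan, np.nan, 5.194, 4.311],
    },
    index=[190, 191, 198, 199],
)

name_mapper = {
    'Congo Republic of': 'Republic of the Congo',
    'Congo (Brazzaville)': 'Republic of the Congo',
    'Congo Democratic Republic of': 'Democratic Republic of the Congo',
    'Congo (Kinshasa)': 'Democratic Republic of the Congo',
}

# Normalize the names, keeping any name that has no entry in the mapping
df['C'] = df['Country'].apply(lambda v: name_mapper.get(v, v))

# One row per normalized name; NaN is skipped when summing, and
# min_count=1 keeps the result NaN if a group has no valid score at all
result = (
    df.groupby('C', as_index=False)[['SPI_Score', 'WHR_Score']]
      .sum(min_count=1)
)

print(result)
```

Note that summing is only appropriate when each country has at most one non-missing score per column, as in this data; if duplicates were possible, an aggregation such as `first()` or `mean()` might be a better fit.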
