Modifying rows of a pandas DataFrame that exist in another DataFrame



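I have a DataFrame containing all of my training, validation, and test data, and a second DataFrame containing only my test data. Data points are identified by "data_index".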

import pandas as pd

df_all = pd.DataFrame({'data_index': range(7), 'split': 'NA'})
df_all.set_index('data_index', inplace=True)
df_test = pd.DataFrame({'data_index': [3, 5], 'split': 'test'})
df_test.set_index('data_index', inplace=True)
print(df_all)
print(df_test)

           split
data_index      
0             NA
1             NA
2             NA
3             NA
4             NA
5             NA
6             NA
           split
data_index      
3           test
5           test

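How can I fill in the values of the "split" column in the first DataFrame based on the test DataFrame, so that I get something like this: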

                split
data_index           
0           train/val
1           train/val
2           train/val
3                test
4           train/val
5                test
6           train/val

Use Index.map together with fillna:

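# Series.get returns None for index labels that are not in df_test
df_all['split'] = df_all.index.map(df_test['split'].get)
# every row that was not matched becomes 'train/val'
df_all['split'] = df_all['split'].fillna('train/val')
print(df_all)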
                split
data_index           
0           train/val
1           train/val
2           train/val
3                test
4           train/val
5                test
6           train/val
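
Since only two labels are involved here, the same result can also be sketched with a membership test on the index instead of a lookup. This is an alternative to Index.map, not part of the answer above, and it assumes every row in df_test should be labeled 'test'; the name in_test is only illustrative.

```python
import numpy as np
import pandas as pd

df_all = pd.DataFrame({'data_index': range(7), 'split': 'NA'}).set_index('data_index')
df_test = pd.DataFrame({'data_index': [3, 5], 'split': 'test'}).set_index('data_index')

# Boolean array: True where a row label of df_all also appears in df_test
in_test = df_all.index.isin(df_test.index)

# Pick the label per row from the mask (assumes only two possible labels)
df_all['split'] = np.where(in_test, 'test', 'train/val')
print(df_all)
```

Unlike the Index.map version, this ignores the actual values stored in df_test['split'], so it only fits when the test frame carries a single label.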

If the column holds real missing values (NaN) instead of the string 'NA', use combine_first:

import numpy as np
import pandas as pd

# missing values are defined as np.nan, not as the string 'NA'
df_all = pd.DataFrame({'data_index': range(7), 'split': np.nan})
df_all.set_index('data_index', inplace=True)
df_test = pd.DataFrame({'data_index': [3, 5], 'split': 'test'})
df_test.set_index('data_index', inplace=True)
# fill NaN in df_all from df_test (aligned on the index), then label the rest
df_all['split'] = df_all['split'].combine_first(df_test['split']).fillna('train/val')
print(df_all)
                split
data_index           
0           train/val
1           train/val
2           train/val
3                test
4           train/val
5                test
6           train/val
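
The reason combine_first needs NaN rather than the string 'NA' is that it only fills positions where the calling Series is null; existing values take precedence. A minimal sketch with made-up data (the Series names existing and other are illustrative only):

```python
import numpy as np
import pandas as pd

# Label 0 already has a value; labels 1 and 2 are missing
existing = pd.Series(['train', np.nan, np.nan], index=[0, 1, 2])
# The other Series offers values for labels 0 and 1 only
other = pd.Series(['test', 'test'], index=[0, 1])

combined = existing.combine_first(other)
print(combined)
```

Label 0 keeps 'train', label 1 is filled with 'test', and label 2 stays NaN because neither Series has a value for it. With the string 'NA' in place of NaN, nothing would be filled.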

Besides the Index.map approach explained above, this can also be solved with a few basic operations (merge, drop, rename, and a boolean mask):

df = pd.merge(df_all, df_test, how='left', on='data_index')
df.drop(['split_x'], axis=1, inplace=True)
df = df.rename(columns={'split_y': 'split'})
df.loc[df.split != 'test', 'split'] = 'train/val'

The result after each line is:

          split_x split_y
data_index                
0               NA     NaN
1               NA     NaN
2               NA     NaN
3               NA    test
4               NA     NaN
5               NA    test
6               NA     NaN
           split_y
data_index        
0              NaN
1              NaN
2              NaN
3             test
4              NaN
5             test
6              NaN
           split
data_index      
0            NaN
1            NaN
2            NaN
3           test
4            NaN
5           test
6            NaN
                split
data_index           
0           train/val
1           train/val
2           train/val
3                test
4           train/val
5                test
6           train/val
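
The merge steps above could probably be condensed. A rough sketch, assuming both frames are indexed by data_index as in the question: dropping the placeholder column first avoids the split_x/split_y suffixes, and DataFrame.join performs a left join on the index by default. The name result is illustrative.

```python
import pandas as pd

df_all = pd.DataFrame({'data_index': range(7), 'split': 'NA'}).set_index('data_index')
df_test = pd.DataFrame({'data_index': [3, 5], 'split': 'test'}).set_index('data_index')

# Drop the placeholder column, then left-join the test labels on the index
result = df_all.drop(columns='split').join(df_test)

# Rows without a match in df_test are NaN after the join
result['split'] = result['split'].fillna('train/val')
print(result)
```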
