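I have a DataFrame that contains all of my training, validation, and test data, and a second DataFrame that contains only my test data. Data points are identified by "data_index".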
import pandas as pd

df_all = pd.DataFrame({'data_index': range(7), 'split': 'NA'})
df_all.set_index('data_index', inplace=True)
df_test = pd.DataFrame({'data_index': [3, 5], 'split': 'test'})
df_test.set_index('data_index', inplace=True)
df_all:
            split
data_index
0              NA
1              NA
2              NA
3              NA
4              NA
5              NA
6              NA

df_test:
            split
data_index
3            test
5            test
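How can I fill in the values of the "split" column in the first DataFrame based on the test DataFrame, so that I get something like this: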
                split
data_index
0           train/val
1           train/val
2           train/val
3                test
4           train/val
5                test
6           train/val
Use Index.map together with fillna:
df_all['split'] = df_all.index.map(df_test['split'].get)
df_all['split']= df_all['split'].fillna('train/val')
print(df_all)
                split
data_index
0           train/val
1           train/val
2           train/val
3                test
4           train/val
5                test
6           train/val
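To see why the fillna step is needed, it can help to look at the intermediate result of the lookup. The following is a minimal sketch that rebuilds the same two frames; the name `looked_up` is only illustrative and is not part of the answer above. `Series.get` returns `None` by default for a label that is absent from `df_test`, so every unmatched row ends up as a missing value.

```python
import pandas as pd

df_all = pd.DataFrame({'data_index': range(7), 'split': 'NA'}).set_index('data_index')
df_test = pd.DataFrame({'data_index': [3, 5], 'split': 'test'}).set_index('data_index')

# Look up every index label of df_all in df_test['split'].
# Labels that do not exist in df_test come back as missing values.
looked_up = df_all.index.map(df_test['split'].get)  # `looked_up` is an illustrative name
print(list(pd.isna(looked_up)))

# Overwrite the placeholder column, then give unmatched rows the default label.
df_all['split'] = looked_up
df_all['split'] = df_all['split'].fillna('train/val')
print(df_all)
```

Note that the original 'NA' strings are simply overwritten here, so this approach does not depend on what the placeholder column contained.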
If the column holds real missing values (NaN) rather than the string 'NA', use combine_first:
import numpy as np

# use np.nan for the missing values, not the string 'NA'
df_all = pd.DataFrame({'data_index': range(7), 'split': np.nan})
df_all.set_index('data_index', inplace=True)
df_test = pd.DataFrame({'data_index': [3, 5], 'split': 'test'})
df_test.set_index('data_index', inplace=True)
df_all['split'] = df_all['split'].combine_first(df_test['split']).fillna('train/val')
print(df_all)
                split
data_index
0           train/val
1           train/val
2           train/val
3                test
4           train/val
5                test
6           train/val
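The switch from the string 'NA' to np.nan matters because combine_first only takes values from the other Series where the calling Series is null. A minimal sketch of that difference, assuming the same index labels as above (the names `as_text`, `as_nan`, and `test` are hypothetical and used only for illustration):

```python
import numpy as np
import pandas as pd

idx = pd.Index(range(7), name='data_index')
test = pd.Series('test', index=pd.Index([3, 5], name='data_index'), name='split')

# A column of 'NA' strings contains no nulls, so nothing is taken from `test`.
as_text = pd.Series('NA', index=idx, name='split')
text_result = as_text.combine_first(test)
print(text_result.tolist())

# A column of real missing values is filled from `test` wherever the labels match;
# the remaining nulls are then replaced by the default label.
as_nan = pd.Series(np.nan, index=idx, name='split')
nan_result = as_nan.combine_first(test).fillna('train/val')
print(nan_result.tolist())
```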
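Besides the Index.map approach explained above, this problem can also be solved with a few basic operations (a left merge followed by some cleanup):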
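# left join on the shared index level; the two 'split' columns become split_x / split_y
df = pd.merge(df_all, df_test, how='left', on='data_index')
# drop the placeholder column that came from df_all
df.drop(['split_x'], axis=1, inplace=True)
df = df.rename(columns={'split_y': 'split'})
# every row that is not 'test' (including NaN) is labeled 'train/val'
df.loc[df['split'] != 'test', 'split'] = 'train/val'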
The result after each line is:

After pd.merge:
            split_x  split_y
data_index
0                NA      NaN
1                NA      NaN
2                NA      NaN
3                NA     test
4                NA      NaN
5                NA     test
6                NA      NaN

After drop:
            split_y
data_index
0               NaN
1               NaN
2               NaN
3              test
4               NaN
5              test
6               NaN

After rename:
            split
data_index
0             NaN
1             NaN
2             NaN
3            test
4             NaN
5            test
6             NaN

After the final assignment:
                split
data_index
0           train/val
1           train/val
2           train/val
3                test
4           train/val
5                test
6           train/val
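The same left-join idea can be written more compactly. The sketch below is a variant rather than the method from the answer above: it assumes the placeholder column in df_all carries no information, drops it before joining so that no split_x / split_y pair is created, and uses DataFrame.join, which joins on the index by default.

```python
import pandas as pd

df_all = pd.DataFrame({'data_index': range(7), 'split': 'NA'}).set_index('data_index')
df_test = pd.DataFrame({'data_index': [3, 5], 'split': 'test'}).set_index('data_index')

# Drop the placeholder column first, then left-join df_test on the index.
# Rows of df_all without a match in df_test get a missing value in 'split'.
df = df_all.drop(columns='split').join(df_test, how='left')

# Give the unmatched rows the default label.
df['split'] = df['split'].fillna('train/val')
print(df)
```

Compared with the merge, drop, and rename sequence, this avoids the intermediate suffixed columns, at the cost of discarding whatever df_all['split'] held before.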