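I want to merge two tables while replacing the null values in one table with the non-null values from the other.
The code below shows an example of the tables to be merged: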
import numpy as np
import pandas as pd

# Table 1 (has rows with missing values)
a = ['x', 'x', 'x', 'y', 'y', 'y']
b = ['z', 'z', 'z', 'w', 'w', 'w']
c = [1, 1, 1, np.nan, np.nan, np.nan]
table_1 = pd.DataFrame({'a': a, 'b': b, 'c': c})
table_1
   a  b    c
0  x  z  1.0
1  x  z  1.0
2  x  z  1.0
3  y  w  NaN
4  y  w  NaN
5  y  w  NaN
# Table 2 (new table to be combined with table_1; its values in column 'c' should replace the missing values in the same column of table_1)
a=['y', 'y', 'y']
b=['w', 'w', 'w']
c=[2,2,2]
table_2=pd.DataFrame({'a':a, 'b':b, 'c':c})
table_2
   a  b  c
0  y  w  2
1  y  w  2
2  y  w  2
This is the code I used to merge the two tables, followed by the output I get:
# Merging the two tables
merged_table=pd.merge(table_1, table_2, on=['a', 'b'], how='left')
merged_table
Current output (I don't understand why the number of rows increased):
    a  b  c_x  c_y
0   x  z  1.0  NaN
1   x  z  1.0  NaN
2   x  z  1.0  NaN
3   y  w  NaN  2.0
4   y  w  NaN  2.0
5   y  w  NaN  2.0
6   y  w  NaN  2.0
7   y  w  NaN  2.0
8   y  w  NaN  2.0
9   y  w  NaN  2.0
10  y  w  NaN  2.0
11  y  w  NaN  2.0
Desired output (the null values in column 'c' of table_1 replaced with the values from table_2):
   a  b    c
0  x  z  1.0
1  x  z  1.0
2  x  z  1.0
3  y  w  2.0
4  y  w  2.0
5  y  w  2.0
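Try:
# On pandas < 2.0 only (DataFrame.append was removed in pandas 2.0):
out = table_1.append(table_2).dropna(subset=['c']).reset_index(drop=True)
# OR, on any pandas version:
out = pd.concat([table_1, table_2]).dropna(subset=['c']).reset_index(drop=True)
out
Output of out: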
   a  b    c
0  x  z  1.0
1  x  z  1.0
2  x  z  1.0
3  y  w  2.0
4  y  w  2.0
5  y  w  2.0
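As for why the row count grew in the original merge: the keys ('a', 'b') are not unique in either table, so each of the 3 ('y', 'w') rows in table_1 is paired with all 3 ('y', 'w') rows in table_2, giving 3 * 3 = 9 rows, plus the 3 unmatched ('x', 'z') rows that how='left' keeps. Note also that the concat/dropna approach above discards the NaN rows and appends table_2, so it only reproduces the desired shape when table_2 happens to contain one row per missing row. If you would rather keep table_1's rows and fill the gaps, a minimal sketch follows. It assumes table_2 carries a single 'c' value per ('a', 'b') pair; the names lookup, filled, and the '_new' suffix are just illustrative choices.

```python
import numpy as np
import pandas as pd

table_1 = pd.DataFrame({
    'a': ['x', 'x', 'x', 'y', 'y', 'y'],
    'b': ['z', 'z', 'z', 'w', 'w', 'w'],
    'c': [1, 1, 1, np.nan, np.nan, np.nan],
})
table_2 = pd.DataFrame({
    'a': ['y', 'y', 'y'],
    'b': ['w', 'w', 'w'],
    'c': [2, 2, 2],
})

# Reproduce the surprise: duplicate keys on both sides multiply the matches.
merged_table = pd.merge(table_1, table_2, on=['a', 'b'], how='left')
print(len(merged_table))  # 3 unmatched rows + 3 * 3 matched rows

# Assumption: one 'c' value per ('a', 'b') pair in table_2, so keeping the
# first row per key loses no information.
lookup = table_2.drop_duplicates(subset=['a', 'b'])

# With unique keys on the right, the left merge keeps exactly len(table_1) rows.
filled = table_1.merge(lookup, on=['a', 'b'], how='left', suffixes=('', '_new'))

# Fill only the missing values of 'c' from the looked-up column, then drop it.
filled['c'] = filled['c'].fillna(filled['c_new'])
filled = filled.drop(columns='c_new')
print(filled)
```

If table_2 could hold conflicting 'c' values for the same key, drop_duplicates would silently keep the first one, so you may want to check for that before relying on this.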