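I have a predefined list of headers for my DataFrame. The data we receive may or may not include all of those columns, but our output file must contain every field in the header list. If the input file has no data for a column, that column should be left blank; otherwise it should be filled with the input values.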
input file 1:
ID Name Address Shippment
1 john address1 Y
2 Jessy address2 N
input file 2:
ID Name Address Shippment Delivered
1 john address1 Y Y
2 Jessy address2 N N
headers=['ID','Name','Address','Shippment','Delivered']
output file 1:
ID Name Address Shippment Delivered
1 john address1 Y
2 Jessy address2 N
output file 2:
ID Name Address Shippment Delivered
1 john address1 Y Y
2 Jessy address2 N N
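How can I map the headers onto the source file when the source columns arrive in random order?
I tried using zip and update, but that pairs the headers with the source columns by position rather than by name. The source columns can be in any order and must end up matching the order of the fields in the header list.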
for i,index in zip(header,df):
    final.update({i: df[index].tolist()})
df_final= pd.DataFrame(final)
How can I achieve this?
You can use reindex() to make sure all of the predefined headers are included. Any column missing from the data is filled with NaN:
headers = ['ID','Name','Address','Shippment','Delivered']
df = df.reindex(columns=headers)
Input:
   ID   Name   Address Shippment
0   1   john  address1         Y
1   2  jessy  address2         N
Output:
   ID   Name   Address Shippment Delivered
0   1   john  address1         Y       NaN
1   2  jessy  address2         N       NaN
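If the columns in the input are not in the predefined order, reindex() fixes that as well, because it selects columns by name rather than by position: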
Input:
   ID   Address   Name Shippment
0   1  address1   john         Y
1   2  address2  jessy         N
Output:
   ID   Name   Address Shippment Delivered
0   1   john  address1         Y       NaN
1   2  jessy  address2         N       NaN
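Since the question asks for missing fields to be left blank rather than shown as NaN, here is a minimal sketch of one way to do that. It assumes the input is comma-separated; the helper name conform_columns and the in-memory sample data are hypothetical and only there for illustration. It passes fill_value="" to reindex() so that columns created by the reindex hold empty strings.

```python
import io
import pandas as pd

headers = ['ID', 'Name', 'Address', 'Shippment', 'Delivered']

def conform_columns(df, headers, fill_value=""):
    # Hypothetical helper: pick columns by name, in `headers` order.
    # Columns that are absent from df are created and filled with
    # fill_value; columns of df not listed in headers are dropped.
    return df.reindex(columns=headers, fill_value=fill_value)

# Assumed sample input: shuffled columns and no 'Delivered' column.
raw = io.StringIO(
    "ID,Address,Name,Shippment\n"
    "1,address1,john,Y\n"
    "2,address2,jessy,N\n"
)
df = pd.read_csv(raw)

df_final = conform_columns(df, headers)

# index=False keeps the 0, 1, ... row labels out of the output file.
csv_text = df_final.to_csv(index=False)
print(csv_text)
```

If you keep the default NaN fill instead, to_csv() still writes those cells as empty fields, because its na_rep parameter defaults to an empty string. Note that fill_value only applies to columns that reindex() creates, so NaN values already present in the input are left unchanged.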