我有一个4列的csv文件:
Name Dept Email Name Hair Color
John Smith candy Lincoln Tun brown
Diana Princ candy John Smith gold
Perry Plat wood Oliver Twist bald
Jerry Springer clothes Diana Princ gold
Calvin Klein clothes
Lincoln Tun warehouse
Oliver Twist kitchen
我想按名称匹配Name
和Email Name
列。
最后的输出应该是这样的:
Name Dept Email Name Hair Color
John Smith candy John Smith gold
Diana Princ candy Diana Princ gold
Perry Plat wood
Jerry Springer clothes
Calvin Klein clothes
Lincoln Tun warehouse Lincoln Tun brown
Oliver Twist kitchen Oliver Twist bald
我在我的代码中尝试了这样做:
dfs = np.split(df,len(df.columns), axis=1)
dfs = [df.set_index(df.columns[0], drop=False) for df in dfs]
f=dfs[0].join(dfs[1:]).reset_index(drop=True).fillna(0)
排序我的两列很好,但使其他所有的0
Name Dept Email Name Hair Color
John Smith 0 John Smith 0
Diana Princ 0 Diana Princ 0
Perry Plat 0 0 0
Jerry Springer 0 0 0
Calvin Klein 0 0 0
Lincoln Tun 0 Lincoln Tun 0
Oliver Twist 0 Oliver Twist 0
下面是我的代码:
import pandas as pd
import numpy as np
import os, csv, sys
csvPath = 'User.csv'
df= pd.read_csv(csvPath)
dfs = np.split(df,len(df.columns), axis=1)
dfs = [df.set_index(df.columns[0], drop=False) for df in dfs]
f=dfs[0].join(dfs[1:]).reset_index(drop=True).fillna(0)
testCSV = 'test_user.csv' #to check my csv file
f.to_csv(testCSV, encoding='utf-8') #send it to csv
您可以使用合并:
pd.merge(df[['Name','Dept']],df[['Email Name','Hair Color']], left_on='Name', right_on='Email Name', how='left')
结果
Name Dept Email Name Hair Color
0 John Smith candy John Smith gold
1 Diana Princ candy Diana Princ gold
2 Perry Plat wood NaN NaN
3 Jerry Springer clothes NaN NaN
4 Calvin Klein clothes NaN NaN
5 Lincoln Tun warehouse Lincoln Tun brown
6 Oliver Twist kitchen Oliver Twist bald