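I work with a lot of CSV data at my job. I am trying to use Pandas to copy a member's "Email" into the "PrimaryMemberEmail" column of that member's spouse's row. Here is what I mean: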
import pandas as pd

user_data = {
    'FirstName': ['John', 'Jane', 'Bob'],
    'Lastname': ['Snack', 'Snack', 'Tack'],
    'EmployeeID': ['12345', '12345S', '54321'],
    'Email': ['John@issues.com', 'NaN', 'Bob@issues.com'],
    'DOB': ['09/07/1988', '12/25/1990', '07/13/1964'],
    'Role': ['Employee On Plan', 'Spouse On Plan', 'Employee Off Plan'],
    'PrimaryMemberEmail': ['NaN', 'NaN', 'NaN'],
    'PrimaryMemberEmployeeId': ['NaN', '12345', 'NaN'],
}
df = pd.DataFrame(user_data)
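I have thousands of rows like this. I need to fill "PrimaryMemberEmail" only when the user is a spouse, using the "Email" of the associated primary holder. So in this case I want Jane Snack's "PrimaryMemberEmail" to be filled in automatically with the email of her spouse, John Snack, which is "John@issues.com". I can't find a good way to do this. Currently I am using: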
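p = -1  # row position; incremented at the top of each iteration
for i in df['EmployeeID']:
    p = p + 1  # the original len(...) - (len(...) - 1) arithmetic reduces to + 1
    EEID = df['EmployeeID'].iloc[p]
    if 'S' in EEID:
        # copy the email from the row directly above (relies on row order)
        df.loc[df.index[p], 'PrimaryMemberEmail'] = df['Email'].iloc[p-1]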
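What bothers me is that this only works when my file is ordered correctly, as in the example DataFrame I showed. Also, my NaN values don't work with dropna() or other methods, but that is a separate issue.
I am new to Python and to programming. I am trying to add value to my current career in the health field, and I find all of this very interesting. Thank you for your help.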
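IIUC, map the values and fillna:
df['PrimaryMemberEmail'] = (df['PrimaryMemberEmployeeId']
    # look up each primary member's Email by EmployeeID
    .map(df.set_index('EmployeeID')['Email'])
    # keep the existing value wherever no primary member was found
    .fillna(df['PrimaryMemberEmail'])
)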
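To see what the map step does on its own, here is a minimal sketch. It assumes each EmployeeID appears only once (map with a Series needs a unique index) and that the primary member's address is in the Email column; `lookup` and `mapped` are just illustrative names, not from the question.

```python
import pandas as pd

# Small stand-in for the question's data (only the columns the lookup needs).
df = pd.DataFrame({
    'EmployeeID': ['12345', '12345S', '54321'],
    'Email': ['John@issues.com', 'NaN', 'Bob@issues.com'],
    'PrimaryMemberEmployeeId': ['NaN', '12345', 'NaN'],
})

# Lookup table: the index is EmployeeID, the values are Email.
lookup = df.set_index('EmployeeID')['Email']

# map() looks up each PrimaryMemberEmployeeId in the lookup's index.
# IDs that are not found (here the 'NaN' placeholder strings) come back
# as real missing values, which is why fillna() can be chained afterwards.
mapped = df['PrimaryMemberEmployeeId'].map(lookup)
print(mapped.tolist())
```

Only Jane's row gets a value, because hers is the only PrimaryMemberEmployeeId that matches an EmployeeID.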
Alternatively, if you have real NaN (not strings), use boolean indexing:
df.loc[df['PrimaryMemberEmployeeId'].notna(),
       'PrimaryMemberEmail'] = df['PrimaryMemberEmployeeId'].map(df.set_index('EmployeeID')['Email'])
Output:
   FirstName  Lastname  EmployeeID            Email         DOB               Role  PrimaryMemberEmail  PrimaryMemberEmployeeId
0       John     Snack       12345  John@issues.com  09/07/1988   Employee On Plan                 NaN                      NaN
1       Jane     Snack      12345S              NaN  12/25/1990     Spouse On Plan     John@issues.com                    12345
2        Bob      Tack       54321   Bob@issues.com  07/13/1964  Employee Off Plan                 NaN                      NaN
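On the side issue with dropna(): in the sample data the missing values are the literal string 'NaN', which pandas treats as ordinary text, so dropna() and notna() do not see them as missing. Below is a rough end-to-end sketch, assuming the same is true of your real data and that spouse rows can be recognized by 'Spouse' in Role. Both are assumptions on my part, and `lookup`, `mapped`, `is_spouse` and `filled` are illustrative names.

```python
import numpy as np
import pandas as pd

user_data = {
    'FirstName': ['John', 'Jane', 'Bob'],
    'Lastname': ['Snack', 'Snack', 'Tack'],
    'EmployeeID': ['12345', '12345S', '54321'],
    'Email': ['John@issues.com', 'NaN', 'Bob@issues.com'],
    'DOB': ['09/07/1988', '12/25/1990', '07/13/1964'],
    'Role': ['Employee On Plan', 'Spouse On Plan', 'Employee Off Plan'],
    'PrimaryMemberEmail': ['NaN', 'NaN', 'NaN'],
    'PrimaryMemberEmployeeId': ['NaN', '12345', 'NaN'],
}
df = pd.DataFrame(user_data)

# Turn the 'NaN' placeholder strings into real missing values so that
# isna(), notna(), dropna() and fillna() recognize them.
df = df.replace('NaN', np.nan)

# Shuffle the rows to show that the result does not depend on row order.
df = df.sample(frac=1, random_state=0)

# Lookup keyed by EmployeeID (assumed unique), then mapped per row.
lookup = df.set_index('EmployeeID')['Email']
mapped = df['PrimaryMemberEmployeeId'].map(lookup)

# Assumption: spouse rows are the ones whose Role mentions 'Spouse'.
is_spouse = df['Role'].str.contains('Spouse', na=False)

# Keep the mapped email only on spouse rows. This overwrites the column,
# which is fine here because it started out empty.
df['PrimaryMemberEmail'] = mapped.where(is_spouse)

df = df.sort_index()
filled = df.dropna(subset=['PrimaryMemberEmail'])
print(filled[['FirstName', 'PrimaryMemberEmail']])
```

Because the lookup is keyed on EmployeeID, the rows no longer need to be in employee-then-spouse order, and dropna() now keeps only Jane's row.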