Set specific rows to a value found in another row when the index differs



I work with a lot of CSV data at my job. I am trying to use Pandas to copy a member's "Email" into the "PrimaryMemberEmail" column of their spouse's row. Here is what I mean:

import pandas as pd
user_data = {'FirstName': ['John', 'Jane', 'Bob'],
             'Lastname': ['Snack', 'Snack', 'Tack'],
             'EmployeeID': ['12345', '12345S', '54321'],
             'Email': ['John@issues.com', 'NaN', 'Bob@issues.com'],
             'DOB': ['09/07/1988', '12/25/1990', '07/13/1964'],
             'Role': ['Employee On Plan', 'Spouse On Plan', 'Employee Off Plan'],
             'PrimaryMemberEmail': ['NaN', 'NaN', 'NaN'],
             'PrimaryMemberEmployeeId': ['NaN', '12345', 'NaN']
             }
df = pd.DataFrame(user_data)

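I have thousands of rows like this. I need to populate "PrimaryMemberEmail" with the "Email" of the associated primary holder, but only when the user is a spouse. So in this case, I want Jane Snack's "PrimaryMemberEmail" to be filled automatically with the email of her spouse, John Snack, which is John@issues.com. I can't find a good way to do this. Currently I am using: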

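p = -1  # assumed starting value; the initialization was not shown in the post
for i in df['EmployeeID']:
    # the expression below simply advances p by one row
    p = (p + len(df['EmployeeID']) - (len(df['EmployeeID']) - 1))
    EEID = df['EmployeeID'].iloc[p]
    if 'S' in EEID:
        # copy the email from the row directly above (relies on row order)
        df['PrimaryMemberEmail'].iloc[p] = df['Email'].iloc[p-1]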

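What bothers me is that this only works when my file is ordered correctly, with each spouse directly below the primary member, as in the example DataFrame. Also, my NaN values don't work with dropna() or similar methods, but that is a separate issue.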

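I am new to Python and to programming. I am trying to add value in my current healthcare career, and I find all of this very interesting. Thanks for your help.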

IIUC, map the values and fillna:

# look up each PrimaryMemberEmployeeId in an EmployeeID -> Email Series
df['PrimaryMemberEmail'] = (df['PrimaryMemberEmployeeId']
                            .map(df.set_index('EmployeeID')['Email'])
                            .fillna(df['PrimaryMemberEmail'])
                            )

Or, if you have real NaN values (not strings), use boolean indexing:

# only rows that reference a primary member are assigned
df.loc[df['PrimaryMemberEmployeeId'].notna(), 'PrimaryMemberEmail'] = (
    df['PrimaryMemberEmployeeId'].map(df.set_index('EmployeeID')['Email'])
)

Output:

  FirstName Lastname EmployeeID            Email         DOB               Role PrimaryMemberEmail PrimaryMemberEmployeeId
0      John    Snack      12345  John@issues.com  09/07/1988   Employee On Plan                NaN                     NaN
1      Jane    Snack     12345S              NaN  12/25/1990     Spouse On Plan    John@issues.com                   12345
2       Bob     Tack      54321   Bob@issues.com  07/13/1964  Employee Off Plan                NaN                     NaN
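
On the side issue from the question: dropna() probably has no effect because the sample data stores the literal string 'NaN' rather than a real missing value. The following is a minimal sketch of one way to handle that, not the only one. It converts those strings first and then fills the column for spouse rows only. The names email_by_id and is_spouse are made up for illustration, the DOB column is left out for brevity, and treating any Role that starts with "Spouse" as a spouse is an assumption based on the sample rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'FirstName': ['John', 'Jane', 'Bob'],
    'Lastname': ['Snack', 'Snack', 'Tack'],
    'EmployeeID': ['12345', '12345S', '54321'],
    'Email': ['John@issues.com', 'NaN', 'Bob@issues.com'],
    'Role': ['Employee On Plan', 'Spouse On Plan', 'Employee Off Plan'],
    'PrimaryMemberEmail': ['NaN', 'NaN', 'NaN'],
    'PrimaryMemberEmployeeId': ['NaN', '12345', 'NaN'],
})

# Turn the literal string 'NaN' into a real missing value in the lookup columns
cols = ['Email', 'PrimaryMemberEmployeeId']
df[cols] = df[cols].replace('NaN', np.nan)

# Lookup Series: EmployeeID -> Email (EmployeeID must be unique for map to work)
email_by_id = df.set_index('EmployeeID')['Email']

# Assumption: spouse rows are the ones whose Role starts with 'Spouse'
is_spouse = df['Role'].str.startswith('Spouse')

# Keep the primary member's ID only on spouse rows, then look up the email;
# rows without an ID come out as real NaN
df['PrimaryMemberEmail'] = (
    df['PrimaryMemberEmployeeId'].where(is_spouse).map(email_by_id)
)

print(df[['FirstName', 'Email', 'PrimaryMemberEmail']])

# With real missing values, dropna behaves as expected
print(df.dropna(subset=['Email']))
```

Because the lookup goes through EmployeeID rather than row position, the result does not depend on how the file is ordered. When the data is loaded with pd.read_csv, a cell containing NaN is normally parsed as a missing value already, so the replace step may not be needed there.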
