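For each email address, I want to take all the rows that share that address, put them in a new DataFrame, and then email the information in those rows to the address in column 1.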
| email              | Acct # | Acct Status |
|--------------------|--------|-------------|
| janedoe@gmail.com  | 1230   | Closed      |
| janedoe@gmail.com  | 2546   | Closed      |
| janedoe@gmail.com  | 2468   | Closed      |
| janedoe@gmail.com  | 7896   | Closed      |
| michaeldoe@aol.com | 4565   | Closed      |
| michaeldoe@aol.com | 9686   | Closed      |
| jackdoe@aol.com    | 4656   | Closed      |
I tried converting the DataFrame into a list using `groupby`, but I got stuck:

```python
df_list = [x for _, x in df.groupby(['email'])]
```
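The `groupby` approach in the question is already close. A minimal sketch, assuming the column names shown in the table: iterating over a `GroupBy` object yields `(key, sub-DataFrame)` pairs, so a dictionary comprehension keeps each email next to its own rows. Grouping by the string `'email'` rather than the list `['email']` keeps the keys as plain strings; in pandas 2.x a one-element list key yields one-element tuples instead. The name `frames` is illustrative only.

```python
import pandas as pd

# Sample data reproducing the table in the question
df = pd.DataFrame({
    "email": ["janedoe@gmail.com"] * 4 + ["michaeldoe@aol.com"] * 2 + ["jackdoe@aol.com"],
    "Acct #": [1230, 2546, 2468, 7896, 4565, 9686, 4656],
    "Acct Status": ["Closed"] * 7,
})

# Each iteration gives one email and the sub-DataFrame of its rows,
# so this builds one DataFrame per email address.
frames = {email: group for email, group in df.groupby("email")}

for email, group in frames.items():
    print(email, len(group))
```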
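I'm not sure how you want to store your data or what you want to do with it. I chose to store the output in a Python dictionary, with each email contact as a key and all of that contact's accounts and statuses as the value. You can combine `groupby` and `drop_duplicates` to extract and shape the information you want.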
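```python
import pandas as pd

# Maps each email to the index labels of its rows
df_grouped = df.groupby('email').groups
# One row per unique email address
df_contacts = df.drop_duplicates(subset=['email'])

result = {}  # dictionary for results: email -> list of [account, status]
for item in df_contacts['email']:
    rows = df_grouped[item].tolist()
    my_data = []
    for x in rows:
        # .groups holds index labels, so select with .loc rather than .iloc
        info = df[['Acct #', 'Acct Status']].loc[x].values
        my_data.append(info.tolist())
    result[item] = my_data
```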
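The same `result` dictionary can likely be built more compactly with a single `groupby`, without `drop_duplicates` or the inner loop. A sketch, assuming the sample data and column names from the question: accounts keep their original row order within each email, but the dictionary keys come out sorted by email unless you pass `sort=False` to `groupby`, which keeps the order of first appearance.

```python
import pandas as pd

df = pd.DataFrame({
    "email": ["janedoe@gmail.com"] * 4 + ["michaeldoe@aol.com"] * 2 + ["jackdoe@aol.com"],
    "Acct #": [1230, 2546, 2468, 7896, 4565, 9686, 4656],
    "Acct Status": ["Closed"] * 7,
})

# For each email, take the two account columns of its rows and
# turn them into a list of [account, status] pairs.
result = {
    email: group[["Acct #", "Acct Status"]].values.tolist()
    for email, group in df.groupby("email", sort=False)
}

for email, accounts in result.items():
    print(email, accounts)
```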
You can then use the data however you need. For example:

```python
for i, j in result.items():
    print('Send email to ', i, ' with their account info as follows')
    for z in j:
        print('Account : ', z[0], ' Status :', z[1])
```
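The loop above only prints what would be sent. To actually send one email per address, one option is the standard library's `email.message.EmailMessage` together with `smtplib`. This is a sketch under assumptions: the sender `accounts@example.com`, the SMTP host `smtp.example.com`, the subject line and the helper `build_message` are hypothetical placeholders, and the sending step is left commented out because it needs a reachable SMTP server (and usually login credentials).

```python
from email.message import EmailMessage

# Hypothetical sender address; replace with a real one
SENDER = "accounts@example.com"


def build_message(recipient, accounts):
    """Build one email listing every [account, status] pair for a recipient."""
    lines = [f"Account: {acct}  Status: {status}" for acct, status in accounts]
    msg = EmailMessage()
    msg["From"] = SENDER
    msg["To"] = recipient
    msg["Subject"] = "Your account information"  # placeholder subject
    msg.set_content("Your account information is as follows:\n" + "\n".join(lines))
    return msg


# Same shape as the `result` dictionary in the answer (shortened sample)
result = {
    "janedoe@gmail.com": [[1230, "Closed"], [2546, "Closed"]],
    "jackdoe@aol.com": [[4656, "Closed"]],
}

messages = [build_message(email, accounts) for email, accounts in result.items()]

# Sending needs a real SMTP server; the host below is a placeholder:
# import smtplib
# with smtplib.SMTP("smtp.example.com") as server:
#     for msg in messages:
#         server.send_message(msg)
```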
If for some reason you really do want the results in separate DataFrames, you can keep them in a dictionary of DataFrames, like this:

```python
dx = {}
for i, j in result.items():
    # j is a list of [account, status] pairs for email i
    dfx = pd.DataFrame(j, columns=['Acct #', 'Acct Status'])
    dx[i] = dfx

print(dx['janedoe@gmail.com'])  # as an example of accessing the data
```
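If you keep per-email DataFrames, `DataFrame.to_string(index=False)` is one way to turn each one into a plain-text table for an email body, without the row index. A minimal sketch, assuming a DataFrame shaped like `dx['janedoe@gmail.com']` above (the sample rows here are shortened):

```python
import pandas as pd

# One per-email DataFrame, shaped like an entry of dx
dfx = pd.DataFrame([[1230, "Closed"], [2546, "Closed"]],
                   columns=["Acct #", "Acct Status"])

# Render the table as text without the 0, 1, ... row index
body = dfx.to_string(index=False)
print(body)
```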