I'm working in Databricks, using Python with pandas. I want to reshape data that currently looks like this:
ID | Account | Phone |
---|---|---|
1234 | 1 | 4437935470 |
1234 | 1 | 4437935470 |
1234 | 2 | 4437935472 |
1234 | 2 | 4437935473 |
1235 | 3 | 4437935474 |
1235 | 4 | 4437935475 |
1236 | 4 | 4437935476 |
1236 | 4 | 4437935477 |
Apart from needing to drop duplicates, this is essentially a pivot on two columns. You can read more in this guide. Try:
(df.drop_duplicates(['ID','Account','Phone'])
   # number the phones within each (ID, Account) group, counted after de-duplication
   .assign(col=lambda d: d.groupby(['ID','Account']).cumcount()+1)
   .set_index(['ID','Account','col'])
   ['Phone'].unstack().add_prefix('Phone ')
   .reset_index()
)
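Output (note that the sample data has a typo compared with the expected output; in the result below, 4437935475 falls under ID 1236, Account 4):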
col ID Account Phone 1 Phone 2 Phone 3
0 1234 1 4437935470 NaN NaN
1 1234 2 4437935472 4437935473 NaN
2 1235 3 4437935474 NaN NaN
3 1236 4 4437935475 4437935476 4437935477
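For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same drop-duplicates-then-unstack idea. A few things in it are my own assumptions rather than part of the question: the helper name `phones_wide` is hypothetical, the sample rows are typed in by hand exactly as listed in the question, and `Phone` is stored as strings so that missing values do not turn the numbers into floats. The counter is computed with a callable inside `assign`, so it is derived from the de-duplicated frame and the `Phone n` numbering has no gaps.

```python
import pandas as pd

# Sample rows typed in by hand from the question (assumption: Phone kept as strings).
df = pd.DataFrame({
    "ID":      [1234, 1234, 1234, 1234, 1235, 1235, 1236, 1236],
    "Account": [1, 1, 2, 2, 3, 4, 4, 4],
    "Phone":   ["4437935470", "4437935470", "4437935472", "4437935473",
                "4437935474", "4437935475", "4437935476", "4437935477"],
})


def phones_wide(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per (ID, Account), phones spread over Phone 1..n."""
    return (
        frame.drop_duplicates(subset=["ID", "Account", "Phone"])
        # Number the phones within each (ID, Account) group. The lambda receives
        # the de-duplicated frame, so the counter is contiguous (1, 2, 3, ...).
        .assign(col=lambda d: d.groupby(["ID", "Account"]).cumcount() + 1)
        .set_index(["ID", "Account", "col"])["Phone"]
        .unstack()                 # the counter becomes the column axis
        .add_prefix("Phone ")      # 1, 2, ... -> "Phone 1", "Phone 2", ...
        .reset_index()
    )


wide = phones_wide(df)
print(wide)
```

With the rows exactly as typed in the question, the `1235 | 4` row stays in its own group, so this gives five rows and two phone columns rather than the three shown above.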