Python: cannot reindex from a duplicate axis


  • I am using groupby to merge rows that share the same TransactionId
  • Code
ldf_object_page_data.groupby('TransactionId')[columns].agg(
' '.join).reset_index()
  • Error: cannot reindex from a duplicate axis
  • Sample DF
Transaction_Date    Particulars Others  Others  Cheque Number   Debit   Credit  Balance IsTransactionStart  TransactionId
Date    Remarks Tran Id UTR Number  Instr. ID   Withdrawals Deposits    Balance False   11
01/04/2020  AA1746128   S71737774   -       57000       -4,84,31,253.20 False   11
03/04/2020  TO MADHAV LAAD  AA213003    -   33215031    7000        -4,84,38,253.20 False   11
03/04/2020  TO PANDRINATH GANGRADE  AA214967    -   33215032    13000       -4,84,51,253.20 False   11
03/04/2020  TO NITIN DHANGAR    AA216517    -   33215034    30000       -4,84,81,253.20 False   11
03/04/2020  RTGSO- ELECTRICITY EXP MPPKVVCL UBINH20094172099    S80318780   -   33215033    5,68,499.00     -4,90,49,752.20 True    12
03/04/2020  RTGSO-BHARAT COTTON GINNERS UBINH20094172392    S80321244   -   33215035    3,44,708.00     -4,93,94,460.20 True    13
06/04/2020  OIC153500   DO KHANDWA  S89963710   -   33211781    63407       -4,94,57,867.20 False   13
07/04/2020  RTGS:DHARA AGRO INDUSTRIES ICIC409700372928 S93671963   -           8,93,238.00 -4,85,64,629.20 False   13
08/04/2020  TRF TO JITENDRA SINGH UBEJA AA205798    -   33215036    7,00,000.00     -4,92,64,629.20 True    14
  • DF in CSV

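The problem is the duplicated column names (Others appears twice). Deduplicate them first, then cast everything to strings and join: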

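import pandas as pd

# Make duplicated column names unique (Others, Others.1, ...).
# _maybe_dedup_names is a private pandas helper and may be missing in newer pandas versions.
df.columns = pd.io.parsers.ParserBase({'names':df.columns})._maybe_dedup_names(df.columns)
# Cast every column to str so ' '.join works, then join the values per TransactionId.
df = (df.set_index('TransactionId')
        .astype(str)
        .groupby('TransactionId')
        .agg(' '.join)
        .reset_index())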
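
Because the deduplication above relies on a private pandas helper, here is a minimal sketch of the same idea using only public API. The helper name dedup_columns and the small frame are made up for illustration; the suffix scheme (name.1, name.2, ...) follows what pandas.read_csv does with repeated header names.

```python
import pandas as pd

def dedup_columns(columns):
    """Return labels with repeats suffixed as name.1, name.2, ...

    Hypothetical helper, not part of pandas. It does not guard against
    a label that already looks like 'name.1'.
    """
    seen = {}
    result = []
    for name in columns:
        count = seen.get(name, 0)
        result.append(name if count == 0 else f"{name}.{count}")
        seen[name] = count + 1
    return result

# Made-up frame with a duplicated 'Others' column, like the sample DF.
df = pd.DataFrame(
    [["a", "x", "p", 11], ["b", "y", "q", 11], ["c", "z", "r", 12]],
    columns=["Particulars", "Others", "Others", "TransactionId"],
)

# Show which labels repeat before fixing them.
print(df.columns[df.columns.duplicated()].tolist())

df.columns = dedup_columns(df.columns)

# Cast to str, then join the values of every column per TransactionId.
out = (df.astype(str)
         .groupby("TransactionId")
         .agg(" ".join)
         .reset_index())
print(out)
```

Checking df.columns.duplicated() first is a quick way to confirm that repeated labels are the cause before renaming anything.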

If duplicate values within each group also need to be removed:

df = (df.set_index('TransactionId')
.astype(str)
.groupby('TransactionId')
.agg(lambda x: ' '.join(dict.fromkeys(x)))
.reset_index())
