将panda中的列标题与上一个标题名称串联起来重命名



我有一组列名,其中名称Description多次存在。

['Sl.No', 'Job', 'Description', 'Vendor', 'Description', 'WO No',
'Accounting Center ', 'Description', 'Nature Of Work', 'WO Type',
'WO Date', 'WO From Date', 'WO To Date', 'Bill No', 'Running Bill No',
'Bill Date', 'Bill Status', 'Voucher No', 'Voucher Date',
'Bank nVoucher No', 'Bank Vouchern Date', ' Paid Amount', 'Currency',
'WO Amt', 'Bill Amt', 'Service Tax Amt', 'VAT Amt', 'Total Tax nAmt',
' Advance Amt', 'Gross Amt', 'Deduction Amt', 'Net Amt']

我想要一个名字";描述";与后面的任何标头名称连接

例如

  • "作业","描述"将变为"作业描述">

  • "Vendor"、"Description"将变为"Vendor_Description",依此类推

我需要一个不引入任何硬编码的逻辑

['Sl.No', 'Job', 'Job_Description', 'Vendor', 'Vendor_Description',....]

您需要使用pd.Series.eq确定Description的位置,然后使用pd.Index.to_series,这样我们就可以使用pd.Series.shift

cols = df.columns.to_series()
m = cols.eq('Description') #would mark True where value is `Description`
out = cols.shift().str.strip().str.replace('s+','_') + '_' + cols
out[~m] = cols
out.to_list()
['Sl.No', 'Job', 'Job_Description', 'Vendor', 'Vendor_Description', 'WO No',
'Accounting Center ', 'Accounting_Center_Description', 'Nature Of Work',
'WO Type', 'WO Date', 'WO From Date', 'WO To Date', 'Bill No', 'Running Bill No',
'Bill Date', 'Bill Status', 'Voucher No', 'Voucher Date', 'Bank nVoucher No',
'Bank Vouchern Date', ' Paid Amount', 'Currency', 'WO Amt', 'Bill Amt',
'Service Tax Amt', 'VAT Amt', 'Total Tax nAmt', ' Advance Amt', 'Gross Amt',
'Deduction Amt', 'Net Amt']

如果它是一个数据帧,那么在不处理底层列数据本身的情况下,无法将连接的列名分配回数据帧。否则,您可能会得到"ValueError:长度不匹配">错误。

如果它只是一个值列表,并且您希望将值"连接起来;描述";到它的前一个元素,那么下面的代码可能会有所帮助。


colList = ['Sl.No', 'Job', 'Description', 'Vendor', 'Description', 'WO No',
'Accounting Center ', 'Description', 'Nature Of Work', 'WO Type',
'WO Date', 'WO From Date', 'WO To Date', 'Bill No', 'Running Bill No',
'Bill Date', 'Bill Status', 'Voucher No', 'Voucher Date',
'Bank nVoucher No', 'Bank Vouchern Date', ' Paid Amount', 'Currency',
'WO Amt', 'Bill Amt', 'Service Tax Amt', 'VAT Amt', 'Total Tax nAmt',
' Advance Amt', 'Gross Amt', 'Deduction Amt', 'Net Amt']
colList = [i.replace('n','').strip() for i in colList] #removing newline and trailing spaces
desc_indices = [i for i, x in enumerate(colList) if x == "Description"]
pre_desc_indices = [i-1 for i in desc_indices]
newColList = []
for i, x in enumerate(colList):
if i in pre_desc_indices:
newColList.append(x.replace(' ','_')+'_'+colList[i+1])
elif i in desc_indices:
continue
else:
newColList.append(x.replace(' ','_'))
print(colList)
print(newColList)

cols作为原始列表的名称。那么代码可能是:

D = "Description"
previous_current = zip([None] + cols[:-1], cols)
df.columns = [curr if curr != D else prev + "_" + D  for prev, curr in previous_current]

结果:

>>> df.columns
Index(['Sl.No', 'Job', 'Job_Description', 'Vendor', 'Vendor_Description',
'WO No', 'Accounting Center ', 'Accounting Center _Description',
'Nature Of Work', 'WO Type', 'WO Date', 'WO From Date', 'WO To Date',
'Bill No', 'Running Bill No', 'Bill Date', 'Bill Status', 'Voucher No',
'Voucher Date', 'Bank nVoucher No', 'Bank Vouchern Date',
' Paid Amount', 'Currency', 'WO Amt', 'Bill Amt', 'Service Tax Amt',
'VAT Amt', 'Total Tax nAmt', ' Advance Amt', 'Gross Amt',
'Deduction Amt', 'Net Amt'],
dtype='object')

解释:

我们构造了一些东西作为对列表(previous, current):

>>> previous_current = zip([None] + cols[:-1], cols)
>>> list(previous_current)
[(None, 'Sl.No'),
('Sl.No', 'Job'),
('Job', 'Description'),
('Description', 'Vendor'),
('Vendor', 'Description'),
('Description', 'WO No'),
('WO No', 'Accounting Center '),
('Accounting Center ', 'Description'),
('Description', 'Nature Of Work'),
('Nature Of Work', 'WO Type'),
('WO Type', 'WO Date'),
('WO Date', 'WO From Date'),
('WO From Date', 'WO To Date'),
('WO To Date', 'Bill No'),
('Bill No', 'Running Bill No'),
('Running Bill No', 'Bill Date'),
('Bill Date', 'Bill Status'),
('Bill Status', 'Voucher No'),
('Voucher No', 'Voucher Date'),
('Voucher Date', 'Bank nVoucher No'),
('Bank nVoucher No', 'Bank Vouchern Date'),
('Bank Vouchern Date', ' Paid Amount'),
(' Paid Amount', 'Currency'),
('Currency', 'WO Amt'),
('WO Amt', 'Bill Amt'),
('Bill Amt', 'Service Tax Amt'),
('Service Tax Amt', 'VAT Amt'),
('VAT Amt', 'Total Tax nAmt'),
('Total Tax nAmt', ' Advance Amt'),
(' Advance Amt', 'Gross Amt'),
('Gross Amt', 'Deduction Amt'),
('Deduction Amt', 'Net Amt')]

然后我们迭代它,并通过对的2nd元素(当前列标签(,我们决定

  • 是否保持原样,或者
  • 将其附加到前一个列的标签(该对的1st元素(

最新更新