如何将水平数据帧结构转换为垂直python



我有一个脚本,可以读取.xlsx文件并创建一个看起来像的数据帧

index|TASK|CODE|NAME|WT|ST|ORIGIN|SRV|DESTINY|FT|MCLINE|ST.1|ORIGIN.1|SRV.1|DESTINY.1|FT.1|MCLINE.1

这可能会更长,具体取决于excel文件的列,并且只重复字段ST.(n)ORIGIN.(n)SRV.(n)DESTINY.(n)FT.(n)MCLINE.(n))

例如

>>>>td style="text align:right;">12900>Pat Colat Col6:06>>td style="ext-align:left;">LAMPARD,FRANK>Pat Colat列/td>"text align:right;">>td style="ext-align:left;">GERRARD,史蒂文>td style="ext-align:left;">Grand StationSE005O>Grand StationGrand Station"text align:left;">Pat Oda19:55//tr>
indexTASKCODENAMEWTSTORIGINSRVDESTINY="text align=right;">MCLINE.1ST.2ORIGIN.2SRV.2DESTINY.2FT.2
061P00QHCROUCH,彼得·06:1414:4661P004TEtap 1-R*G0431
161P00CH1090007:1306:201:33Pat列61P00CT14:13Etapa 1-R*D0431
25SE00DH1804907:3011:5516:41Grand StationD0290/CopaD0291

这是pandas.wide_to_long的经典情况。但我们需要先做一点调整,因为这个函数需要所有类似的列都以相同的模式命名,<COLNAME.N>

import pandas as pd
# import your data into a dataframe df
common_cols = ['ST', 'ORIGIN', 'SRV', 'DESTINY', 'FT', 'MCLINE']
df = df.rename({col: col + '.0' for col in common_cols}, axis=1)
_df = (pd.wide_to_long(df, stubnames=common_cols,
i=['TASK', 'CODE', 'NAME', 'WT'],
j='n',
suffix=r'.d*')
.reset_index()
.drop('n', axis=1)
.dropna())

结果如下:

TASK   CODE            NAME     WT     ST        ORIGIN      SRV       DESTINY     FT                         MCLINE
0  61P00QH  12900    CROUCH,PETER  06:14  14:46        PatCol  61P004T        PatCol  16:06                  Etap1-R*G0431
3  61P00CH  10900   LAMPARD,FRANK  07:13  06:20        PatCol  61P00CT        PatCol  09:53                  Etap1-R*D0431
4  61P00CH  10900   LAMPARD,FRANK  07:13  10:33        PatCol  61P00CT        PatCol  14:13                 Etapa1-R*D0431
6  5SE00DH  18049  GERRARD,STEVEN  07:30  11:55  GrandStation  5SE005O  GrandStation  16:01  Grandstation*D0290/Copa*D0291
7  5SE00DH  18049  GERRARD,STEVEN  07:30  16:41  GrandStation  5SE003O  GrandStation  17:37                             No
8  5SE00DH  18049  GERRARD,STEVEN  07:30  17:41  GrandStation  5SE009O        PatOda  19:55  GrandStation*D0290/Copa*D0291

最新更新