Create a multi-row DataFrame from a single-row DataFrame

My DataFrame is a single row with N columns:

   col1  col2    col3  col4    col5  col6     col7  col8   col9
0   NBA  Mens  Sports   LAL  Lakers   BOS  Celtics   SAS  Spurs

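The output I'm looking for is a new DataFrame like the one below, in which the first three column values are the same for every row. For each row of the new DataFrame, col4 and col5 take their values from successive column pairs of the DataFrame above: col4 and col5, then col6 and col7, then col8 and col9, and so on.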

   col1  col2    col3  col4     col5
0   NBA  Mens  Sports   LAL   Lakers
1   NBA  Mens  Sports   BOS  Celtics
2   NBA  Mens  Sports   SAS    Spurs

My code:

Convert the single row to an ndarray

import pandas as pd

df = pd.read_csv('df_info.txt', sep=",", header=0)
# as_matrix() was removed in pandas 1.0; to_numpy() returns the same ndarray
vallist = df.to_numpy()[0]

Create a dictionary to store the values

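dict = {}  # note: this name shadows the built-in dict
n = 3  # 0-based position of col4, the first column of the first pair
varlist1 = []
for i in range(len(vallist)):
    if n + 1 < len(vallist):  # stop once no complete pair remains
        dict[i] = {}
        print(vallist[n], vallist[n + 1])
        dict[i]['col1'] = vallist[0]
        dict[i]['col2'] = vallist[1]
        dict[i]['col3'] = vallist[2]
        dict[i]['col4'] = vallist[n]
        dict[i]['col5'] = vallist[n + 1]
        n += 2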

Load the dictionary into a DataFrame

df2=pd.DataFrame.from_dict(dict)
df2.transpose()

I get the desired result, but I'm not happy with the code and am looking for a more Pythonic, more pandas-idiomatic way to achieve this.

We can use a comprehension and some clever unpacking.

For each row, I grab the first three values and the rest:
```
a, b, c, *x in df.values
```
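Then I loop over each pair in the rest by pulling x[::2] and x[1::2].
I use rename and add_prefix to set the column names.
This generalizes to any number of pairs after the first three columns.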

pd.DataFrame([
    [a, b, c, d, e]
    for a, b, c, *x in df.values
    for d, e in zip(x[::2], x[1::2])
]).rename(columns=lambda x: x + 1).add_prefix('col')

  col1  col2    col3 col4     col5
0  NBA  Mens  Sports  LAL   Lakers
1  NBA  Mens  Sports  BOS  Celtics
2  NBA  Mens  Sports  SAS    Spurs
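
To try the comprehension above without the original file, here is a minimal self-contained sketch. It builds the sample row inline instead of reading df_info.txt, and wraps the comprehension in a helper named explode_pairs; that name, the inline data, and the extra MIA/Heat pair are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd


def explode_pairs(df):
    """Hypothetical helper: one output row per (abbr, team) pair.

    Assumes the first three columns are fixed and the remaining
    columns come in pairs, as in the question.
    """
    return pd.DataFrame([
        [a, b, c, d, e]
        for a, b, c, *x in df.to_numpy()       # first three values, then the rest
        for d, e in zip(x[::2], x[1::2])       # walk the rest two at a time
    ]).rename(columns=lambda i: i + 1).add_prefix('col')


# Sample row from the question, built inline (assumption: no CSV needed)
columns = [f'col{i}' for i in range(1, 10)]
df = pd.DataFrame(
    [['NBA', 'Mens', 'Sports', 'LAL', 'Lakers', 'BOS', 'Celtics', 'SAS', 'Spurs']],
    columns=columns,
)
result = explode_pairs(df)
print(result)

# One more pair appended to the row (illustrative values) yields one more output row
wider = df.assign(col10='MIA', col11='Heat')
wider_result = explode_pairs(wider)
print(wider_result)
```

Because the pairing is done by slicing whatever follows the first three values, the helper needs no changes when more pairs are appended.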

Using numpy.repeat and itertools.chain with a couple of dictionary comprehensions:

import numpy as np
from itertools import chain

# collect the abbreviations and the team names of each row into tuples
df['abbr_combined'] = list(zip(df.col4, df.col6, df.col8))
df['team_combined'] = list(zip(df.col5, df.col7, df.col9))
# number of output rows to produce per input row
lens = df['team_combined'].map(len)

res = pd.DataFrame({**{col: np.repeat(df[col], lens) for col in ('col1', 'col2', 'col3')},
                    **{col: list(chain.from_iterable(df[name])) for col, name in
                       zip(('col4', 'col5'), ('abbr_combined', 'team_combined'))}})
print(res)

  col1  col2    col3 col4     col5
0  NBA  Mens  Sports  LAL   Lakers
0  NBA  Mens  Sports  BOS  Celtics
0  NBA  Mens  Sports  SAS    Spurs
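
The output above keeps index 0 on every row, whereas the question asks for 0, 1, 2. One possible variation, sketched below, repeats the underlying NumPy arrays (via to_numpy()) rather than the Series, so the new DataFrame gets a fresh default index. The inline sample data and the choice to skip the two helper columns are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd
from itertools import chain

# Sample row from the question, built inline (assumption: no CSV needed)
df = pd.DataFrame(
    [['NBA', 'Mens', 'Sports', 'LAL', 'Lakers', 'BOS', 'Celtics', 'SAS', 'Spurs']],
    columns=[f'col{i}' for i in range(1, 10)],
)

# One tuple of abbreviations and one tuple of team names per input row
abbr = list(zip(df.col4, df.col6, df.col8))
team = list(zip(df.col5, df.col7, df.col9))
lens = [len(t) for t in team]  # output rows per input row

res = pd.DataFrame({
    # repeating plain arrays drops the old index, so rows are numbered 0..n-1
    **{col: np.repeat(df[col].to_numpy(), lens) for col in ('col1', 'col2', 'col3')},
    'col4': list(chain.from_iterable(abbr)),
    'col5': list(chain.from_iterable(team)),
})
print(res)
```

Calling res.reset_index(drop=True) on the original result should achieve the same renumbering if you prefer to keep that code unchanged.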
