How to explode each row of a Pandas DataFrame into multiple rows



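I have a function that is passed a pandas DataFrame, and for each row in that DataFrame I want to create N additional rows, each identical to the original row except for 2 column values.

What is the right way to do this, especially in a RAM-efficient way?

My attempt so far has been to run df.apply and, for each row, call a function that returns a list of pd.Series objects, which I then pass to append to add them to the original DataFrame. This hasn't worked so far.

Here is an example with some dummy code that reproduces my attempt: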

import pandas as pd

students = [('Jack', 34, 'Sydney', 'Australia'),
            ('Jill', 30, 'New York', 'USA')]
# Create a DataFrame object (one index label per row)
df = pd.DataFrame(students, columns=['Name', 'Age', 'City', 'Country'], index=['a', 'b'])

# Function I will use to explode a single row into 3 new rows
def replicate(x):
    new_rows = []
    i = 0
    for j in range(3):
        y = x.copy(deep=True)
        y.Age = i
        i += 1
        new_rows.append(y)
    return new_rows

# Iterate over each row and append the results
# (this is the step that fails; DataFrame.append exists only in pandas < 2.0)
df.apply(lambda x: df.append(replicate(x)), axis=1)

For the above, I expect output like the following:

Jack, 34, Sydney, Australia
Jack, 0, Sydney, Australia
Jack, 1, Sydney, Australia
Jack, 2, Sydney, Australia
Jill, 30, New York, USA
Jill, 0, New York, USA
Jill, 1, New York, USA
Jill, 2, New York, USA

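In the end, I want my DataFrame to have N times as many rows, so that I can compute derived rows from each original row. I want to do this in a space-efficient way, which is not happening right now. Any help is much appreciated!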

IIUC, you need np.repeat, using the Age column to specify the number of repeats, and then fixing up the Age column afterwards.

import pandas as pd
df1 = pd.DataFrame(df.values.repeat(df.Age+1, axis=0),
                   columns=['Name', 'Age', 'City', 'Country'])
df1['Age'] = (df1.groupby([*df1]).cumcount()-1).where(df1.duplicated(), df1['Age'])

Output df1

    Name Age      City    Country
0   Jack  34    Sydney  Australia
1   Jack   0    Sydney  Australia
2   Jack   1    Sydney  Australia
3   Jack   2    Sydney  Australia
4   Jack   3    Sydney  Australia
...
34  Jack  33    Sydney  Australia
35  Jill  30  New York        USA
...
63  Jill  27  New York        USA
64  Jill  28  New York        USA
65  Jill  29  New York        USA
[66 rows x 4 columns]

Input df

   Name  Age      City    Country
a  Jack   34    Sydney  Australia
b  Jill   30  New York        USA
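If the number of derived rows is a fixed N rather than something taken from Age, the same repeat idea can be adapted to reproduce the output shown in the question. The following is only a sketch under that assumption: N = 3 mirrors the question's loop, and the names derived and result are made up for illustration. It repeats row positions with np.repeat, overwrites Age with a tiled counter, and uses a stable sort on the index to place each original row ahead of its copies.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [("Jack", 34, "Sydney", "Australia"), ("Jill", 30, "New York", "USA")],
    columns=["Name", "Age", "City", "Country"],
    index=["a", "b"],
)

N = 3  # assumed number of derived rows per original row

# Repeat each row position N times; positional indexing keeps column dtypes
derived = df.iloc[np.repeat(np.arange(len(df)), N)].copy()
# Overwrite the column that differs with 0, 1, ..., N-1 for every original row
derived["Age"] = np.tile(np.arange(N), len(df))

# Put originals first, then use a stable sort on the index labels so that
# every original row is followed directly by its own derived rows
result = (
    pd.concat([df, derived])
    .sort_index(kind="mergesort")
    .reset_index(drop=True)
)
print(result)
```

Because the per-row values are produced by whole-array operations, no Python-level loop over rows is needed. A second differing column could be filled the same way as Age.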

IIUC

d={x : y.set_index('Age').reindex(range(y['Age'].iloc[0]+1),method='bfill') for x , y in df.groupby(level=0)}
newdf=pd.concat(d).reset_index(level=1)
newdf
Out[220]: 
    Age  Name      City    Country
a     0  Jack    Sydney  Australia
a     1  Jack    Sydney  Australia
a     2  Jack    Sydney  Australia
a     3  Jack    Sydney  Australia
a     4  Jack    Sydney  Australia
a     5  Jack    Sydney  Australia
a     6  Jack    Sydney  Australia
a     7  Jack    Sydney  Australia
a     8  Jack    Sydney  Australia
a     9  Jack    Sydney  Australia
a    10  Jack    Sydney  Australia
a    11  Jack    Sydney  Australia
a    12  Jack    Sydney  Australia
a    13  Jack    Sydney  Australia
a    14  Jack    Sydney  Australia
a    15  Jack    Sydney  Australia
a    16  Jack    Sydney  Australia
a    17  Jack    Sydney  Australia
a    18  Jack    Sydney  Australia
a    19  Jack    Sydney  Australia
a    20  Jack    Sydney  Australia
a    21  Jack    Sydney  Australia
a    22  Jack    Sydney  Australia
a    23  Jack    Sydney  Australia
a    24  Jack    Sydney  Australia
a    25  Jack    Sydney  Australia
a    26  Jack    Sydney  Australia
a    27  Jack    Sydney  Australia
a    28  Jack    Sydney  Australia
a    29  Jack    Sydney  Australia
..  ...   ...       ...        ...
b     1  Jill  New York        USA
b     2  Jill  New York        USA
b     3  Jill  New York        USA
b     4  Jill  New York        USA
b     5  Jill  New York        USA
b     6  Jill  New York        USA
b     7  Jill  New York        USA
b     8  Jill  New York        USA
b     9  Jill  New York        USA
b    10  Jill  New York        USA
b    11  Jill  New York        USA
b    12  Jill  New York        USA
b    13  Jill  New York        USA
b    14  Jill  New York        USA
b    15  Jill  New York        USA
b    16  Jill  New York        USA
b    17  Jill  New York        USA
b    18  Jill  New York        USA
b    19  Jill  New York        USA
b    20  Jill  New York        USA
b    21  Jill  New York        USA
b    22  Jill  New York        USA
b    23  Jill  New York        USA
b    24  Jill  New York        USA
b    25  Jill  New York        USA
b    26  Jill  New York        USA
b    27  Jill  New York        USA
b    28  Jill  New York        USA
b    29  Jill  New York        USA
b    30  Jill  New York        USA
[66 rows x 4 columns]
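Since the question says two column values differ in each derived row, another option is to keep those values in a small table and pair it with every original row through a cross join. This is a sketch, not something proposed in the answers above: the variants table and its City values are hypothetical, and merge(how="cross") assumes pandas 1.2 or later.

```python
import pandas as pd

df = pd.DataFrame(
    [("Jack", 34, "Sydney", "Australia"), ("Jill", 30, "New York", "USA")],
    columns=["Name", "Age", "City", "Country"],
)

# Hypothetical table of the values that change in each derived row
# (two columns, as the question describes; the City values are made up)
variants = pd.DataFrame({"Age": [0, 1, 2], "City": ["X", "Y", "Z"]})

# Drop the columns that will be replaced, then pair every row with every variant
derived = df.drop(columns=list(variants.columns)).merge(variants, how="cross")
# Restore the original column order
derived = derived[list(df.columns)]
print(derived)
```

This yields len(df) * len(variants) rows and keeps the changing values in one place, which can be easier to maintain than assigning each column separately.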

You can put the DataFrame in a list and then repeat it as many times as you need:

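# Repeat every row 6 times in total (the original plus 5 copies)
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df = pd.concat([df] * 6, ignore_index=True)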
df.sort_values(by='Name').head(15)

# Result
Name    Age     City    Country
28  John    16  New York    US
4   John    16  New York    US
22  John    16  New York    US
34  John    16  New York    US
16  John    16  New York    US
10  John    16  New York    US
17  Mike    17  las vegas   US
29  Mike    17  las vegas   US
23  Mike    17  las vegas   US
11  Mike    17  las vegas   US
35  Mike    17  las vegas   US
5   Mike    17  las vegas   US
3   Neelu   32  Bangalore   India
33  Neelu   32  Bangalore   India
15  Neelu   32  Bangalore   India
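On the RAM concern raised in the question: when most columns are repeated text, converting them to the category dtype before repeating stores each distinct string once plus a small integer code per row. The sketch below is an illustration under assumptions (N = 1000 and the text_cols list are chosen here, not taken from the question) and simply compares the memory reported by memory_usage(deep=True) for both variants.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [("Jack", 34, "Sydney", "Australia"), ("Jill", 30, "New York", "USA")],
    columns=["Name", "Age", "City", "Country"],
)

N = 1000  # assumed repeat count, large enough for the difference to show
text_cols = ["Name", "City", "Country"]  # columns whose values repeat verbatim

# Row positions 0,0,...,1,1,... so that each row appears N times
positions = np.repeat(np.arange(len(df)), N)

# Plain repeat: every row carries its own string entries
plain = df.iloc[positions]
# Categorical repeat: strings stored once, rows hold small integer codes
compact = df.astype({c: "category" for c in text_cols}).iloc[positions]

plain_bytes = plain.memory_usage(deep=True).sum()
compact_bytes = compact.memory_usage(deep=True).sum()
print(plain_bytes, compact_bytes)
```

The saving depends on how many distinct values each column has, so it is worth measuring on the real data before committing to this.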
