How can I transpose the cells of multiple columns into a single row using pandas?



I have the following sample data with five columns: ID, Name, Age, Salary, Country.

ID: 1, 1, 2, 3, 3, 3
Name: David, Peter, Alex, Kevin, Steve, Morty
Age: 30, 35, 40, 20, 25, 35
Salary: 3000, 2000, 5000, 1500, 2000, 3000
Country: USA, UK, Germany, France, UK, USA

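I want to rearrange the data so that each ID gets a single row, with all of its records laid out side by side in columns, as follows.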

Row_1: col_1, col_2, col_3, col_4, col_5, col_6, col_7, col_8, col_9, col_10, col_11, col_12, col_13
Row_2: 1, David, 30, 3000, USA, Peter, 35, 2000, UK, NaN, NaN, NaN, NaN
Row_3: 2, Alex, 40, 5000, Germany, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN
Row_4: 3, Kevin, 20, 1500, France, Steve, 25, 2000, UK, Morty, 35, 3000, USA

Note that some IDs have fewer records than others, so the columns left without a value are NaN.

Here is one possible solution you can try:

import pandas as pd
import numpy as np

df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 3],
                   'Name': ['David', 'Peter', 'Alex', 'Kevin', 'Steve', 'Morty'],
                   'Age': [30, 35, 40, 20, 25, 35],
                   'Salary': [3000, 2000, 5000, 1500, 2000, 3000],
                   'Country': ['USA', 'UK', 'Germany', 'France', 'UK', 'USA']})

rows = df['ID'].unique()                # distinct IDs, in order of appearance
num = df['ID'].value_counts().max()     # size of the largest ID group
num_of_cols = num * 4 + 1               # 4 fields per record, plus the ID column

# Empty result frame: one row per ID, wide enough for the largest group
df1 = pd.DataFrame(index=np.arange(len(rows)), columns=np.arange(num_of_cols))

for idx, i in enumerate(rows):
    temp = df.loc[df['ID'] == i]
    a = [i]
    # Append Name, Age, Salary, Country of every record with this ID
    for index in range(len(temp)):
        a = a + temp.iloc[index, 1:5].tolist()
    # Pad short rows with real missing values (np.nan, not the string 'NaN')
    a = a + [np.nan] * (num_of_cols - len(a))
    df1.loc[idx] = a

print(df1)

Output:

   0      1   2     3        4      5    6     7    8      9   10    11   12
0  1  David  30  3000      USA  Peter   35  2000   UK    NaN  NaN   NaN  NaN
1  2   Alex  40  5000  Germany    NaN  NaN   NaN  NaN    NaN  NaN   NaN  NaN
2  3  Kevin  20  1500   France  Steve   25  2000   UK  Morty   35  3000  USA
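
As an alternative to the explicit loop, the same reshaping can probably be done with pandas' own grouping tools. The sketch below is not part of the original answer: it numbers the records within each ID using groupby().cumcount(), then uses unstack() to move that position into the columns. The output column names (Name_1, Age_1, ...) and the variable names pos, fields, order and wide are assumptions chosen for illustration. Note that numeric columns come out as floats (30.0 rather than 30) because NaN forces a float dtype.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 2, 3, 3, 3],
    'Name': ['David', 'Peter', 'Alex', 'Kevin', 'Steve', 'Morty'],
    'Age': [30, 35, 40, 20, 25, 35],
    'Salary': [3000, 2000, 5000, 1500, 2000, 3000],
    'Country': ['USA', 'UK', 'Germany', 'France', 'UK', 'USA'],
})

fields = ['Name', 'Age', 'Salary', 'Country']

# Position of each record within its ID group: 0, 1, 0, 0, 1, 2
pos = df.groupby('ID').cumcount()

# Index by (ID, position), then move the position level into the columns.
# The columns become a MultiIndex of (field, position) pairs.
wide = df.set_index(['ID', pos]).unstack()

# Reorder so that each record's four fields sit next to each other:
# (Name, 0), (Age, 0), (Salary, 0), (Country, 0), (Name, 1), ...
order = [(f, p) for p in range(int(pos.max()) + 1) for f in fields]
wide = wide.reindex(columns=pd.MultiIndex.from_tuples(order))

# Flatten the two-level column labels (hypothetical naming scheme)
wide.columns = [f'{field}_{p + 1}' for field, p in wide.columns]
wide = wide.reset_index()

print(wide)
```

This version does not hard-code the number of rows or columns, so it should adapt if an ID gains more records. If plain positional column labels like those in the loop version are preferred, the flattening step can assign range(wide.shape[1]) after reset_index() instead.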
