I have sample data with the following five columns: ID, Name, Age, Salary, Country
ID: 1, 1, 2, 3, 3, 3
Name: David, Peter, Alex, Kevin, Steve, Morty
Age: 30, 35, 40, 20, 25, 35
Salary: 3000, 2000, 5000, 1500, 2000, 3000
Country: USA, UK, Germany, France, UK, USA
I want to rearrange the data so that every ID gets a single row, with all records for that ID laid out side by side:
Row_1: col_1, col_2, col_3, col_4, col_5, col_6, col_7, col_8, col_9, col_10, col_11, col_12, col_13
Row_2: 1, David, 30, 3000, USA, Peter, 35, 2000, UK, NaN, NaN, NaN, NaN
Row_3: 2, Alex, 40, 5000, Germany, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN
Row_4: 3, Kevin, 20, 1500, France, Steve, 25, 2000, UK, Morty, 35, 3000, USA
Note that IDs with fewer records leave some columns without a value, hence the NaN entries.
Here is one possible solution you can try:
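import pandas as pd
import numpy as np

df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 3],
                   'Name': ['David', 'Peter', 'Alex', 'Kevin', 'Steve', 'Morty'],
                   'Age': [30, 35, 40, 20, 25, 35],
                   'Salary': [3000, 2000, 5000, 1500, 2000, 3000],
                   'Country': ['USA', 'UK', 'Germany', 'France', 'UK', 'USA']})

# Distinct IDs in order of first appearance; each one becomes an output row.
ids = df['ID'].unique()
# Largest number of records that share a single ID.
max_records = df['ID'].value_counts().max()
# One column for the ID, plus every non-ID column repeated once per record.
num_of_cols = max_records * (df.shape[1] - 1) + 1

# Empty result frame sized from the data instead of hard-coded dimensions.
df1 = pd.DataFrame(index=np.arange(len(ids)), columns=np.arange(num_of_cols))

for idx, i in enumerate(ids):
    temp = df.loc[df['ID'] == i]
    a = [i]
    # Append Name, Age, Salary, Country of every record with this ID.
    for index in range(len(temp)):
        a = a + temp.iloc[index, 1:].tolist()
    # Pad short rows with real missing values (np.nan, not the string 'NaN').
    a = a + [np.nan] * (num_of_cols - len(a))
    df1.loc[idx] = a

print(df1)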
Output:
   0      1   2     3        4      5    6     7    8      9   10    11   12
0  1  David  30  3000      USA  Peter   35  2000   UK    NaN  NaN   NaN  NaN
1  2   Alex  40  5000  Germany    NaN  NaN   NaN  NaN    NaN  NaN   NaN  NaN
2  3  Kevin  20  1500   France  Steve   25  2000   UK  Morty   35  3000  USA
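
If you would rather avoid the explicit loops, the same reshaping can probably be done with pandas' own reshaping tools. The following is only a sketch, not the approach used above: it numbers the records inside each ID with groupby().cumcount(), then uses unstack() to spread them across columns. The helper name widen_by_id and the variable names are made up for this example, and it assumes the key column is called ID as in the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 2, 3, 3, 3],
    'Name': ['David', 'Peter', 'Alex', 'Kevin', 'Steve', 'Morty'],
    'Age': [30, 35, 40, 20, 25, 35],
    'Salary': [3000, 2000, 5000, 1500, 2000, 3000],
    'Country': ['USA', 'UK', 'Germany', 'France', 'UK', 'USA'],
})


def widen_by_id(frame, key='ID'):
    """Hypothetical helper: one row per key, records laid out side by side."""
    # Position of each record within its key group: 0, 1, 2, ...
    slot = frame.groupby(key).cumcount()
    value_cols = [c for c in frame.columns if c != key]
    n_slots = int(slot.max()) + 1

    # Index by (key, slot), then pivot the slot level into the columns.
    # The resulting columns are (field, slot) pairs; missing pairs hold NaN.
    wide = frame.set_index([key, slot])[value_cols].unstack()

    # Reorder so that all fields of record 0 come first, then record 1, etc.
    ordered = [(c, s) for s in range(n_slots) for c in value_cols]
    wide = wide.reindex(columns=pd.MultiIndex.from_tuples(ordered))

    # Flatten the column labels and bring the key back as the first column.
    wide.columns = range(1, len(ordered) + 1)
    wide = wide.reset_index()
    wide.columns = range(wide.shape[1])
    return wide


result = widen_by_id(df)
print(result)
```

One difference to be aware of: because unstack() fills the gaps with NaN, the numeric columns (Age, Salary) will most likely come back as floats, so 30 is shown as 30.0. If that matters, you could convert those columns to pandas' nullable integer type afterwards.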