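Given a pandas DataFrame, every row should be duplicated, but each duplicate must differ from its original in the score1 and score2 columns: their values are swapped. In addition, the unique index must stay incremental.
Original DataFrame: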
import pandas as pd

nme = ["Tom", "John", "Philip", "Ross"]
deg = ["MBA", "BCA", "M.Tech", "MBA"]
scr1 = [90, 40, 80, 98]
scr2 = [10, 20, 30, 50]
# dictionary of lists
dct = {'name': nme, 'degree': deg, 'score1': scr1, 'score2': scr2}
my_dataframe = pd.DataFrame(dct)
print(my_dataframe)
     name  degree  score1  score2
0     Tom     MBA      90      10
1    John     BCA      40      20
2  Philip  M.Tech      80      30
3    Ross     MBA      98      50
Expected DataFrame:
nme = ["Tom","Tom", "John", "John", "Philip","Philip", "Ross", "Ross"]
deg = ["MBA", "MBA", "BCA", "BCA", "M.Tech", "M.Tech", "MBA", "MBA"]
scr1 = [90, 10, 40, 20, 80, 30, 98, 50]
scr2 = [10, 90, 20, 40, 30, 80, 50, 98]
# dictionary of lists
dct = {'name': nme, 'degree': deg, 'score1': scr1, 'score2': scr2}
my_new_dataframe = pd.DataFrame(dct)
print(my_new_dataframe)
     name  degree  score1  score2
0     Tom     MBA      90      10
1     Tom     MBA      10      90
2    John     BCA      40      20
3    John     BCA      20      40
4  Philip  M.Tech      80      30
5  Philip  M.Tech      30      80
6    Ross     MBA      98      50
7    Ross     MBA      50      98
Thanks very much in advance.
This worked for me:
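new_df = pd.DataFrame(columns=['name', 'degree', 'score1', 'score2'])
j = 0
for index, row in my_dataframe.iterrows():
    # original row, followed by a copy with score1 and score2 swapped
    new_df.loc[j] = [row['name'], row['degree'], row['score1'], row['score2']]
    new_df.loc[j + 1] = [row['name'], row['degree'], row['score2'], row['score1']]
    j += 2
new_df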
Output:
     name  degree  score1  score2
0     Tom     MBA      90      10
1     Tom     MBA      10      90
2    John     BCA      40      20
3    John     BCA      20      40
4  Philip  M.Tech      80      30
5  Philip  M.Tech      30      80
6    Ross     MBA      98      50
7    Ross     MBA      50      98
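Growing a DataFrame one .loc assignment at a time can get slow on larger inputs, because each assignment enlarges the frame. A minimal sketch of the same loop idea, assuming the sample data from the question, collects the rows in a plain list and builds the DataFrame once at the end:

```python
import pandas as pd

# Sample data copied from the question
my_dataframe = pd.DataFrame({
    'name': ["Tom", "John", "Philip", "Ross"],
    'degree': ["MBA", "BCA", "M.Tech", "MBA"],
    'score1': [90, 40, 80, 98],
    'score2': [10, 20, 30, 50],
})

rows = []
# name=None yields plain tuples, so they unpack in column order
for name, degree, s1, s2 in my_dataframe.itertuples(index=False, name=None):
    rows.append((name, degree, s1, s2))  # original row
    rows.append((name, degree, s2, s1))  # copy with the two scores swapped

# Building the frame once also gives a fresh 0..n-1 index
new_df = pd.DataFrame(rows, columns=['name', 'degree', 'score1', 'score2'])
print(new_df)
```

This assumes the frame has exactly these four columns in this order; with more columns the tuple unpacking would need adjusting.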
You can use itertools.permutations and df.explode to get the desired output.
from itertools import permutations
my_dataframe['combine'] = [list(permutations((i,j),2)) for i,j in zip(my_dataframe.score1,my_dataframe.score2)]
     name  degree  score1  score2               combine
0     Tom     MBA      90      10  [(90, 10), (10, 90)]
1    John     BCA      40      20  [(40, 20), (20, 40)]
2  Philip  M.Tech      80      30  [(80, 30), (30, 80)]
3    Ross     MBA      98      50  [(98, 50), (50, 98)]
my_new_dataframe = my_dataframe.explode('combine')
     name  degree  score1  score2   combine
0     Tom     MBA      90      10  (90, 10)
0     Tom     MBA      90      10  (10, 90)
1    John     BCA      40      20  (40, 20)
1    John     BCA      40      20  (20, 40)
2  Philip  M.Tech      80      30  (80, 30)
2  Philip  M.Tech      80      30  (30, 80)
3    Ross     MBA      98      50  (98, 50)
3    Ross     MBA      98      50  (50, 98)
my_new_dataframe[['score1','score2']] = pd.DataFrame(my_new_dataframe['combine'].tolist(), index= my_new_dataframe.index)
     name  degree  score1  score2   combine
0     Tom     MBA      90      10  (90, 10)
0     Tom     MBA      10      90  (10, 90)
1    John     BCA      40      20  (40, 20)
1    John     BCA      20      40  (20, 40)
2  Philip  M.Tech      80      30  (80, 30)
2  Philip  M.Tech      30      80  (30, 80)
3    Ross     MBA      98      50  (98, 50)
3    Ross     MBA      50      98  (50, 98)
Update
To get a unique index for each row, you can use df.reset_index:
my_new_dataframe.reset_index(drop=True, inplace=True)
     name  degree  score1  score2   combine
0     Tom     MBA      90      10  (90, 10)
1     Tom     MBA      10      90  (10, 90)
2    John     BCA      40      20  (40, 20)
3    John     BCA      20      40  (20, 40)
4  Philip  M.Tech      80      30  (80, 30)
5  Philip  M.Tech      30      80  (30, 80)
6    Ross     MBA      98      50  (98, 50)
7    Ross     MBA      50      98  (50, 98)
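Putting the steps of this answer together, here is a minimal end-to-end sketch, assuming the sample data from the question. It resets the index right after explode (so the later column assignment works on unique labels) and drops the helper combine column at the end; the variable name out is only illustrative.

```python
from itertools import permutations

import pandas as pd

# Sample data copied from the question
my_dataframe = pd.DataFrame({
    'name': ["Tom", "John", "Philip", "Ross"],
    'degree': ["MBA", "BCA", "M.Tech", "MBA"],
    'score1': [90, 40, 80, 98],
    'score2': [10, 20, 30, 50],
})

out = my_dataframe.copy()  # leave the original frame untouched

# Each cell holds both orderings of the pair: [(s1, s2), (s2, s1)]
out['combine'] = [list(permutations((a, b), 2))
                  for a, b in zip(out['score1'], out['score2'])]

# One row per ordering, then a fresh incremental index
out = out.explode('combine').reset_index(drop=True)

# Split each tuple back into the two score columns
out[['score1', 'score2']] = pd.DataFrame(out['combine'].tolist(), index=out.index)

# The helper column is no longer needed
out = out.drop(columns='combine')
print(out)
```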
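Just realized this is already 4 days old, but I thought of a simple solution: swap the two score columns before concatenating with the original DataFrame, e.g.: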
print(my_dataframe)
     name  degree  score1  score2
0     Tom     MBA      90      10
1    John     BCA      40      20
2  Philip  M.Tech      80      30
3    Ross     MBA      98      50
my_dataframe2 = my_dataframe.copy()
my_dataframe2.columns = ['name', 'degree', 'score2', 'score1']
print(my_dataframe2)
     name  degree  score2  score1   # note: the two column names are swapped
0     Tom     MBA      90      10
1    John     BCA      40      20
2  Philip  M.Tech      80      30
3    Ross     MBA      98      50
Next, use pd.concat(), which stacks the frames vertically and aligns them by column name:
# a stable sort keeps each original row ahead of its swapped copy
df = pd.concat([my_dataframe, my_dataframe2]).sort_index(kind='mergesort').reset_index(drop=True)
print(df)
     name  degree  score1  score2
0     Tom     MBA      90      10
1     Tom     MBA      10      90
2    John     BCA      40      20
3    John     BCA      20      40
4  Philip  M.Tech      80      30
5  Philip  M.Tech      30      80
6    Ross     MBA      98      50
7    Ross     MBA      50      98