Duplicating pandas DataFrame rows with two columns swapped



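Given a pandas DataFrame, every row should be duplicated, with the duplicate differing from the original only in the score1 and score2 columns: their values are swapped. In addition, the unique index must remain incremental.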

Original DataFrame:

import pandas as pd

nme = ["Tom", "John", "Philip", "Ross"]
deg = ["MBA", "BCA", "M.Tech", "MBA"]
scr1 = [90, 40, 80, 98]
scr2 = [10, 20, 30, 50]
# dictionary of lists
dct = {'name': nme, 'degree': deg, 'score1': scr1, 'score2': scr2}
my_dataframe = pd.DataFrame(dct)
print(my_dataframe)
name  degree  score1  score2
0     Tom     MBA      90      10
1    John     BCA      40      20
2  Philip  M.Tech      80      30
3    Ross     MBA      98      50

Expected DataFrame:

nme = ["Tom","Tom", "John", "John", "Philip","Philip", "Ross", "Ross"]
deg = ["MBA", "MBA", "BCA", "BCA", "M.Tech", "M.Tech", "MBA", "MBA"]
scr1 = [90, 10, 40, 20, 80, 30, 98, 50]
scr2 = [10, 90, 20, 40, 30, 80, 50, 98]
# dictionary of lists
dct = {'name': nme, 'degree': deg, 'score1': scr1, 'score2': scr2}
my_new_dataframe = pd.DataFrame(dct)
print(my_new_dataframe)
name  degree  score1  score2
0     Tom     MBA      90      10
1     Tom     MBA      10      90
2    John     BCA      40      20
3    John     BCA      20      40
4  Philip  M.Tech      80      30
5  Philip  M.Tech      30      80
6    Ross     MBA      98      50
7    Ross     MBA      50      98

Thanks very much in advance.

This worked for me:

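new_df = pd.DataFrame(columns=['name', 'degree', 'score1', 'score2'])
j = 0
# my_dataframe is the original DataFrame from the question
for index, row in my_dataframe.iterrows():
    # the original row
    new_df.loc[j] = [row['name'], row['degree'], row['score1'], row['score2']]
    # the duplicate, with score1 and score2 swapped
    new_df.loc[j + 1] = [row['name'], row['degree'], row['score2'], row['score1']]
    j += 2
new_df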

Output:

name     degree score1  score2
0   Tom     MBA     90  10
1   Tom     MBA     10  90
2   John    BCA     40  20
3   John    BCA     20  40
4   Philip  M.Tech  80  30
5   Philip  M.Tech  30  80
6   Ross    MBA     98  50
7   Ross    MBA     50  98
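
A variant of the same loop, as a rough sketch only: the rows are collected in a plain list and the DataFrame is built once at the end, which avoids growing new_df one .loc assignment at a time. The helper name duplicate_with_swap and its col_a/col_b parameters are hypothetical, and the input is assumed to be the question's my_dataframe.

```python
import pandas as pd


def duplicate_with_swap(df, col_a="score1", col_b="score2"):
    """Return df with every row followed by a copy whose col_a/col_b values are swapped.

    Hypothetical helper, not part of pandas.
    """
    rows = []
    for _, row in df.iterrows():
        original = row.to_dict()
        swapped = dict(original)
        swapped[col_a], swapped[col_b] = original[col_b], original[col_a]
        # The original row comes first, its swapped copy right after it.
        rows.extend([original, swapped])
    # Building the frame once yields a fresh 0..n-1 index automatically.
    return pd.DataFrame(rows, columns=df.columns)


my_dataframe = pd.DataFrame({
    "name": ["Tom", "John", "Philip", "Ross"],
    "degree": ["MBA", "BCA", "M.Tech", "MBA"],
    "score1": [90, 40, 80, 98],
    "score2": [10, 20, 30, 50],
})

new_df = duplicate_with_swap(my_dataframe)
print(new_df)
```

For large frames a vectorised approach is likely to be faster, since iterrows builds a Series for every row.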

You can use itertools.permutations and df.explode to get the desired output.

from itertools import permutations
my_dataframe['combine'] = [list(permutations((i,j),2)) for i,j in zip(my_dataframe.score1,my_dataframe.score2)]
name  degree  score1  score2               combine
0     Tom     MBA      90      10  [(90, 10), (10, 90)]
1    John     BCA      40      20  [(40, 20), (20, 40)]
2  Philip  M.Tech      80      30  [(80, 30), (30, 80)]
3    Ross     MBA      98      50  [(98, 50), (50, 98)]
my_new_dataframe = my_dataframe.explode('combine') 
name  degree  score1  score2   combine
0     Tom     MBA      90      10  (90, 10)
0     Tom     MBA      90      10  (10, 90)
1    John     BCA      40      20  (40, 20)
1    John     BCA      40      20  (20, 40)
2  Philip  M.Tech      80      30  (80, 30)
2  Philip  M.Tech      80      30  (30, 80)
3    Ross     MBA      98      50  (98, 50)
3    Ross     MBA      98      50  (50, 98)
my_new_dataframe[['score1','score2']] = pd.DataFrame(my_new_dataframe['combine'].tolist(), index= my_new_dataframe.index)
name  degree  score1  score2   combine
0     Tom     MBA      90      10  (90, 10)
0     Tom     MBA      10      90  (10, 90)
1    John     BCA      40      20  (40, 20)
1    John     BCA      20      40  (20, 40)
2  Philip  M.Tech      80      30  (80, 30)
2  Philip  M.Tech      30      80  (30, 80)
3    Ross     MBA      98      50  (98, 50)
3    Ross     MBA      50      98  (50, 98)

Update

To get a unique index for each row, you can use df.reset_index:

my_new_dataframe.reset_index(drop=True, inplace=True)
name  degree  score1  score2   combine
0     Tom     MBA      90      10  (90, 10)
1     Tom     MBA      10      90  (10, 90)
2    John     BCA      40      20  (40, 20)
3    John     BCA      20      40  (20, 40)
4  Philip  M.Tech      80      30  (80, 30)
5  Philip  M.Tech      30      80  (30, 80)
6    Ross     MBA      98      50  (98, 50)
7    Ross     MBA      50      98  (50, 98)
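
The steps of this answer can be condensed into one runnable sketch, under a few assumptions: the intermediate names pairs, exploded, scores and result are illustrative, the index is reset before the scores are written back, and the helper combine column is dropped at the end. Resetting first gives every row a unique label, so the score values are matched to rows unambiguously; building the score frame without a matching index would align by label and repeat the same pair on both rows of a name.

```python
from itertools import permutations

import pandas as pd

my_dataframe = pd.DataFrame({
    "name": ["Tom", "John", "Philip", "Ross"],
    "degree": ["MBA", "BCA", "M.Tech", "MBA"],
    "score1": [90, 40, 80, 98],
    "score2": [10, 20, 30, 50],
})

# For every row, both orderings of its two scores: [(s1, s2), (s2, s1)].
pairs = [
    list(permutations((s1, s2), 2))
    for s1, s2 in zip(my_dataframe["score1"], my_dataframe["score2"])
]

# One output row per tuple, then a fresh 0..n-1 index.
exploded = (
    my_dataframe.assign(combine=pairs)
    .explode("combine")
    .reset_index(drop=True)
)

# Split the tuples back into the two score columns, sharing exploded's index.
scores = pd.DataFrame(
    exploded["combine"].tolist(),
    columns=["score1", "score2"],
    index=exploded.index,
)
exploded[["score1", "score2"]] = scores

result = exploded.drop(columns="combine")
print(result)
```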

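I just realized this question is already 4 days old, but I thought of a simple solution: swap the two score columns before concatenating with the original DataFrame, for example: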

print(my_dataframe)
name  degree  score1  score2
0     Tom     MBA      90      10
1    John     BCA      40      20
2  Philip  M.Tech      80      30
3    Ross     MBA      98      50
my_dataframe2 = my_dataframe.copy()
my_dataframe2.columns = ['name', 'degree', 'score2', 'score1']
print(my_dataframe2)
name  degree  score2  score1    #note 2 column names are swapped
0     Tom     MBA      90      10
1    John     BCA      40      20
2  Philip  M.Tech      80      30
3    Ross     MBA      98      50

Next, use pd.concat(), which stacks the two frames vertically and aligns them by column name:

df = pd.concat([my_dataframe, my_dataframe2]).sort_index().reset_index(drop=True)
print(df)
degree    name  score1  score2
0     MBA     Tom      90      10
1     MBA     Tom      10      90
2     BCA    John      40      20
3     BCA    John      20      40
4  M.Tech  Philip      80      30
5  M.Tech  Philip      30      80
6     MBA    Ross      98      50
7     MBA    Ross      50      98
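
The same idea as a self-contained sketch, with two changes that are assumptions of this sketch and not part of the answer above: the labels are swapped with rename, and the sort uses kind="mergesort". The default sort algorithm is not guaranteed to be stable, so a stable sort is the safer way to keep each original row ahead of its swapped copy when index labels repeat. The names swapped and result are illustrative.

```python
import pandas as pd

my_dataframe = pd.DataFrame({
    "name": ["Tom", "John", "Philip", "Ross"],
    "degree": ["MBA", "BCA", "M.Tech", "MBA"],
    "score1": [90, 40, 80, 98],
    "score2": [10, 20, 30, 50],
})

# Swap only the two column labels on a copy; the underlying values stay in place.
swapped = my_dataframe.rename(columns={"score1": "score2", "score2": "score1"})

# concat aligns on column names, so the relabelled values land in the opposite
# score columns. The stable sort interleaves originals and copies by index label.
result = (
    pd.concat([my_dataframe, swapped])
    .sort_index(kind="mergesort")
    .reset_index(drop=True)
)
print(result)
```

Recent pandas versions should keep the first frame's column order (name, degree, score1, score2) here; the alphabetical order in the output above appears to come from an older release.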
