I have a DataFrame that looks like this:
df1 = pd.DataFrame({'Gene':['TP53', 'COX5', 'P16'], 'test':[1,3,0], 'Healthy':[0,0,2]})
   Gene  test  Healthy
0  TP53     1        0
1  COX5     3        0
2   P16     0        2
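I have been trying to create every possible pairing of the values. The idea is to take the first gene, "TP53", with its value in the "test" column, and pair it with each value in the "Healthy" column.
For example, TP53 is first mapped to itself: TP53:TP53:1:0. Then TP53 is mapped to COX5 in the "Healthy" column: TP53:COX5:1:0, followed by the next gene: TP53:P16:1:2. Next, the gene COX5 is taken with its value in the "test" column and compared against the "Healthy" column: COX5:TP53:3:0, then COX5:COX5:3:0.
So the end result would be a table like this: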
All_combinations
TP53:TP53:1:0
TP53:COX5:1:0
TP53:P16:1:2
COX5:TP53:3:0
COX5:COX5:3:0
COX5:P16:3:2
P16:TP53:0:0
P16:COX5:0:0
P16:P16:0:2
I have tried the code below, but I am having difficulty.
import pandas as pd
df1 = pd.DataFrame({'Gene':['TP53', 'COX5', 'P16'], 'test':[1,3,0], 'Healthy':[0,0,2]})
df2 = df1.transpose()
df2.columns = df2.iloc[0]
df2 = df2.iloc[1:]
from itertools import product
uniques = [df1[i].unique().tolist() for i in df1.iloc[:,[1,2]]]
pd.DataFrame(product(*uniques), columns = df2.iloc[:,])
The real dataset has more than 32,000 rows, so something that runs quickly would be great. Thanks for your help.
Does this code solve your problem?
import pandas as pd
df1 = pd.DataFrame({'Gene':['TP53', 'COX5', 'P16'], 'test':[1,3,0], 'Healthy':[0,0,2]})
# Create all the combinations as tuples.
# Note that test is taken from gene1 but Healthy from gene2
# The enumerate is used to get the row number related to that gene
row_list = []
for i, gene1 in enumerate(df1.Gene):
    for j, gene2 in enumerate(df1.Gene):
        row_list.append((gene1, gene2, df1.iloc[i].test, df1.iloc[j].Healthy))
# Now create a new dataframe with the results
df2 = pd.DataFrame(row_list, columns=['Gene1', 'Gene2', 'test', 'Healthy'])
This produces:
  Gene1 Gene2  test  Healthy
0  TP53  TP53     1        0
1  TP53  COX5     1        0
2  TP53   P16     1        2
3  COX5  TP53     3        0
4  COX5  COX5     3        0
5  COX5   P16     3        2
6   P16  TP53     0        0
7   P16  COX5     0        0
8   P16   P16     0        2
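Since the question mentions speed, here is a possible vectorized variant of the same idea. It is only a sketch, assuming NumPy is available: np.repeat builds the "outer" side of each pair and np.tile the "inner" side, which avoids the Python-level double loop. The All_combinations column name is taken from the question; the variable n is just illustrative. Note that 32,000 rows give 32,000 × 32,000 = 1,024,000,000 pairs, so the full result may not fit in memory on a typical machine and might need to be built in chunks.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Gene': ['TP53', 'COX5', 'P16'],
                    'test': [1, 3, 0],
                    'Healthy': [0, 0, 2]})

n = len(df1)

# np.repeat yields a a a b b b c c c (outer gene, supplies "test");
# np.tile yields a b c a b c a b c (inner gene, supplies "Healthy").
df2 = pd.DataFrame({
    'Gene1': np.repeat(df1['Gene'].to_numpy(), n),
    'Gene2': np.tile(df1['Gene'].to_numpy(), n),
    'test': np.repeat(df1['test'].to_numpy(), n),
    'Healthy': np.tile(df1['Healthy'].to_numpy(), n),
})

# Optional: join the four columns into the single string column from the question.
df2['All_combinations'] = (
    df2['Gene1'] + ':' + df2['Gene2'] + ':'
    + df2['test'].astype(str) + ':' + df2['Healthy'].astype(str)
)
```

The first four columns should match the table above row for row; the extra string column is only needed if the colon-separated format is really required.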
Since a pandas solution has already been given, this just shows how product works:
from itertools import product
a = [1, 3, 0]
b = [0, 0, 2]
list(product(a, b))  # equivalent to list(product(*[a]+[b]))
[(1, 0), (1, 0), (1, 2), (3, 0), (3, 0), (3, 2), (0, 0), (0, 0), (0, 2)]
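To carry the gene names along as well, one option is to zip each value list with the gene names before taking the product. This is a minimal sketch using plain lists; the names genes, test, healthy, left, right and all_combinations are illustrative and not from the original posts.

```python
from itertools import product

genes = ['TP53', 'COX5', 'P16']
test = [1, 3, 0]
healthy = [0, 0, 2]

# Pair every gene with its own value, once per column.
left = list(zip(genes, test))      # (gene, test value)
right = list(zip(genes, healthy))  # (gene, Healthy value)

# product iterates the left side slowest, matching the order in the question.
all_combinations = [
    f"{g1}:{g2}:{t}:{h}"
    for (g1, t), (g2, h) in product(left, right)
]
```

With the sample data, all_combinations starts with 'TP53:TP53:1:0', 'TP53:COX5:1:0', 'TP53:P16:1:2' and contains 9 strings in total.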