I have the following index:
Index(['z_2', 'z_3', # first location, the numbers after the _ are redundant
'z_4', 'z_5',
'z_6', 'z_7',
'z_8', 'z_9',
'z_10', 'z_11',
...
'r_4509', 'r_4538',
'r_4306', 'r_4583',
'r_1232', 'r_4592',
'r_4601', 'r_4610',
'r_4637', 'r_4627'],
dtype='object', length=45590)
In addition, I have the following csr matrix:
(1, 29163) 0.06 # <- z location 0, r location 29163
(3, 29161) 0.14 # <- z location 3, r location 29161
(4, 29160) 0.11
(5, 29159) 0.21
.... ...
(44336, 18216) 0.01
(44396, 15050) 0.02
(44440, 16356) 0.24
(44461, 531) 0.04
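Note: the numbers in the csr_matrix are actually positions. The z_ data has about 24k rows and the r_ data about 21k rows. So the first line of the csr_matrix actually refers to z_2 (the first value in the index) and to the 29163rd value in the r_ data. I'm trying to find a way to replace the position numbers with the corresponding strings from the index. The numbers next to them are scores. What I tried (edited, partial answer):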
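Because both the row and the column numbers of the sparse matrix are positions into the same 45,590-entry index, one possible approach is to convert the matrix to COO format and use NumPy fancy indexing on an array of the labels. This is a minimal sketch, assuming the matrix is indexed by that one index on both axes (as the sample entries suggest); the toy index, matrix, and the `label_pairs` helper below are hypothetical stand-ins for `matrix.adata.var_names` and `matrix.tgf`.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical toy data standing in for matrix.adata.var_names and matrix.tgf.
names = pd.Index(["z_2", "z_3", "z_4", "r_4509", "r_4538", "r_4306"])
mat = csr_matrix(
    (np.array([0.06, 0.14, 0.11]), (np.array([0, 1, 2]), np.array([3, 5, 4]))),
    shape=(len(names), len(names)),
)

def label_pairs(matrix, labels):
    """Hypothetical helper: turn each stored (row, col, value) into (label, label, score)."""
    coo = matrix.tocoo()          # exposes .row, .col and .data as parallel arrays
    arr = np.asarray(labels)      # plain object array of label strings
    return pd.DataFrame({
        "z": arr[coo.row],        # row position -> label
        "r": arr[coo.col],        # column position -> label
        "score": coo.data,        # the stored value, unchanged
    })

pairs = label_pairs(mat, names)
print(pairs)
```

Fancy indexing keeps one label per stored entry in the matrix's own order, so the three columns stay aligned without any merge or intersection step.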
import anndata as ad
import pandas as pd
from scipy.sparse import csr_matrix
var_names = pd.DataFrame(matrix.adata.var_names)
col1 = pd.DataFrame(matrix.tgf.tocoo().row ) # 1, 3, 4, ...
col2 = pd.DataFrame(matrix.tgf.tocoo().col ) # 29163, 29161, 29160, ...
# index to location intersection
z = var_names.index.intersection(col1.iloc[:,0])
r = var_names.index.intersection(col2.iloc[:,0])
var_names.loc[z] = var_names
var_names['z'] = col1.iloc[:,0]
var_names.loc[r] = var_names
var_names['r'] = col2.iloc[:,0]
col1['var'] = z
col2['var'] = r
#var_names['col1'] = col1.iloc[0]
#var_names['col2'] = col2.iloc[0]
print(var_names[:5])
0 z r
0 z_2 1 29163
1 z_3 3 29161
2 z_4 4 29160
3 z_5 5 29159
4 z_6 6 41784
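A likely reason the first attempt does not line up: `Index.intersection` treats its inputs as sets of values, so it returns each matching position only once and the result no longer corresponds one-to-one with the matrix entries. A positional lookup that keeps one label per entry is `Index.take`. The sketch below uses hypothetical toy data in place of `matrix.adata.var_names` and `matrix.tgf`.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical toy stand-ins for matrix.adata.var_names and matrix.tgf.
var_names = pd.Index(["z_2", "z_3", "z_4", "r_4509", "r_4538"])
tgf = csr_matrix(
    (np.array([0.06, 0.14, 0.21]), (np.array([1, 1, 2]), np.array([3, 4, 3]))),
    shape=(5, 5),
)
coo = tgf.tocoo()

# intersection() deduplicates: position 1 appears twice in coo.row but once here.
common = pd.RangeIndex(len(var_names)).intersection(coo.row)
print(common)

# take() is a positional lookup: one label per stored entry, order preserved.
rows = var_names.take(coo.row)
cols = var_names.take(coo.col)
print(list(zip(rows, cols, coo.data)))
```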
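But then I got stuck, because I have no experience with csr_matrix.
*Edited: the r column does not correspond to the r_ labels, even though I intersected it with col2.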
var_names = pd.DataFrame(matrix.adata.var_names, columns = ['features'])
#Most important part: extract the row and column positions from the matrix
col1 = pd.DataFrame(matrix.tgf.tocoo().row, columns = ['features'])
col2 = pd.DataFrame(matrix.tgf.tocoo().col, columns = ['features'])
#Map each position (row or column number) to its label in var_names (z_ or r_)
col2['features'] = col2['features'].map(var_names.set_index(var_names.index)['features'])
col1['features'] = col1['features'].map(var_names.set_index(var_names.index)['features'])
col1 = col1.loc[col1.index.isin(col2.index)]
col1['r']= col2['features']
print(col1)
#matched by index position
0 z_3 r_15654
1 z_5 r_25472
2 z_6 r_15412
3 z_7 r_15468
4 z_8 r_12
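The matched names above no longer carry the scores. To spot-check a matched pair against the matrix, one option is to go the other way: convert each label back to its position with `Index.get_loc` and index the CSR matrix directly. This is a sketch assuming the labels are unique; `score_for` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical toy stand-ins for matrix.adata.var_names and matrix.tgf.
var_names = pd.Index(["z_2", "z_3", "z_4", "r_4509", "r_4538"])
tgf = csr_matrix(
    (np.array([0.06, 0.14]), (np.array([1, 2]), np.array([3, 4]))),
    shape=(5, 5),
)

def score_for(matrix, labels, row_label, col_label):
    """Hypothetical helper: score stored at (row_label, col_label), 0.0 if absent."""
    i = labels.get_loc(row_label)  # label -> integer position (labels must be unique)
    j = labels.get_loc(col_label)
    return float(matrix[i, j])     # entries not stored in the sparse matrix read as 0

print(score_for(tgf, var_names, "z_3", "r_4509"))
print(score_for(tgf, var_names, "z_2", "r_4538"))
```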