Intersection between a csr matrix and an index



I have the following index:

Index(['z_2', 'z_3',     # first location, the numbers after the _ are redundant
'z_4', 'z_5',
'z_6', 'z_7',
'z_8', 'z_9',
'z_10', 'z_11',
...
'r_4509', 'r_4538',
'r_4306', 'r_4583',
'r_1232', 'r_4592',
'r_4601', 'r_4610',
'r_4637', 'r_4627'],
dtype='object', length=45590)

I also have the following csr matrix:

(1, 29163)    0.06            # <- z location 0, r location 29163
(3, 29161)    0.14            # <- z location 3, r location 29161
(4, 29160)    0.11
(5, 29159)    0.21
....           ...
(44336, 18216)    0.01
(44396, 15050)    0.02
(44440, 16356)    0.24
(44461, 531)      0.04

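Note: the numbers in the csr_matrix are actually positions. The z_ data has about 24k rows and the r_ data has about 21k rows. So the first row of the csr_matrix actually refers to z_2 (the first value in the index) and to the 29163rd value in the r_ data. I am trying to find a way to replace the position numbers with the corresponding strings from the index. The numbers next to them are scores.

What I have tried (edited with a partial answer):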

import pandas as pd
import anndata as ad
from scipy.sparse import csr_matrix
var_names = pd.DataFrame(matrix.adata.var_names)
col1 = pd.DataFrame(matrix.tgf.tocoo().row)  # row positions: 1, 3, 4, ...
col2 = pd.DataFrame(matrix.tgf.tocoo().col)  # column positions: 29163, 29161, 29160, ...
# index to location intersection
z = var_names.index.intersection(col1.iloc[:,0])
r = var_names.index.intersection(col2.iloc[:,0])
var_names.loc[z] = var_names
var_names['z'] = col1.iloc[:,0]
var_names.loc[r] = var_names
var_names['r'] = col2.iloc[:,0]
col1['var'] = z
col2['var'] = r
#var_names['col1'] = col1.iloc[0]
#var_names['col2'] = col2.iloc[0]
print(var_names[:5])
0   z    r
0  z_2  1  29163
1  z_3  3  29161
2  z_4  4  29160
3  z_5  5  29159
4  z_6  6  41784

But then I got stuck, because I have no experience with csr_matrix.

*Edited: the r column does not correspond to the r_ names, even though I intersected with col2.
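
A likely reason for the mismatch, as a minimal sketch on toy data (the names `names`, `cols`, `aligned` and `looked_up` are made up for illustration): assigning a Series to a DataFrame column aligns on row labels, so each position ends up next to whichever name shares its row number, not the name stored at that position. `Series.map` looks the name up by value instead.

```python
import pandas as pd

# Toy stand-in for var_names (hypothetical values).
names = pd.DataFrame({"name": ["z_2", "z_3", "r_10", "r_11"]})

# Toy stand-in for the column positions of two nonzero entries.
cols = pd.Series([3, 2])

# Column assignment aligns on row labels 0 and 1; nothing is looked up.
aligned = names.copy()
aligned["r"] = cols

# map() treats each value as a position and returns the name stored there.
looked_up = cols.map(names["name"])

print(aligned)
print(looked_up)
```

In `aligned`, the value 3 simply sits in row 0 beside `z_2`, while `looked_up` resolves position 3 to `r_11`.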

var_names = pd.DataFrame(matrix.adata.var_names, columns = ['features'])
#This is the most important part to extract both "columns"- like from matrix
col1 = pd.DataFrame(matrix.tgf.tocoo().row, columns = ['features'])
col2 = pd.DataFrame(matrix.tgf.tocoo().col, columns = ['features'])
#Intersection between each data (z_ or r_) with the matrix (var_names)
col2['features'] = col2['features'].map(var_names.set_index(var_names.index)['features'])
col1['features'] = col1['features'].map(var_names.set_index(var_names.index)['features'])
col1 = col1.loc[col1.index.isin(col2.index)]
col1['r']= col2['features']
print(col1)

# Matched by index position

0      z_3                    r_15654
1      z_5                    r_25472
2      z_6                    r_15412
3      z_7                    r_15468
4      z_8                    r_12
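
For reference, the position-to-name replacement can be sketched without intermediate DataFrames. This is only a sketch on toy data, assuming that both the row and the column positions of the matrix point into the same full index. The names `names`, `mat` and `pairs` are hypothetical stand-ins for `matrix.adata.var_names`, `matrix.tgf` and the desired result.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Toy stand-in for the full index (assumption: z_ and r_ names share one index).
names = pd.Index(["z_2", "z_3", "z_4", "r_10", "r_11", "r_12"])

# Toy stand-in for the sparse score matrix: three nonzero entries.
mat = csr_matrix(
    ([0.06, 0.14, 0.11], ([0, 1, 2], [5, 3, 4])),
    shape=(6, 6),
)

# COO format exposes parallel arrays: row position, column position, value.
coo = mat.tocoo()

# Replace each position with the name stored at that position in the index.
pairs = pd.DataFrame({
    "z": names.take(coo.row),
    "r": names.take(coo.col),
    "score": coo.data,
})

print(pairs)
```

Each row of `pairs` then holds one nonzero entry of the matrix, with both positions replaced by names and the score kept alongside.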
