csr_matrix: how to fill missing values with np.nan instead of 0



It seems that csr_matrix fills missing values with 0 by default. How can the missing values be filled with np.nan instead?

import numpy as np
from scipy.sparse import csr_matrix
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([0, 2, 3, 4, 5, 6])
csr_matrix((data, (row, col)), shape=(3, 3)).toarray()

Output:

array([[0, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])

Expected:

array([[0, np.nan, 2],
       [np.nan, np.nan, 3],
       [4, 5, 6]])

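Here is a workaround:

import numpy as np
from scipy.sparse import csr_matrix
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([0, 2, 3, 4, 5, 6])
# Use a float mask: assigning np.nan into an integer array raises a ValueError.
mask = csr_matrix((np.ones(len(data)), (row, col)), shape=(3, 3)).toarray()
mask[mask == 0] = np.nan  # positions with no stored entry become NaN
csr_matrix((data, (row, col)), shape=(3, 3)).toarray() * mask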

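This is not possible within csr_matrix itself: by definition it stores only the nonzero elements, and every position without a stored entry is read back as 0.

If you really need the NaNs, just operate on the dense result.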
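
As a small illustration of that point, the sketch below reuses the arrays from the question and inspects what the matrix actually stores. The names `stored_before` and `stored_after` are only illustrative. It relies on the assumption that SciPy keeps an explicitly supplied 0 as a stored entry until `eliminate_zeros()` is called.

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([0, 2, 3, 4, 5, 6])

m = csr_matrix((data, (row, col)), shape=(3, 3))

# Count of explicitly stored entries, including the explicit 0 at (0, 0).
stored_before = m.nnz

# Dropping explicit zeros leaves only the nonzero entries in storage.
m.eliminate_zeros()
stored_after = m.nnz

# Either way, every position without a stored entry densifies to 0;
# there is no setting that changes this implicit value to NaN.
dense = m.toarray()
```

So any NaN filling has to happen on the dense array, using the stored coordinates to tell "missing" apart from "explicitly zero".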

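a = csr_matrix((data, (row, col)), shape=(3, 3)).toarray().astype(float)  # NaN needs a float dtype
a[a == 0] = np.nan  # note: this also turns the explicitly stored 0 at (0, 0) into NaN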
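
Another option is a helper that densifies the matrix with an arbitrary fill value while keeping explicitly stored zeros:

import numpy as np
import scipy.sparse as sp

def todense_fill(coo: sp.coo_matrix, fill_value: float) -> np.ndarray:
    """Densify a sparse COO matrix. Same as coo_matrix.todense()
    except it fills missing entries with fill_value instead of 0.
    """
    # Pick a marker for explicit zeros that cannot collide with fill_value.
    dummy_value = np.nan if not np.isnan(fill_value) else np.inf
    dummy_check = np.isnan if np.isnan(dummy_value) else np.isinf
    coo = coo.copy().astype(float)
    # Mark explicitly stored zeros so they survive the fill step.
    coo.data[coo.data == 0] = dummy_value
    # Note: squeeze() drops any length-1 dimension of the result.
    out = np.array(coo.todense()).squeeze()
    out[out == 0] = fill_value   # remaining zeros are the missing entries
    out[dummy_check(out)] = 0    # restore the explicit zeros
    return out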
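
A simpler variant of the same idea, offered as a sketch rather than a definitive implementation: start from an array pre-filled with the fill value and write the stored entries into it by coordinate. The helper name `to_dense_with_fill` is hypothetical, and the sketch assumes the matrix has no duplicate coordinates (converting through CSR, as in the question, already sums duplicates).

```python
import numpy as np
from scipy.sparse import csr_matrix


def to_dense_with_fill(m, fill_value=np.nan):
    """Hypothetical helper: densify m, using fill_value where nothing is stored."""
    coo = m.tocoo()
    # Start with every position set to the fill value.
    out = np.full(coo.shape, fill_value, dtype=float)
    # Overwrite only the explicitly stored positions, explicit zeros included.
    # Assumes no duplicate (row, col) pairs; with duplicates the last one wins.
    out[coo.row, coo.col] = coo.data
    return out


row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([0, 2, 3, 4, 5, 6])

result = to_dense_with_fill(csr_matrix((data, (row, col)), shape=(3, 3)))
```

With the data from the question, `result` keeps the explicit 0 at position (0, 0) and has NaN only at the three positions that were never stored, which matches the expected output.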
