"Problem too large" when trying to save a dgCMatrix as CSV in R



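I am trying to convert an RDS file that I received from a colleague, which contains a sparse matrix (dgCMatrix), into a plain-text CSV file. I realize the resulting file will be many GB in size; there is no need to warn me about that. I tried using as.matrix, but I get a "problem too large" error. How can I avoid this?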

> write.csv(as.matrix(x), 'table.csv')
Loading required package: Matrix
Error in asMethod(object) : 
Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105

Why not process the sparse matrix in chunks? The code below is one way to do that.

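library(Matrix)

# Write a sparse matrix to CSV in blocks of `chunk` rows, so that only one
# block at a time is converted to a dense matrix.
# Extra arguments in `...` are passed on to write.table()
# (for example row.names = FALSE).
write_sparse_csv <- function(x, file, ..., chunk = 100){
  passes <- nrow(x) %/% chunk     # number of full chunks
  remaining <- nrow(x) %% chunk   # rows left over after the full chunks
  if(passes > 0){
    # first chunk: creates the file and writes the header, if there is one
    inx <- seq_len(chunk)
    y <- x[inx, , drop = FALSE]
    y <- as.matrix(y)
    write.table(y, file, append = FALSE, sep = ",", col.names = !is.null(colnames(x)), ...)
    passes <- passes - 1L
    # remaining full chunks: shift the row index and append
    for(i in seq_len(passes)){
      inx <- inx + chunk
      y <- x[inx, , drop = FALSE]
      y <- as.matrix(y)
      write.table(y, file, append = TRUE, sep = ",", col.names = FALSE, ...)
    }
    if(remaining > 0){
      # final partial chunk: only the rows after the last full chunk
      inx <- max(inx) + seq_len(remaining)
      y <- x[inx, , drop = FALSE]
      y <- as.matrix(y)
      write.table(y, file, append = TRUE, sep = ",", col.names = FALSE, ...)
    }
  } else if(remaining > 0){
    # fewer rows than one chunk: write everything in a single pass
    inx <- seq_len(remaining)
    y <- x[inx, , drop = FALSE]
    y <- as.matrix(y)
    write.table(y, file, append = FALSE, sep = ",", col.names = !is.null(colnames(x)), ...)
  }
}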
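# Test data: a 2000 x 500 sparse matrix in which 10% of the entries are non-zero
set.seed(2021)
n <- 1e6
M <- Matrix(sample(c(rep(0, 9*n/10), seq_len(n/10))), ncol = 5e2, sparse = TRUE)
dim(M)   # 2000 500
# the directory ~/tmp must already exist
write_sparse_csv(M, "~/tmp/test.csv")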
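
To check that a chunked export loses or duplicates no rows, it may help to do a round trip on a small matrix whose row count is not a multiple of the chunk size. The sketch below is only an illustration under stated assumptions: write_rows_chunked is a hypothetical, simplified helper (not the function above), it writes neither row names nor a header, and the 250 x 7 test matrix is made up for the check.

```r
library(Matrix)

# Hypothetical helper for illustration: write `x` to CSV one block of rows
# at a time, without row names or a header line.
write_rows_chunked <- function(x, file, chunk = 100L) {
  if (nrow(x) == 0L) return(invisible(file))
  starts <- seq(1L, nrow(x), by = chunk)
  for (k in seq_along(starts)) {
    # rows of the current block; the last block may be shorter than `chunk`
    rows <- starts[k]:min(starts[k] + chunk - 1L, nrow(x))
    block <- as.matrix(x[rows, , drop = FALSE])
    # the first block creates the file, later blocks append to it
    write.table(block, file, sep = ",", append = k > 1L,
                row.names = FALSE, col.names = FALSE)
  }
  invisible(file)
}

# Small made-up test matrix: 250 rows, so the last chunk holds only 50 rows
set.seed(1)
m <- Matrix(sample(c(rep(0, 1500), seq_len(250))), ncol = 7, sparse = TRUE)

tmp <- tempfile(fileext = ".csv")
write_rows_chunked(m, tmp, chunk = 100L)

# Read the file back and compare it with the dense version of `m`
back <- unname(as.matrix(read.csv(tmp, header = FALSE)))
isTRUE(all.equal(back, as.matrix(m), check.attributes = FALSE))
```

If the comparison does not return TRUE, the row indexing of the last chunk is the first thing to look at.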
