r - How do I fill a matrix chunk by chunk using a while loop?



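I am trying to read a large dataset in chunks: find the mean of each chunk (each chunk represents part of a larger column), add that mean to a row of a matrix, and then take the mean of the means to get the overall mean of the column. I have the setup in place, but my while loop does not repeat its cycle. I think it may have to do with how I refer to "chunks" and "chunk".

Here is an example using "iris.csv" in R: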

fl <- file("iris.csv", "r")
clname <- readLines(fl, n=1) # read the header
r <- unlist(strsplit(clname,split = ","))
length(r) # get the number of columns in the matrix
cm <- matrix(NA, nrow=1000, ncol=length(r)) # need a matrix that can be filled on each iteration
numchunk = 0 # set my chunks of code to build up
while(numchunk <= 0){ # stop when no more chunks left to run
  numchunk <- numchunk + 1 # keep on moving through chunks of code
  x <- readLines(fl, n=100) # read 100 lines at a time
  chunk <- as.numeric(unlist(strsplit(x,split = ","))) # readable chunk of code
  m <- matrix(chunk, ncol=length(r), byrow = TRUE) # put chunk in a matrix
  cm[numchunk,] <- colMeans(m) # get the column means of the matrix and fill in larger matrix
  print(numchunk) # print the number of chunks used
}
cm
close(fl)
final_mean <- colSums(cm)/nrow(cm)
return(final_mean)

This works when I set n = 1000, but I want it to work for larger datasets, where the while loop needs to keep running. Can anyone help me correct this?

Maybe this helps:

fl <- file("iris.csv", "r")  # open the connection before reading
clname <- readLines(fl, n=1) # read the header
r <- unlist(strsplit(clname, split = ","))
length(r) # get the number of columns in the matrix
cm <- matrix(NA, nrow=1000, ncol=length(r)) # one row per chunk; unused rows stay NA
numchunk <- 0
flag <- TRUE
while(flag){
  x <- readLines(fl, n=5) # read 5 lines at a time
  print(length(x))
  if(length(x) == 0) {
    flag <- FALSE # nothing left to read, so stop the loop
  } else {
    numchunk <- numchunk + 1 # count only chunks that contain data
    chunk <- as.numeric(unlist(strsplit(x, split = ","))) # assumes all fields are numeric
    m <- matrix(chunk, ncol=length(r), byrow = TRUE) # put chunk in a matrix
    cm[numchunk,] <- colMeans(m) # column means of this chunk go into the larger matrix
    print(numchunk) # print the number of chunks used
  }
}
cm
close(fl)
# average only the rows that were filled; this equals the overall mean
# only when every chunk has the same number of rows
final_mean <- colMeans(cm[seq_len(numchunk), , drop = FALSE])
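
Averaging chunk means only reproduces the overall mean when every chunk holds the same number of rows. As a rough sketch of one way around that, the loop can accumulate column sums and a row count instead, so the last chunk may be shorter. This assumes a CSV with only numeric columns and no row names; the temporary file and the names `tmp`, `col_sums` and `n_rows` are illustrative and not part of the original code.

```r
# Write a purely numeric demo file (assumption: no row names, no quoted data fields)
tmp <- tempfile(fileext = ".csv")
write.csv(iris[-5], tmp, row.names = FALSE)

fl <- file(tmp, "r")
header <- unlist(strsplit(readLines(fl, n = 1), split = ","))
col_sums <- numeric(length(header))  # running column sums
n_rows <- 0                          # running row count

repeat {
  x <- readLines(fl, n = 40)         # 150 rows is not a multiple of 40
  if (length(x) == 0) break          # nothing left to read
  chunk <- as.numeric(unlist(strsplit(x, split = ",")))
  m <- matrix(chunk, ncol = length(header), byrow = TRUE)
  col_sums <- col_sums + colSums(m)  # add this chunk's column totals
  n_rows <- n_rows + nrow(m)         # add this chunk's row count
}
close(fl)

final_mean <- col_sums / n_rows
all.equal(final_mean, as.numeric(colMeans(iris[-5])))
```

Because nothing is stored per chunk, this version also avoids having to guess the number of rows of `cm` in advance.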

First, it may help to define a helper function r2v() that splits raw lines into useful vectors.

r2v <- Vectorize(\(x) {
  ## splits raw lines to vectors: removes quotes, splits at commas,
  ## and drops the first field (the row names)
  strsplit(gsub('\"', '', x), split=",")[[1]][-1]
})
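
To see what the helper returns, here is a small sketch applied to two hand-written lines in the format that `write.csv()` produces with row names. The helper is restated with `function(x)` so the snippet runs on its own; the sample lines and the names `lines`, `v` and `num` are made up for illustration.

```r
# Restated helper: strip quotes, split at commas, drop the row-name field
r2v <- Vectorize(function(x) {
  strsplit(gsub('"', '', x), split = ",")[[1]][-1]
})

# Two illustrative raw lines (first field is the quoted row name)
lines <- c('"1",5.1,3.5,1.4,0.2', '"2",4.9,3,1.4,0.2')

v <- r2v(lines)   # character matrix: one column per input line
dim(v)            # fields in rows, lines in columns

# Convert to numeric and average across lines, i.e. per original column
num <- type.convert(v, as.is = TRUE)
unname(rowMeans(num))
```

Each input line becomes a column of the result, which is why the chunk comes out transposed and the per-column averages are taken with `rowMeans()`.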

After opening the file, check the size of the file to be read using system() and a bash command (for Windows, see here).

## open file
f <- 'iris.csv'
fl <- file(f, "r")
## rows
(nr <- 
  as.integer(gsub(paste0('\\s', f), '', system(paste('wc -l', f), intern=TRUE))) - 1)
# nr <- 150  ## alternatively define nrows manually
# [1] 150
## columns
nm <- readLines(fl, n=1) |> r2v()
(nc <- length(nm))
# [1] 4
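
Where `wc` is not available, the rows can also be counted from within R. The following is only a sketch: `count_lines()` is a hypothetical helper, not part of the original answer, and it reads the file block by block so that the whole file never has to sit in memory at once.

```r
# Demo file written the same way as in the "Data" section, but to a temp path
tmp <- tempfile(fileext = ".csv")
write.csv(iris[-5], tmp)

# Hypothetical helper: count lines by reading fixed-size blocks
count_lines <- function(path, block = 1000L) {
  con <- file(path, "r")
  on.exit(close(con))              # always release the connection
  n <- 0L
  repeat {
    got <- length(readLines(con, n = block))
    if (got == 0L) break           # end of file reached
    n <- n + got
  }
  n
}

nr <- count_lines(tmp) - 1L        # subtract the header line
nr
```

This costs one extra pass over the file, which is the same trade-off as calling `wc -l`.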

Next, define a chunk size by which the number of rows can be divided.

## define chunk size
ch_sz <- 50
stopifnot(nr %% ch_sz == 0)  ## all chunks should be filled

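Then, using replicate(), we calculate the chunk-wise rowMeans() (since the chunks come out transposed), and finally apply rowMeans() once more over everything to get the column means of the whole matrix.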

## calculate means chunk-wise
final_mean <-
  replicate(nr / ch_sz, 
            rowMeans(type.convert(r2v(readLines(fl, n=ch_sz)), as.is=TRUE))) |>
  rowMeans()
close(fl)

Let's validate the result.

## test
all.equal(final_mean, as.numeric(colMeans(iris[-5])))
# [1] TRUE

Data:

iris[-5] |>
write.csv('iris.csv')
