Image processing loop in R takes orders of magnitude longer after about 50 iterations



EDIT: For clarity, changed the image names from Image1-Image11 to Image50-Image60.

EDIT2: Solved by adding a garbage collection command after removing the image object in each loop iteration. The code has been updated.

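I have 400+ jpeg images in a folder. I am trying to write a script that reads each image, recognizes some text in it, and then writes the file name and the text to a data frame.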

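When I run the script below, the first 50 iterations print times of 0.1-0.3 seconds. Then, for a few iterations, each one takes 1-3 seconds. After that the time per iteration climbs to 1-5 minutes, at which point I kill the script.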

library(dplyr)   # provides the %>% pipe
library(magick)  # image_read, image_ocr, etc.

# filePath is assumed to be defined earlier as the folder holding the images
fileList3 = list.files(path = filePath)

# Empty data frame that collects one row per image
printJobXRes = data.frame(
  jobName = character(),
  xRes = numeric(),
  stringsAsFactors = FALSE
)

i = 0
for (fileName in fileList3){
  img = paste0(filePath, '/', fileName, '_TestImage.jpg')
  start_time = Sys.time()
  # Read, rotate, crop to the text region, binarize, OCR, convert to a number
  temp.xRes = image_read(img, strip = TRUE) %>%
    image_rotate(270) %>%
    image_crop('90x150+1750') %>%
    image_negate() %>%
    image_convert(type = 'Bilevel') %>%
    image_ocr() %>%
    as.numeric()
  stop_time = Sys.time()
  i = i + 1
  print(paste(fileName, 'first attempt, item #', i))
  print(stop_time - start_time)
  temp.df3 = data.frame(
    jobName = fileName,
    xRes = temp.xRes,
    stringsAsFactors = FALSE
  )
  printJobXRes = rbind(printJobXRes, temp.df3)
  # Drop the per-iteration objects, then force a collection
  rm(temp.xRes)
  rm(temp.df3)
  rm(img)
  gc() # This solved the issue
}

Here are a few lines of output:

#Images 1-49 process in .1-.3 seconds each
[1] "Image50.jpg first attempt, item # 50"
Time difference of 0.2320111 secs
[1] "Image51.jpg first attempt, item # 51"
Time difference of 0.213742 secs
[1] "Image52.jpg first attempt, item # 52"
Time difference of 0.2536581 secs
[1] "Image53.jpg first attempt, item # 53"
Time difference of 1.253844 secs
[1] "Image54.jpg first attempt, item # 54"
Time difference of 1.149764 secs
[1] "Image55.jpg first attempt, item # 55"
Time difference of 1.171134 secs
[1] "Image56.jpg first attempt, item # 56"
Time difference of 1.397093 secs
[1] "Image57.jpg first attempt, item # 57"
Time difference of 1.201915 secs
[1] "Image58.jpg first attempt, item # 58"
Time difference of 1.455768 secs
[1] "Image59.jpg first attempt, item # 59"
Time difference of 1.618744 secs
[1] "Image60.jpg first attempt, item # 60" 
Time difference of 4.527751 mins

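Can anyone suggest why the loop does not keep taking ~0.1-0.3 seconds per iteration? All of the jpgs are roughly the same size and resolution, and all were generated from the same source.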

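I was able to solve my problem based on Mark's suggestion. In each loop iteration I was removing the image object from memory, but R never actually released the freed memory. I added a garbage collection command (gc()) inside the loop to fix this, and the loop then ran as expected.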
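The fix described above can be sketched as a small reusable loop. This is only an illustration under stated assumptions, not the code from the post: `collect_values`, `read_value`, `gc_every` and `fake_reader` are hypothetical names, and `read_value` stands in for the magick pipeline (image_read ... image_ocr ... as.numeric). The stub reader is there so the sketch runs without magick installed.

```r
# Hypothetical helper (not from the original post): apply `read_value` to
# each file and force a garbage collection every `gc_every` iterations.
collect_values <- function(files, read_value, gc_every = 1L) {
  values <- numeric(length(files))
  for (k in seq_along(files)) {
    # Any image object created inside read_value() becomes unreachable
    # as soon as the call returns.
    values[k] <- read_value(files[[k]])
    # Explicitly collect so unreachable objects are finalized now,
    # rather than whenever R next decides to run the collector.
    if (k %% gc_every == 0L) gc()
  }
  data.frame(
    jobName = basename(files),
    xRes = values,
    stringsAsFactors = FALSE
  )
}

# Stub reader (assumption for illustration): returns the length of the
# file name instead of running OCR on an image.
fake_reader <- function(path) nchar(basename(path))

res <- collect_values(c("a/Image50.jpg", "a/Image51.jpg"), fake_reader)
print(res)
```

In real use, `read_value` would wrap the image_read ... as.numeric pipeline from the question. Filling a pre-sized vector instead of calling rbind on every iteration is a separate design choice in this sketch. It is not what resolved the slowdown in the post, where adding gc() was the change that mattered.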
