Optimizing reading images from disk in Python



I would like to know which is faster, and which is better:

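from os import listdir
from os.path import isfile, join

import numpy as np

# MyRGBImg and get_pathname come from my own project (not shown here)

class AvgRGB(object):
    def __init__(self, path):
        self.path = path
        self.imgs = []
        self.avg = MyRGBImg()
    def gather_pictures(self):
        # for now gather all the files, then check for picture extensions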
        p = self.path
        self.names = [f for f in listdir(p) if isfile(join(p, f))]
        for imgname in self.names:
            path, name, ext = get_pathname(imgname)
            if ext in ['.png', '.jpg']:
                imagepath = join(self.path, imgname)
                img = MyRGBImg(imagepath )
                self.imgs.append(img)
    def average(self):
        dataset = self.imgs
        s = MyRGBImg(np.zeros(dataset[0].data.shape))
        for i, picture in enumerate(dataset):
            im = picture.data
            s += im 
        s = s / float(len(dataset))
        self.avg = MyRGBImg(s)

class AvgRGB_savememory(object):
    def __init__(self, path):
        self.path = path
        self.imgs_names = []
    def get_image(self, index):
        # read the image corresponding to the path
        pathtopic = join(self.path, self.imgs_names[index])
        myimg = MyRGBImg()
        myimg.read_from_file(pathtopic)
        return myimg
    def gather_pictures_names(self):
        p = self.path
        filenames = [f for f in listdir(p) if isfile(join(p, f))]
        for filename in filenames:
            path, name, ext = get_pathname(filename)
            if ext in ['.png', '.jpg']:       
                self.imgs_names.append(filename)
    def average(self, aligned = True, debug = False):
        sizedataset = len(self.imgs_names)
        picture = self.get_image(0)       
        s = MyRGBImg(np.zeros(picture.data.shape)) 
        for i in range(sizedataset):
            #load the picture
            picture = self.get_image(i)
            im = picture.data
            #perform operations
            s += im 
        s = s / float(sizedataset)
        self.avg = MyRGBImg(s)

These code snippets take the images from a folder and average them.

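The difference between the two snippets is this: the first loads all the images into a list up front, while the second keeps only the file names and loads each picture from disk when it is needed.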

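Now, you have to imagine that this is not the only operation I perform, and when I try to analyze a dataset of 500 pictures (1080x1080x3), the program raises a memory error.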
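To put a number on that memory error, here is a rough estimate of what holding the whole dataset in RAM would cost. It assumes the pixel data is stored either as float64 (NumPy's default, and what np.zeros produces for the accumulator) or as uint8; the dtype actually used by MyRGBImg is not shown, so both cases are guesses. The helper name dataset_bytes is made up for this sketch.

```python
import numpy as np

def dataset_bytes(n_images, shape, dtype):
    """Bytes needed to hold n_images arrays of the given shape and dtype."""
    return n_images * int(np.prod(shape)) * np.dtype(dtype).itemsize

shape = (1080, 1080, 3)

# Assumption: images kept as float64 versus as raw 8-bit channels.
as_float64 = dataset_bytes(500, shape, np.float64)
as_uint8 = dataset_bytes(500, shape, np.uint8)

print("float64: %.2f GB" % (as_float64 / 1e9))
print("uint8:   %.2f GB" % (as_uint8 / 1e9))
```

Under those assumptions the float64 case comes to roughly 14 GB and the uint8 case to about 1.75 GB, before counting any intermediate copies made by other operations.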

My question is: which one is better? Or faster?

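In theory the first one should be faster, because all the images are already loaded in memory. But what happens when the size of the array is larger than the RAM? Does it get written to disk? If so, isn't that slower than reading a single image at a time? Moreover, given that my whole program is sequential, would streaming the pictures from a buffer be more efficient?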
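For the averaging step specifically, streaming is possible because a mean only needs a running sum and a count. The following is a minimal sketch of that idea, not the code from my classes: streaming_average and fake_images are hypothetical names, and the generator yields random arrays in place of images read from disk so that the example is self-contained.

```python
import numpy as np

def streaming_average(images):
    """Average an iterable of equally shaped arrays, holding one at a time."""
    total = None
    count = 0
    for im in images:
        if total is None:
            # float64 accumulator so that 8-bit inputs cannot overflow
            total = np.zeros(im.shape, dtype=np.float64)
        total += im
        count += 1
    if count == 0:
        raise ValueError("no images to average")
    return total / count

def fake_images(n, shape=(4, 4, 3), seed=0):
    """Stand-in for reading files: yields n random uint8 'images'."""
    rng = np.random.RandomState(seed)
    for _ in range(n):
        yield rng.randint(0, 256, size=shape).astype(np.uint8)

# Streaming version: only the accumulator and one image live at a time.
avg_stream = streaming_average(fake_images(10))

# All-in-memory version, for comparison: the whole list is materialized.
avg_list = np.mean(list(fake_images(10)), axis=0)

print(np.allclose(avg_stream, avg_list))
```

With a generator that reads one file per iteration, peak memory stays at roughly two image-sized arrays regardless of how many pictures are in the folder.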

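I don't have your sample data, so I will use a dummy function and argument. You can find the actual cost of a given function call by profiling it like this: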

your_function = lambda x: enumerate(range(x, x**x))
your_arg1 = 8
import cProfile
import pstats
prof = cProfile.Profile()
group = prof.runcall(your_function, your_arg1)
p = pstats.Stats(prof)
p.sort_stats('time').print_stats(100)

This will print, for example:

   3 function calls in 0.600 seconds
   Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.600    0.600    0.600    0.600 {range}
        1    0.000    0.000    0.600    0.600 python.py:1(<lambda>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

As suggested, I did some homemade profiling.

I used this self-implemented timing function:
https://pastebin.com/myph3ndj
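
The linked code is not reproduced here. As a rough stand-in, a timing helper of this kind can be built from the standard library alone; the names stopwatch and format_hms below are hypothetical and are not taken from the pastebin.

```python
import time
from contextlib import contextmanager

@contextmanager
def stopwatch(label, results):
    """Store the wall-clock duration of the with-block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

def format_hms(seconds):
    """Format a duration in seconds as HH:MM:SS."""
    s = int(round(seconds))
    return "%02d:%02d:%02d" % (s // 3600, (s % 3600) // 60, s % 60)

results = {}
with stopwatch("demo", results):
    sum(range(100000))  # placeholder for the code being timed

print("Total elapsed time:", format_hms(results["demo"]))
```

Wrapping each strategy in its own stopwatch block gives directly comparable wall-clock figures, although a single run says little about variance.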

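These are the two classes under test, AvgFolder and AvgFolderMem. The first loads all the images into memory, while the second loads each image only when it is needed. The implementation is here: https://github.com/pella86/denoiseaverage/blob/master/src/avgfolder_class.py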

These are the summarized results: https://pastebin.com/pchzfvlv

5 pictures, 128x128 (grayscale)

--------LOAD IN MEM 5---------
Total elapsed time: 00:00:05
305 us/px
--------MEMSAVE 5---------
Total elapsed time: 00:00:06
366 us/px

20 pictures, 128x128 (grayscale)

--------LOAD IN MEM 20---------
Total elapsed time: 00:00:20
1220 us/px
--------MEMSAVE 20 ---------
Total elapsed time: 00:00:20
1220 us/px

100 pictures, 128x128 (grayscale)

--------LOAD IN MEM 100---------
Total elapsed time: 00:01:37
5920 us/px
--------MEMSAVE 100---------
Total elapsed time: 00:01:46
6469 us/px

20 pictures, 512x512 (grayscale)

--------LOAD IN MEM---------
Total elapsed time: 00:27:26
100'463 us/px
--------MEMSAVE---------
Total elapsed time: 00:27:40
101'310 us/px

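So, contrary to the textbook notion, loading files from disk on demand with numpy can be very efficient: in these runs the memory-saving version was at most about 20% slower, and for the larger tests the gap was around 1% or less. I don't know whether that is because the in-memory version runs into paging problems, or because my RAM is full of junk.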
