What are the typical application uses of reverse read, stride read, pread and pwrite?



If you are impatient, skip to the "QUESTION" heading below.

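Context

I work in Unix(-like) system administration and infrastructure development, but I think programmers are best placed to answer my question :o)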

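What I want to do is learn how to benchmark filesystems (plain, volume-managed, virtualized, encrypted, etc.) using iozone. As an exercise, I benchmarked a USB pendrive that is meant to serve as the system disk in my slug (http://www.nslu2-linux.org/), formatted in turn with vfat, ntfs, ext3, ext4 and xfs. The test produced some surprising results, which are posted below. However, the results most likely surprise me because I am still new to iozone and do not know how to interpret the numbers. Hence this post.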

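In my tests, iozone benchmarked 11 different file operations, but only for one record size (4k, matching the block size of all the tested filesystems) and one file size (512MB). Of course, using a single record size and a single file size leaves the test somewhat biased. Anyway, the file operations are listed below, each with my own brief explanation: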

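  • initial write: write new data sequentially to disk, regular file use
  • rewrite: append new data sequentially to existing data, regular file use
  • read: read data sequentially, regular file use
  • re-read: re-read data sequentially (a buffer test, or what?)
  • reverse read:
  • stride read:
  • random read: non-sequential read, typically database use
  • random write: non-sequential write, typically database use
  • pread: read data at a given position - used for indexed databases
  • pwrite: write data at a given position - used for indexed databases
  • mixed workload: (obvious)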

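Some of these operations seem straightforward. I guess initial write, rewrite and read are all used for regular file handling, which involves letting the pointer seek until it reaches a certain block, reading or writing sequentially (often across many blocks), and sometimes having to skip ahead a little because of file fragmentation. The only goal of the re-read test (I guess) is to test buffering. In parallel, random read/write are typical database operations, where the pointer has to jump from place to place within the same file to collect database records, for example when joining tables.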

So what is the QUESTION?

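So far, so good. I would greatly appreciate any corrections to the above assumptions, although they seem to be common knowledge. Now for the real question: why would you do a reverse read? What is a stride read? I have been told that the "positional" operations pread and pwrite are used for indexed databases, but why not simply keep the index in memory? Or is that what actually happens, with pread then being handy for jumping to the exact position of a record once a certain index is given? What else do you use pread/pwrite for?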

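In summary, so far I feel I can only half interpret my iozone results. I more or less know why high numbers on the random operations make a filesystem good for databases, but why would I need to read a file in reverse order, and what does a good stride read tell me? What are the typical application uses of these operations?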

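Bonus question

Having asked that, here is a bonus question. As the administrator of a given filesystem, and having gratefully learned from insightful programmers how to interpret my filesystem benchmarks ;) - does anyone have advice on how to analyze the actual usage of a filesystem? Experimenting with the filesystem record (block) size is trivial, although time-consuming. As for the size and distribution of files in a given filesystem, "find" is my friend. But what do I do to count the actual filesystem calls, such as read(), pwrite(), etc.?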

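I would also greatly appreciate any comments on how other resources affect filesystem test results, such as the role of processor power and of RAM capacity and speed. For example, what difference does it make that I test on a machine with a 1.66 GHz Atom processor and 2 GB of DDR2 RAM, when I intend to use the pendrive in a slug with a 266 MHz ARM Intel XScale processor and 32/8 MB of SDRAM/flash memory?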

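Architecture-aware documentation?

Since I do not like repeating myself too much, nor bothering others with questions, if these questions cannot be answered briefly I would greatly appreciate links to further documentation. What matters is not that it explains what the above file operations actually do (I can look that up in the API), but that the documentation is architecture-aware, that is, it explains how these operations are typically used in real life.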

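Test results

Right. I promised the results of my rather modest USB pendrive filesystem tests. My main expectation was poor write results (given the nature of flash drives, whose block size is usually larger than that of the filesystem managing them, meaning that writing a small change requires rewriting a relatively large amount of unchanged data) and good read results. The main points are: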

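  • vfat does well on all operations, except the somewhat obscure (to me, anyway) reverse and stride reads. I guess the lack of features eliminates a lot of bookkeeping.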

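  • ntfs is terrible at rewrite (append) and read operations, making it unsuitable for regular file operations. It is also terrible at pread operations, making it a poor candidate for indexed databases.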

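  • Surprisingly, ext3 and ext4, the latter being much better on all operations, perform poorly on initial write, rewrite, read, random write and pwrite, making them unsuitable for regular file use as well as for heavily updated databases. ext4, however, is a master at random read and pread, making it an excellent candidate for static databases (?). Whatever that means, both ext3 and ext4 score highly on the obscure reverse read and stride read operations.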

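  • xfs is the clear winner of the test, its only weakness seemingly being reverse read. It is the best at initial write, rewrite, read, random write and pwrite, making it an excellent candidate for regular file use and (heavily updated) databases. On re-read, random read and pread it is one of the runners-up, making it a good candidate for (somewhat static) databases. Whatever that means, it also does well on stride read!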

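Any comments on the interpretation of these results are welcome! The numbers are listed below (somewhat trimmed for length), one iozone test suite per filesystem type. All tests were run on a standard 4GB Verbatim pendrive (orange ;) docked in a Samsung N105P laptop with an N450 1.66 GHz Atom CPU and 2GB of DDR2 667 MHz RAM, running a Linux 3.2.0-24 x86 kernel with encrypted swap (yes, I know, I should install a 64-bit Linux and clear the swap!).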

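Best regards, Torsten

PS. After writing this, I discovered that xfs is apparently not supported by the Debian NSLU2 distribution. My questions still stand, though!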

---vfat---

Iozone: Performance Test of File I/O
Version $Revision: 3.397 $
Compiled for 32 bit mode.
Build: linux 
Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
Al Slater, Scott Rhine, Mike Wisner, Ken Goss
Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer.
Ben England.
Run began: Mon Jun  4 14:23:57 2012
Record Size 4 KB
File size set to 524288 KB
Command line used: iozone -l 1 -u 1 -r 4k -s 512m -F /mnt/iozone.tmp
Output is in Kbytes/sec
Time Resolution = 0.000002 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Min process = 1 
Max process = 1 
Throughput test with 1 process
Each process writes a 524288 Kbyte file in 4 Kbyte records
Children see throughput for  1 initial writers  =   12864.82 KB/sec
Parent sees throughput for  1 initial writers   =    3033.39 KB/sec
Children see throughput for  1 rewriters    =   25271.86 KB/sec
Parent sees throughput for  1 rewriters     =    2876.36 KB/sec
Children see throughput for  1 readers      =  685333.00 KB/sec
Parent sees throughput for  1 readers       =  682464.06 KB/sec
Children see throughput for 1 re-readers    =  727929.94 KB/sec
Parent sees throughput for 1 re-readers     =  726612.47 KB/sec
Children see throughput for 1 reverse readers   =  458174.00 KB/sec
Parent sees throughput for 1 reverse readers    =  456910.21 KB/sec
Children see throughput for 1 stride readers    =  351768.00 KB/sec
Parent sees throughput for 1 stride readers     =  351504.09 KB/sec
Children see throughput for 1 random readers    =  553705.94 KB/sec
Parent sees throughput for 1 random readers     =  552630.83 KB/sec
Children see throughput for 1 mixed workload    =  549812.50 KB/sec
Parent sees throughput for 1 mixed workload     =  547645.03 KB/sec
Children see throughput for 1 random writers    =   19958.66 KB/sec
Parent sees throughput for 1 random writers     =    2752.23 KB/sec
Children see throughput for 1 pwrite writers    =   13355.57 KB/sec
Parent sees throughput for 1 pwrite writers     =    3119.04 KB/sec
Children see throughput for 1 pread readers     =  574273.31 KB/sec
Parent sees throughput for 1 pread readers  =  572121.97 KB/sec

---ntfs---

Iozone: Performance Test of File I/O
Version $Revision: 3.397 $
Compiled for 32 bit mode.
Build: linux 
Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
Al Slater, Scott Rhine, Mike Wisner, Ken Goss
Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer.
Ben England.
Run began: Mon Jun  4 13:59:37 2012
Record Size 4 KB
File size set to 524288 KB
Command line used: iozone -l 1 -u 1 -r 4k -s 512m -F /mnt/iozone.tmp
Output is in Kbytes/sec
Time Resolution = 0.000002 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Min process = 1 
Max process = 1 
Throughput test with 1 process
Each process writes a 524288 Kbyte file in 4 Kbyte records
Children see throughput for  1 initial writers  =   11153.75 KB/sec
Parent sees throughput for  1 initial writers   =    2848.69 KB/sec
Children see throughput for  1 rewriters    =    8723.95 KB/sec
Parent sees throughput for  1 rewriters     =    2794.81 KB/sec
Children see throughput for  1 readers      =   24935.60 KB/sec
Parent sees throughput for  1 readers       =   24878.74 KB/sec
Children see throughput for 1 re-readers    =  144415.05 KB/sec
Parent sees throughput for 1 re-readers     =  144340.90 KB/sec
Children see throughput for 1 reverse readers   =   76627.60 KB/sec
Parent sees throughput for 1 reverse readers    =   76362.93 KB/sec
Children see throughput for 1 stride readers    =  367293.25 KB/sec
Parent sees throughput for 1 stride readers     =  366002.25 KB/sec
Children see throughput for 1 random readers    =  505843.41 KB/sec
Parent sees throughput for 1 random readers     =  500556.16 KB/sec
Children see throughput for 1 mixed workload    =  553075.56 KB/sec
Parent sees throughput for 1 mixed workload     =  551754.97 KB/sec
Children see throughput for 1 random writers    =    9747.23 KB/sec
Parent sees throughput for 1 random writers     =    2381.89 KB/sec
Children see throughput for 1 pwrite writers    =   10906.05 KB/sec
Parent sees throughput for 1 pwrite writers     =    1931.43 KB/sec
Children see throughput for 1 pread readers     =   16730.47 KB/sec
Parent sees throughput for 1 pread readers  =   16194.80 KB/sec

---ext3---

Iozone: Performance Test of File I/O
Version $Revision: 3.397 $
Compiled for 32 bit mode.
Build: linux 
Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
Al Slater, Scott Rhine, Mike Wisner, Ken Goss
Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer.
Ben England.
Run began: Sun Jun  3 16:05:27 2012
Record Size 4 KB
File size set to 524288 KB
Command line used: iozone -l 1 -u 1 -r 4k -s 512m -F /media/verbatim/1/iozone.tmp
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Min process = 1 
Max process = 1 
Throughput test with 1 process
Each process writes a 524288 Kbyte file in 4 Kbyte records
Children see throughput for  1 initial writers  =    3704.61 KB/sec
Parent sees throughput for  1 initial writers   =    3238.73 KB/sec
Children see throughput for  1 rewriters    =    3693.52 KB/sec
Parent sees throughput for  1 rewriters     =    3291.40 KB/sec
Children see throughput for  1 readers      =  103318.38 KB/sec
Parent sees throughput for  1 readers       =  103210.16 KB/sec
Children see throughput for 1 re-readers    =  908090.88 KB/sec
Parent sees throughput for 1 re-readers     =  906356.05 KB/sec
Children see throughput for 1 reverse readers   =  744801.38 KB/sec
Parent sees throughput for 1 reverse readers    =  743703.54 KB/sec
Children see throughput for 1 stride readers    =  623353.88 KB/sec
Parent sees throughput for 1 stride readers     =  622295.11 KB/sec
Children see throughput for 1 random readers    =  725649.06 KB/sec
Parent sees throughput for 1 random readers     =  723891.82 KB/sec
Children see throughput for 1 mixed workload    =  734631.44 KB/sec
Parent sees throughput for 1 mixed workload     =  733283.36 KB/sec
Children see throughput for 1 random writers    =     177.59 KB/sec
Parent sees throughput for 1 random writers     =     137.83 KB/sec
Children see throughput for 1 pwrite writers    =    2319.47 KB/sec
Parent sees throughput for 1 pwrite writers     =    2200.95 KB/sec
Children see throughput for 1 pread readers     =   13614.82 KB/sec
Parent sees throughput for 1 pread readers  =   13614.45 KB/sec

---ext4---

Iozone: Performance Test of File I/O
Version $Revision: 3.397 $
Compiled for 32 bit mode.
Build: linux 
Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
Al Slater, Scott Rhine, Mike Wisner, Ken Goss
Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer.
Ben England.
Run began: Sun Jun  3 17:59:26 2012
Record Size 4 KB
File size set to 524288 KB
Command line used: iozone -l 1 -u 1 -r 4k -s 512m -F /media/verbatim/2/iozone.tmp
Output is in Kbytes/sec
Time Resolution = 0.000005 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Min process = 1 
Max process = 1 
Throughput test with 1 process
Each process writes a 524288 Kbyte file in 4 Kbyte records
Children see throughput for  1 initial writers  =    4086.64 KB/sec
Parent sees throughput for  1 initial writers   =    3533.34 KB/sec
Children see throughput for  1 rewriters    =    4039.37 KB/sec
Parent sees throughput for  1 rewriters     =    3409.48 KB/sec
Children see throughput for  1 readers      = 1073806.38 KB/sec
Parent sees throughput for  1 readers       = 1062541.84 KB/sec
Children see throughput for 1 re-readers    =  991162.00 KB/sec
Parent sees throughput for 1 re-readers     =  988426.34 KB/sec
Children see throughput for 1 reverse readers   =  811973.62 KB/sec
Parent sees throughput for 1 reverse readers    =  810333.28 KB/sec
Children see throughput for 1 stride readers    =  779127.19 KB/sec
Parent sees throughput for 1 stride readers     =  777359.89 KB/sec
Children see throughput for 1 random readers    =  796860.56 KB/sec
Parent sees throughput for 1 random readers     =  795138.41 KB/sec
Children see throughput for 1 mixed workload    =  741489.56 KB/sec
Parent sees throughput for 1 mixed workload     =  739544.09 KB/sec
Children see throughput for 1 random writers    =     499.05 KB/sec
Parent sees throughput for 1 random writers     =     399.82 KB/sec
Children see throughput for 1 pwrite writers    =    4092.66 KB/sec
Parent sees throughput for 1 pwrite writers     =    3451.62 KB/sec
Children see throughput for 1 pread readers     =  840101.38 KB/sec
Parent sees throughput for 1 pread readers  =  831083.31 KB/sec

---xfs---

Iozone: Performance Test of File I/O
Version $Revision: 3.397 $
Compiled for 32 bit mode.
Build: linux 
Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
Al Slater, Scott Rhine, Mike Wisner, Ken Goss
Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer.
Ben England.
Run began: Mon Jun  4 14:47:49 2012
Record Size 4 KB
File size set to 524288 KB
Command line used: iozone -l 1 -u 1 -r 4k -s 512m -F /mnt/iozone.tmp
Output is in Kbytes/sec
Time Resolution = 0.000005 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Min process = 1 
Max process = 1 
Throughput test with 1 process
Each process writes a 524288 Kbyte file in 4 Kbyte records
Children see throughput for  1 initial writers  =   21854.47 KB/sec
Parent sees throughput for  1 initial writers   =    3836.32 KB/sec
Children see throughput for  1 rewriters    =   29420.40 KB/sec
Parent sees throughput for  1 rewriters     =    3955.65 KB/sec
Children see throughput for  1 readers      =  624136.75 KB/sec
Parent sees throughput for  1 readers       =  614326.13 KB/sec
Children see throughput for 1 re-readers    =  577542.62 KB/sec
Parent sees throughput for 1 re-readers     =  576533.42 KB/sec
Children see throughput for 1 reverse readers   =  483368.06 KB/sec
Parent sees throughput for 1 reverse readers    =  482598.67 KB/sec
Children see throughput for 1 stride readers    =  537227.12 KB/sec
Parent sees throughput for 1 stride readers     =  536313.77 KB/sec
Children see throughput for 1 random readers    =  525219.19 KB/sec
Parent sees throughput for 1 random readers     =  524062.07 KB/sec
Children see throughput for 1 mixed workload    =  561513.50 KB/sec
Parent sees throughput for 1 mixed workload     =  560142.18 KB/sec
Children see throughput for 1 random writers    =   24118.34 KB/sec
Parent sees throughput for 1 random writers     =    3117.71 KB/sec
Children see throughput for 1 pwrite writers    =   32512.07 KB/sec
Parent sees throughput for 1 pwrite writers     =    3825.54 KB/sec
Children see throughput for 1 pread readers     =  525244.94 KB/sec
Parent sees throughput for 1 pread readers  =  523331.93 KB/sec

The only time I have needed to dig into filesystem performance was on Windows systems. The general principles apply whatever OS/filesystem you use...

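Why would you do a reverse read

While a program is running, it reads block 987654 and then uses that data to determine that it needs block 123456. This can happen on a join: your Db may be using an index on table 1 to pick records from table 2 (using an index). The picking may proceed in the order of table 1, which is the reverse of the order of table 2.

Something similar can happen with a single-table select when two keys are used.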

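What is a stride read

Reading every Nth block; for example, reading block 12345600, then block 12345700, then block 12345800 is a stride of 100. Imagine a table with many columns and/or large columns. A row in that table may need several filesystem blocks to hold its data. Typically, the database organizes this data as one record per row, each record occupying several sequential filesystem blocks. If your database row occupies 10 filesystem blocks and you are selecting on two columns, you may only need to read the first and sixth blocks of each 10-block record. Your query then needs to read blocks 10001, 10006, 10011, 10016, 10021, 10026, which is a stride of 5.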
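The access pattern described above can be sketched in a few lines of Python. This is only an illustration, not how iozone or any particular database implements it: the 4096-byte block size is taken from the question's 4k record size, the helper names `stride_offsets` and `stride_read` are made up for this example, and the scratch file stands in for a real table file.

```python
import os
import tempfile

BLOCK = 4096  # assumed block size, matching the 4k records in the question


def stride_offsets(first_block, stride, count, block_size=BLOCK):
    """Byte offsets of `count` blocks, starting at `first_block`, `stride` blocks apart."""
    return [(first_block + i * stride) * block_size for i in range(count)]


def stride_read(fd, first_block, stride, count, block_size=BLOCK):
    """Read every `stride`-th block with pread and return the blocks read."""
    return [os.pread(fd, block_size, off)
            for off in stride_offsets(first_block, stride, count, block_size)]


def demo():
    # Build a scratch file of 32 blocks; block i is filled with the byte value i.
    with tempfile.TemporaryFile() as f:
        for i in range(32):
            f.write(bytes([i]) * BLOCK)
        f.flush()
        blocks = stride_read(f.fileno(), first_block=1, stride=5, count=6)
    # The first byte of each block identifies which block was read.
    return [b[0] for b in blocks]


if __name__ == "__main__":
    print(demo())
```

With `first_block=1` and `stride=5` the demo touches blocks 1, 6, 11, 16, 21 and 26, the same shape as the 10001, 10006, ... sequence above, just on a much smaller file.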

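I have been told that the "positional" operations pread and pwrite are used for indexed databases, but why not simply keep the index in memory

The index may be larger than a reasonable amount of RAM. Or your earlier usage pulled other indexes or data into RAM, causing the unused index to be evicted from the filesystem/db cache.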

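Or is that what actually happens, with pread then being handy for jumping to the exact position of a record once a certain index is given

Yes, that is probably what your database is doing.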
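A minimal sketch of that idea, under stated assumptions: the index is a plain in-memory dict mapping a key to a byte offset, and the 32-byte record layout (`RECORD`) and the helpers `build_table` and `lookup` are hypothetical names invented for this example. Real databases use far more elaborate structures; the point is only that one `os.pread` call fetches the record without a separate seek.

```python
import os
import struct
import tempfile

# Hypothetical fixed-size record: 4-byte little-endian key + 28-byte payload.
RECORD = struct.Struct("<I28s")


def build_table(f, rows):
    """Append rows to the file and return an in-memory index {key: byte offset}."""
    index = {}
    for key, payload in rows:
        index[key] = f.tell()
        f.write(RECORD.pack(key, payload))
    f.flush()
    return index


def lookup(fd, index, key):
    """Fetch one record with a single pread at the indexed offset."""
    raw = os.pread(fd, RECORD.size, index[key])
    rec_key, payload = RECORD.unpack(raw)
    return rec_key, payload.rstrip(b"\0")


def demo():
    rows = [(7, b"seven"), (42, b"forty-two"), (3, b"three")]
    with tempfile.TemporaryFile() as f:
        index = build_table(f, rows)
        return lookup(f.fileno(), index, 42)


if __name__ == "__main__":
    print(demo())
```

Because `pread` takes the offset as an argument, several threads could share the same file descriptor for lookups without racing on a shared file position.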

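What else do you use pread/pwrite for

Some data files have predefined "interesting" positions. These might be the root of a B-Tree index, a table header, the log/journal tail, or something else, depending on the Db implementation. pread/pwrite test the performance of repeatedly jumping to a specific set of positions, rather than a uniformly random mix of positions.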
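As a rough illustration of a fixed "interesting" position, the sketch below keeps a small header at offset 0 and updates it with `os.pwrite` while ordinary sequential writes continue at the end of the file. The 16-byte header and its `count=1` content are assumptions made up for this example, not taken from any real database format.

```python
import os
import tempfile

HEADER_SIZE = 16  # hypothetical fixed-size header stored at offset 0


def demo():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * HEADER_SIZE)   # reserve space for the header
        os.write(fd, b"record-1;")          # sequential write after the header
        before = os.lseek(fd, 0, os.SEEK_CUR)

        # Update the header in place; pwrite leaves the file offset untouched.
        os.pwrite(fd, b"count=1".ljust(HEADER_SIZE, b"\0"), 0)
        after = os.lseek(fd, 0, os.SEEK_CUR)

        os.write(fd, b"record-2;")          # continues where the last write ended
        header = os.pread(fd, HEADER_SIZE, 0).rstrip(b"\0")
        body = os.pread(fd, 64, HEADER_SIZE)
        return before, after, header, body
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

The file offset is the same before and after the header update, which is the practical difference from an `lseek` followed by `write`.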

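Links

All mainstream operating systems have system utilities that can capture every OS filesystem operation. I think on *NIX systems these may be named DTRACE, PTAP or PTRACE. You can use the mountains of data from these monitors (smartly filtered) to see the disk access patterns in your system.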
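For rough totals rather than a full trace, one option on Linux (an addition here, not something the answer above mentions) is the per-process counter file `/proc/<pid>/io`, where `syscr` and `syscw` count read-type and write-type system calls. The sketch below assumes a Linux kernel with that file available; the function names are hypothetical, and reading another user's process usually needs extra privileges. It reports totals only and cannot tell read() from pread().

```python
def parse_proc_io(text):
    """Parse the 'name: value' lines of a /proc/<pid>/io file into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            counters[name.strip()] = int(value)
    return counters


def read_proc_io(pid="self"):
    """Read the counters for one process (Linux only)."""
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())


def syscall_delta(before, after):
    """Read-type and write-type system calls issued between two snapshots."""
    return {
        "reads": after["syscr"] - before["syscr"],
        "writes": after["syscw"] - before["syscw"],
    }


if __name__ == "__main__":
    first = read_proc_io()
    with open(__file__, "rb") as f:   # do a little I/O between the snapshots
        f.read()
    print(syscall_delta(first, read_proc_io()))
```

Taking one snapshot before and one after a workload gives a per-workload count, which can then be compared against the operation mix that a benchmark exercises.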

The general rule of thumb, then, is that an obscene amount of RAM helps for Db usage. All indexes then stay resident in RAM all the time.

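Sorry: I cannot add information about the specific system calls you ask about. So I will add some opinionated content instead...

In my opinion, iozone is not a very interesting benchmarking tool. I also do not think that analyzing the various system calls is all that interesting.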

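What matters is how the filesystem performs in the real world. However, benchmarking with real-world scenarios can be very time-consuming; for example, creating a valid test environment can take a long time. This is why benchmarking tools do come in handy. But a benchmarking tool should be able to work in a way that is as close as possible to the real application; furthermore, it is often good if the benchmarking tool works in a brutal way, so that the limits of the hardware/software in question can be explored.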

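Two benchmarking tools that meet these requirements are fio and Oracle's Orion. With both tools, it is relatively easy to specify a benchmark that uses a reasonable read/write mix, and to specify how the benchmark should run in parallel. Both device-level and FS-level benchmarks can be performed; this is nice, because sometimes you want to benchmark a storage device without the overhead of a particular filesystem. Compared to Orion, fio has the advantage of an active mailing list where the chances of getting a good answer are very high (I have not found a mailing list for Orion).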

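I can give some background on two parts of your question. The "reverse read" test was introduced after observing the I/O behavior of some mechanical engineering applications. These applications would typically read forward sequentially from disk, and then read backward. It was speculated that this was related to (linear algebra) forward and backward substitution, or to original implementations that relied on tape drives.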
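The forward-then-backward pattern can be sketched as follows. This is a toy illustration only, not iozone's implementation or that of any engineering code: the 4096-byte block size is borrowed from the question's record size, and the helper names are invented for the example.

```python
import os
import tempfile

BLOCK = 4096  # assumed record size, as in the question's iozone run


def forward_offsets(file_size, block_size=BLOCK):
    """Offsets of every block, first block first."""
    return list(range(0, file_size, block_size))


def reverse_offsets(file_size, block_size=BLOCK):
    """Offsets of the same blocks, last block first."""
    return forward_offsets(file_size, block_size)[::-1]


def read_blocks(fd, offsets, block_size=BLOCK):
    """Read one block at each offset, in the order given."""
    return [os.pread(fd, block_size, off) for off in offsets]


def demo():
    # Scratch file of 4 blocks; block i is filled with the byte value i.
    with tempfile.TemporaryFile() as f:
        for i in range(4):
            f.write(bytes([i]) * BLOCK)
        f.flush()
        size = os.fstat(f.fileno()).st_size
        fwd = read_blocks(f.fileno(), forward_offsets(size))
        rev = read_blocks(f.fileno(), reverse_offsets(size))
    return [b[0] for b in fwd], [b[0] for b in rev]


if __name__ == "__main__":
    print(demo())
```

Each individual read still moves forward within its block; only the order of the blocks is reversed, which is why read-ahead heuristics tuned for ascending access may help less on the backward pass.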

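As for strided access, this is a common I/O pattern in many seismic exploration applications (depth and/or time migration, IIRC). As with the "reverse read" scenario, it was introduced after observing the I/O behavior of those applications.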
