Different behaviour caused by different "looping styles"

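I have a simple problem: navigate to a certain line, then delete everything after it. I use the appropriate file.truncate() call. However, the following two pieces of code behave differently.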

# (1) explicit readline() loop
with open(file, "a+b", 1) as f:
  # Navigate to the MARKER
  while True:
    line = f.readline()
    if MARKER in line:
      f.truncate()
      f.write(stuff)
      break

# (2) "for line in file" loop
with open(file, "a+b", 1) as f:
  # Navigate to the MARKER
  for line in f:
    if MARKER in line:
      f.truncate()
      f.write(stuff)
      break

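(1) behaves as expected. In case (2), however, the file ends up truncated several lines after the MARKER. I suspect some buffering is going on, but as you can see, I explicitly set the buffering behaviour to "line buffered" in the open() call.

Any thoughts? I would like to use the more intuitive "for line in file" syntax...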

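The clue seems to be in Python's C source code: Python 2.7 appears to use an 8 KB read-ahead buffer for "for line in file:".

From the Python documentation, 5. Built-in Types / 5.9. File Objects:

In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer.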
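One way to keep a for-style loop without going through next() is the two-argument form of iter(), which keeps calling readline() until it returns an empty string. This is only a minimal sketch, written in Python 3 syntax; the helper name truncate_after_marker, the MARKER value and the choice of "r+b" instead of "a+b" are assumptions made for illustration, not part of the question's code.

```python
MARKER = b"MARKER"  # hypothetical marker value, chosen for illustration


def truncate_after_marker(path, stuff, marker=MARKER):
    """Keep everything up to and including the marker line, then write stuff.

    Returns True if the marker was found, False otherwise.
    """
    with open(path, "r+b") as f:
        # iter(callable, sentinel) calls f.readline() until it returns b"",
        # so the loop does not go through the file iterator's next() method.
        for line in iter(f.readline, b""):
            if marker in line:
                f.seek(f.tell())  # explicit seek before switching from reading to writing
                f.truncate()      # cut the file at the current position
                f.write(stuff)
                return True
    return False
```

The loop body stays the same as in the "for line in f" version; only the object being iterated changes.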

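BTW: using the names of built-ins (e.g. file) as variable names is generally discouraged.

In general, a "for x in y" statement expects y not to change during the loop. You broke that contract.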

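This is because of the 'a' mode:

a

Open for appending (writing at end of file). The file is created if it does not exist. The stream is positioned at the end of the file.

a+

Open for reading and appending (writing at end of file). The file is created if it does not exist. The initial file position for reading is at the beginning of the file, but output is always appended to the end of the file.
http://linux.die.net/man/3/fopen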
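The append rule quoted above can be observed directly. The following is a small sketch in Python 3 syntax, assuming a POSIX-like platform where append mode is honoured by the operating system; the helper name append_after_seek is hypothetical.

```python
def append_after_seek(path, data):
    """Seek to the start of a file opened in "a+b" mode, read a line, then write.

    Returns the first line read and the full file content afterwards.
    """
    with open(path, "a+b") as f:
        f.seek(0)              # reads now start at the beginning of the file
        first = f.readline()
        f.seek(0)              # seek again before switching to writing
        f.write(data)          # in append mode the data still goes to the end
    with open(path, "rb") as f:
        return first, f.read()
```

Whatever position the seek() calls select, the written bytes end up after the existing content.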

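EDIT

My answer above is wrong.

I already knew that looping over the lines of a file uses a read-ahead buffer, but I believed that truncate() would cause the file's pointer to move to the end of the file. As far as I knew, truncating a file consisted of writing a byte sequence called EOF (end of file), and 'a' mode always forces a write to happen at the end of the file, whatever the position of the file's pointer just before the write.

That is not the case, and I should have verified it by running code. So my answer deserved to be downvoted.

But downvoting without any explanation is mean and frustrating when, as here, the error in the answer is not obvious.

The following code shows that the file's pointer is not moved to the end of the file before the truncate() operation.

To clarify, the file 'fileA.txt' is made of lines that are each 100 characters long ('\r\n' included), each ending as shown here ('\r\n' is not visible):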

....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000100
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000200
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000300
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000400
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000500
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000600
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000700
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000800
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000900
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00001000
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00001100
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00001200
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00001300
............................

Code:

print '\n===================== 1 ==================\n'
from os.path import getsize
# length of ecr is 90 :
ecr = (10*'a' + 10*'b' + 10*'c' + 10*'d' + 10*'e' +
       10*'f' + 10*'g' + 10*'h' + 10*'i')

# creation of a file whose length exceeds the reading buffer's size
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 turns of iteration
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # Length of each written line is 100 :
        # 90 (ecr) + 8 (str(i).zfill(8)) + 2 ('\r\n')
        # File's length will be 53000

print 'size of fileA before truncating : ',getsize('fileA.txt')
# truncating file at uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00000800' in line:
            print repr(line[78:]),'  g.tell()==',g.tell()
            # at this point, 800 characters should have been read
            # in the file if there wasn't a reading buffer
            g.truncate()
print 'size of fileA after truncating : ',getsize('fileA.txt')

Result

===================== 1 ==================
size of fileA before truncating :  53000
'hhiiiiiiiiii00000800\r\n'   g.tell()== 8192
size of fileA after truncating :  8192

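So AKX and Fenisko were right to invoke the buffer (though they did not test this hypothesis any more than I did), because opening the file in 'a' mode has no influence on what truncate() does. I think that is the meaning of the capitalized sentence in the following excerpt:

file.truncate([size]) Truncate the file's size. If the optional size argument is present, the file is truncated to (at most) that size. The size defaults to the current position. THE CURRENT FILE POSITION IS NOT CHANGED.
http://docs.python.org/library/stdtypes.html#file.truncate

Until now, I had not understood this sentence.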
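Since truncate() accepts an explicit size, one way to keep the "for line in file" loop is to count the bytes consumed and pass that count to truncate(), so the result does not rely on f.tell(). A minimal sketch in Python 3 syntax; the helper name truncate_after_line and the "r+b" mode are assumptions made for illustration.

```python
def truncate_after_line(path, needle):
    """Truncate the file right after the first line containing needle.

    Returns the new size, or None if no line contains needle.
    """
    offset = 0
    with open(path, "r+b") as f:
        for line in f:
            offset += len(line)       # bytes consumed by the loop so far
            if needle in line:
                f.truncate(offset)    # explicit size, independent of f.tell()
                return offset
    return None
```

Applied to a file built like fileA.txt with the needle '00000800', the expected size after truncation is 800 bytes rather than 8192.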

As AKX pointed out, the size of the buffer is 8192 .... for the first read.

But for the following reads, the buffer is evidently 10240 characters long:

print '\n=================== 2 ====================\n'
from os.path import getsize
# length of ecr is 90 :
ecr = (10*'a' + 10*'b' + 10*'c' + 10*'d' + 10*'e' +
       10*'f' + 10*'g' + 10*'h' + 10*'i')

# creation of a file whose length exceeds the reading buffer's size
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 turns of iteration
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # length of each written line is 100
        # file's length will be 53000

print 'size of fileA before truncating : ',getsize('fileA.txt')
# truncating file at uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00008100' in line:
            print repr(line[78:]),'  g.tell()==',g.tell()
            # at this point, 8100 characters should have been read
            # in the file if there wasn't a reading buffer
            g.truncate()
print 'size of fileA after truncating : ',getsize('fileA.txt')
# -----------
print
# creation of a file whose length exceeds the reading buffer's size
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 turns of iteration
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # length of each written line is 100
        # file's length will be 53000

print 'size of fileA before truncating : ',getsize('fileA.txt')
# truncating file at uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00008200' in line:
            print repr(line[78:]),'  g.tell()==',g.tell()
            # at this point, 8200 characters should have been read
            # in the file if there wasn't a reading buffer
            g.truncate()
print 'size of fileA after truncating : ',getsize('fileA.txt')
# -----------
print
# creation of a file whose length exceeds the reading buffer's size
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 turns of iteration
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # length of each written line is 100
        # file's length will be 53000

print 'size of fileA before truncating : ',getsize('fileA.txt')
# truncating file at uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00018400' in line:
            print repr(line[78:]),'  g.tell()==',g.tell()
            # at this point, 18400 characters should have been read
            # in the file if there wasn't a reading buffer
            g.truncate()
print 'size of fileA after truncating : ',getsize('fileA.txt')
# -----------
print
# creation of a file whose length exceeds the reading buffer's size
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 turns of iteration
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # length of each written line is 100
        # file's length will be 53000

print 'size of fileA before truncating : ',getsize('fileA.txt')
# truncating file at uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00018500' in line:
            print repr(line[78:]),'  g.tell()==',g.tell()
            # at this point, 18500 characters should have been read
            # in the file if there wasn't a reading buffer
            g.truncate()
print 'size of fileA after truncating : ',getsize('fileA.txt')

Result

=================== 2 ====================
size of fileA before truncating :  53000
'hhiiiiiiiiii00008100\r\n'   g.tell()== 8192
size of fileA after truncating :  8192
size of fileA before truncating :  53000
'hhiiiiiiiiii00008200\r\n'   g.tell()== 18432
size of fileA after truncating :  18432
size of fileA before truncating :  53000
'hhiiiiiiiiii00018400\r\n'   g.tell()== 18432
size of fileA after truncating :  18432
size of fileA before truncating :  53000
'hhiiiiiiiiii00018500\r\n'   g.tell()== 28672
size of fileA after truncating :  28672
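The positions reported in these runs are consistent with a first read of 8192 bytes followed by reads of 10240 bytes each (18432 - 8192 = 10240, 28672 - 18432 = 10240). The sketch below only reproduces that arithmetic; the two sizes are the values observed here, not documented constants, and the helper name observed_tell is hypothetical.

```python
FIRST_READ = 8192    # g.tell() observed after the first buffer fill
NEXT_READ = 10240    # observed growth for each later fill (18432 - 8192)


def observed_tell(line_end, file_size=53000):
    """Model g.tell() once the loop has consumed the line ending at byte line_end."""
    pos = FIRST_READ
    while pos < line_end:
        pos += NEXT_READ          # another buffer fill is needed to finish the line
    return min(pos, file_size)    # the position cannot go past the end of the file


for line_end in (8100, 8200, 18400, 18500):
    print(line_end, observed_tell(line_end))
```

For the four marker lines used above, this model gives 8192, 18432, 18432 and 28672, matching the sizes the file was truncated to.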

By the way, truncate() does not close the file:

print '\n=================== 3 ====================\n'
from os.path import getsize
# length of ecr is 90 :
ecr = (10*'a' + 10*'b' + 10*'c' + 10*'d' + 10*'e' +
       10*'f' + 10*'g' + 10*'h' + 10*'i')

# creation of a file whose length exceeds the reading buffer's size
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 turns of iteration
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # length of each written line is 100
        # file's length will be 53000

print 'size of fileA before truncating : ',getsize('fileA.txt')
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00000200' in line:
            print repr(line[78:]),'  g.tell()==',g.tell()
            # at this point, 200 characters should have been read
            # if there wasn't a buffer
            g.truncate()
    # the file is still open and readable after truncate()
    g.seek(6000,0)
    k = 0
    for li in g:
        k+=1
        print 'k==',k,'   ',repr(li[-32:])
        if k==7:
            break
print 'size of fileA after truncating : ',getsize('fileA.txt')

Result

=================== 3 ====================
size of fileA before truncating :  53000
'hhiiiiiiiiii00000200\r\n'   g.tell()== 8192
k== 1     'gghhhhhhhhhhiiiiiiiiii00006100\r\n'
k== 2     'gghhhhhhhhhhiiiiiiiiii00006200\r\n'
k== 3     'gghhhhhhhhhhiiiiiiiiii00006300\r\n'
k== 4     'gghhhhhhhhhhiiiiiiiiii00006400\r\n'
k== 5     'gghhhhhhhhhhiiiiiiiiii00006500\r\n'
k== 6     'gghhhhhhhhhhiiiiiiiiii00006600\r\n'
k== 7     'gghhhhhhhhhhiiiiiiiiii00006700\r\n'
size of fileA after truncating :  8192

But if a write instruction is placed right after truncate(), the behaviour of the program becomes incoherent. Try it.
