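I have a simple problem: navigate to a certain line in a file, then delete everything after it. I use the appropriate file.truncate() call for this. However, the two snippets below behave differently.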
1)
with open(file, "a+b", 1) as f:
    # Navigate to the MARKER
    while True:
        line = f.readline()
        if MARKER in line:
            f.truncate()
            f.write(stuff)
            break
2)
with open(file, "a+b", 1) as f:
    # Navigate to the MARKER
    for line in f:
        if MARKER in line:
            f.truncate()
            f.write(stuff)
            break
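(1) behaves as expected. With (2), however, the file ends up truncated several lines after the MARKER. I suspect some buffering is going on, but as you can see, I explicitly set the buffering to "line-buffered" in the open() call.
Any ideas? I would like to use the more intuitive "for line in file" syntax…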
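The clue seems to be in Python's C source code: Python 2.7 appears to use an 8 KB read-ahead buffer for "for line in file:".
From the Python documentation, 5. Built-in Types / 5.9. File Objects:
In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer.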
BTW: using built-in names (e.g. file) as variable names is generally discouraged.
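If you want to keep the for loop despite the read-ahead buffer, one option is to stop relying on the implicit file position and compute the cut point yourself from the lengths of the lines you have actually seen. The following is only a sketch under assumptions: the helper name truncate_after_marker is made up for illustration, and it opens the file in "r+b" rather than the "a+b" used in the question, so that the write lands at the position we seek to.

```python
import os
import tempfile


def truncate_after_marker(path, marker, stuff):
    """Cut the file right after the first line containing `marker`,
    then write `stuff` at that point.

    Hypothetical helper: returns the byte offset of the cut,
    or None if the marker was never found.
    """
    offset = 0
    with open(path, "r+b") as f:
        for line in f:
            # Count the bytes of the lines we have really consumed,
            # instead of trusting f.tell() during iteration.
            offset += len(line)
            if marker in line:
                f.seek(offset)      # reposition explicitly before writing
                f.truncate(offset)  # explicit size, independent of buffering
                f.write(stuff)
                return offset
    return None


# Small demonstration on a temporary file.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, "wb") as f:
    f.write(b"one\nMARKER\ntwo\nthree\n")

cut_at = truncate_after_marker(demo_path, b"MARKER", b"new tail\n")

with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)
```

Passing the size to truncate() explicitly means the result does not depend on where the buffered reader happens to have left the underlying file pointer.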
In general, statements of the form "for x in y" expect y not to change during the loop. You broke that contract.
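This is because of the 'a' mode:
a
Open for appending (writing at end of file). The file is created if it does not exist. The stream is positioned at the end of the file.
a+
Open for reading and appending (writing at end of file). The file is created if it does not exist. The initial file position for reading is at the beginning of the file, but output is always appended to the end of the file.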
http://linux.die.net/man/3/fopen
.
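The append semantics quoted from the man page can be checked with a short sketch. It only shows where writes land in "a+b" mode; it says nothing about what truncate() does. The file contents and variable names are made up for the demonstration.

```python
import os
import tempfile

# Create a small temporary file with known contents.
fd, append_path = tempfile.mkstemp()
os.close(fd)
with open(append_path, "wb") as f:
    f.write(b"0123456789")

# Open for reading and appending, move to the start, then write.
with open(append_path, "a+b") as f:
    f.seek(0)
    f.write(b"XY")  # in append mode the output goes to the end of the file

with open(append_path, "rb") as f:
    appended = f.read()
os.remove(append_path)
```

Even after seeking to the beginning, the two bytes end up after the existing data, which matches "output is always appended to the end of the file".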
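EDIT
My answer above is wrong.
I already knew that looping over the lines of a file uses a read-ahead buffer, but I believed that truncate() moved the file pointer to the end of the file. My reasoning was that, as far as I understood, truncating a file consists of writing an end-of-file (EOF) byte sequence, and that 'a' mode always forces writes to the end of the file, wherever the file pointer was just before the write.
That is not the case, and I should have verified it by running code. So my answer deserves downvotes.
But downvoting without any explanation is unkind and frustrating when, as here, the error in the answer is not obvious.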
.
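The following code shows that the file pointer is not moved to the end of the file before the truncate() operation.
To be clear, the file 'fileA' consists of lines that are each 100 characters long (including '\r\n'), each ending like this ('\r\n' is not visible here):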
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000100
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000200
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000300
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000400
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000500
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000600
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000700
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000800
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000900
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00001000
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00001100
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00001200
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00001300
............................
Code:
print '\n===================== 1 ==================\n'

from os.path import getsize

# length of ecr is 90:
ecr = 10*'a' + 10*'b' + 10*'c' + 10*'d' + 10*'e' +\
      10*'f' + 10*'g' + 10*'h' + 10*'i'

# create a file whose length exceeds the size of the read-ahead buffer
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 iterations
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # Each written line is 100 characters long:
        # 90 (ecr) + 8 (str(i).zfill(8)) + 2 ('\r\n')
        # The file's length will be 53000

print 'size of fileA before truncating : ',getsize('fileA.txt')

# truncate the file at an uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00000800' in line:
            print repr(line[78:]),'  g.tell()==',g.tell()
            # at this point, only 800 characters would have been read
            # from the file if there were no read-ahead buffer
            g.truncate()

print 'size of fileA after truncating  : ',getsize('fileA.txt')
Result:
===================== 1 ==================
size of fileA before truncating :  53000
'hhiiiiiiiiii00000800\r\n'   g.tell()== 8192
size of fileA after truncating  :  8192
.
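So AKX and Fenisko were right to point to the buffer (though they tested that hypothesis no more than I did), because opening the file in 'a' mode has no influence on what truncate() does. I think this is what the capitalized sentence in the following excerpt means:
file.truncate([size])
Truncate the file's size. If the optional size argument is present, the file is truncated to (at most) that size. The size defaults to the current position. THE CURRENT FILE POSITION IS NOT CHANGED.
http://docs.python.org/library/stdtypes.html#file.truncate
Until now, I had not understood that sentence.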
.
As AKX pointed out, the buffer size is 8192... for the first read.
But for the following reads, the buffer is apparently 10240 characters long:
print '\n=================== 2 ====================\n'

from os.path import getsize

# length of ecr is 90:
ecr = 10*'a' + 10*'b' + 10*'c' + 10*'d' + 10*'e' +\
      10*'f' + 10*'g' + 10*'h' + 10*'i'

# create a file whose length exceeds the size of the read-ahead buffer
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 iterations
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # each written line is 100 characters long
        # the file's length will be 53000

print 'size of fileA before truncating : ',getsize('fileA.txt')

# truncate the file at an uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00008100' in line:
            print repr(line[78:]),'  g.tell()==',g.tell()
            # at this point, only 8100 characters would have been read
            # from the file if there were no read-ahead buffer
            g.truncate()

print 'size of fileA after truncating  : ',getsize('fileA.txt')

# -----------
print

# create a file whose length exceeds the size of the read-ahead buffer
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 iterations
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # each written line is 100 characters long
        # the file's length will be 53000

print 'size of fileA before truncating : ',getsize('fileA.txt')

# truncate the file at an uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00008200' in line:
            print repr(line[78:]),'  g.tell()==',g.tell()
            # at this point, only 8200 characters would have been read
            # from the file if there were no read-ahead buffer
            g.truncate()

print 'size of fileA after truncating  : ',getsize('fileA.txt')

# -----------
print

# create a file whose length exceeds the size of the read-ahead buffer
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 iterations
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # each written line is 100 characters long
        # the file's length will be 53000

print 'size of fileA before truncating : ',getsize('fileA.txt')

# truncate the file at an uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00018400' in line:
            print repr(line[78:]),'  g.tell()==',g.tell()
            # at this point, only 18400 characters would have been read
            # from the file if there were no read-ahead buffer
            g.truncate()

print 'size of fileA after truncating  : ',getsize('fileA.txt')

# -----------
print

# create a file whose length exceeds the size of the read-ahead buffer
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 iterations
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # each written line is 100 characters long
        # the file's length will be 53000

print 'size of fileA before truncating : ',getsize('fileA.txt')

# truncate the file at an uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00018500' in line:
            print repr(line[78:]),'  g.tell()==',g.tell()
            # at this point, only 18500 characters would have been read
            # from the file if there were no read-ahead buffer
            g.truncate()

print 'size of fileA after truncating  : ',getsize('fileA.txt')
Result:
=================== 2 ====================
size of fileA before truncating :  53000
'hhiiiiiiiiii00008100\r\n'   g.tell()== 8192
size of fileA after truncating  :  8192

size of fileA before truncating :  53000
'hhiiiiiiiiii00008200\r\n'   g.tell()== 18432
size of fileA after truncating  :  18432

size of fileA before truncating :  53000
'hhiiiiiiiiii00018400\r\n'   g.tell()== 18432
size of fileA after truncating  :  18432

size of fileA before truncating :  53000
'hhiiiiiiiiii00018500\r\n'   g.tell()== 28672
size of fileA after truncating  :  28672
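The numbers above fit a simple arithmetic model: a first chunk of 8192 bytes, then chunks of 10240 bytes, with g.tell() reporting the end of the last chunk read. The sketch below is only that model, built from the observed results. It is not taken from the CPython source, and the function name predicted_tell is made up for illustration.

```python
def predicted_tell(line_end, first_chunk=8192, next_chunk=10240):
    """Return the first chunk boundary at or beyond `line_end`.

    `line_end` is the byte offset where the matching line ends.
    The chunk sizes are assumptions inferred from the observed output.
    """
    boundary = first_chunk
    while boundary < line_end:
        # The line is not fully contained yet: another chunk is read.
        boundary += next_chunk
    return boundary


# Matching lines end at these offsets (100 bytes per line).
for line_end in (8100, 8200, 18400, 18500):
    print(line_end, predicted_tell(line_end))
```

Under these assumptions the model gives 8192, 18432, 18432 and 28672 for the four markers, the same values as the g.tell() results and the truncated file sizes.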
.
By the way, truncate() does not close the file:
print '\n=================== 3 ====================\n'

from os.path import getsize

# length of ecr is 90:
ecr = 10*'a' + 10*'b' + 10*'c' + 10*'d' + 10*'e' +\
      10*'f' + 10*'g' + 10*'h' + 10*'i'

# create a file whose length exceeds the size of the read-ahead buffer
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 iterations
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # each written line is 100 characters long
        # the file's length will be 53000

print 'size of fileA before truncating : ',getsize('fileA.txt')

with open('fileA.txt','a+b') as g:
    for line in g:
        if '00000200' in line:
            print repr(line[78:]),'  g.tell()==',g.tell()
            # at this point, only 200 characters would have been read
            # if there were no read-ahead buffer
            g.truncate()
            # the file is still open: seek and keep reading
            g.seek(6000,0)
            k = 0
            for li in g:
                k+=1
                print 'k==',k,'  ',repr(li[-32:])
                if k==7:
                    break

print 'size of fileA after truncating  : ',getsize('fileA.txt')
Result:
=================== 3 ====================
size of fileA before truncating :  53000
'hhiiiiiiiiii00000200\r\n'   g.tell()== 8192
k== 1    'gghhhhhhhhhhiiiiiiiiii00006100\r\n'
k== 2    'gghhhhhhhhhhiiiiiiiiii00006200\r\n'
k== 3    'gghhhhhhhhhhiiiiiiiiii00006300\r\n'
k== 4    'gghhhhhhhhhhiiiiiiiiii00006400\r\n'
k== 5    'gghhhhhhhhhhiiiiiiiiii00006500\r\n'
k== 6    'gghhhhhhhhhhiiiiiiiiii00006600\r\n'
k== 7    'gghhhhhhhhhhiiiiiiiiii00006700\r\n'
size of fileA after truncating  :  8192
But if a write instruction is placed right after truncate(), the program's behavior becomes incoherent. Try it.