Understanding while vs. for loops for file processing

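I am trying to read a large file using several methods. My code should find and count how many times the text "time_data" appears in the file. The actual total is 24, but the code I have, which uses a while loop, only counts 4 of them: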

with open(filename) as f:
    time_data_count = 0
    while True:
        memcap = f.read(102400)
        if 'TIME_DATA' in memcap:
            time_data_count += 1
        if not memcap:
            break
if time_data_count > 20:
    print("time_data complete")
else:
    print("incomplete time_data data")

Why does time_data_count come back as only 4? It should search every memcap and increment each time it finds "time_data". When I use a for loop like this, I don't have the problem:

with open(filename, 'r', buffering=102400) as f:
    time_data_count = 0
    for line in f:
        if 'TIME_DATA' in line:
            time_data_count += 1
if time_data_count > 20:
    print("time_data complete")
else:
    print("incomplete time_data data")

What am I missing? And yes, the file does contain newlines.

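Use time_data_count += memcap.count('TIME_DATA') to count the occurrences of 'TIME_DATA' in the memcap string. Note that an occurrence that is split in two across a chunk boundary will not be counted.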

with open(filename) as f:
    time_data_count = 0
    while True:
        memcap = f.read(102400)
        if 'TIME_DATA' in memcap:
            # add the number of occurrences
            time_data_count += memcap.count('TIME_DATA')
        if not memcap:
            break
if time_data_count > 20:
    print("time_data complete")
else:
    print("incomplete time_data data")
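If occurrences split across a chunk boundary matter, one possible workaround is to carry the last few characters of each chunk over into the next one. This is only a sketch: the function name count_in_chunks and its parameters are made up for illustration, and it assumes the search string cannot overlap with itself (true for 'TIME_DATA').

```python
def count_in_chunks(path, needle='TIME_DATA', chunk_size=102400):
    """Count occurrences of needle while reading the file chunk by chunk.

    Assumes needle cannot overlap with itself (true for 'TIME_DATA').
    """
    count = 0
    tail = ''  # up to len(needle) - 1 characters kept from the previous chunk
    with open(path) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty string means end of file
                break
            # Prepend the carried-over tail so a match split across the
            # boundary becomes visible in one piece.
            window = tail + chunk
            count += window.count(needle)
            # The tail is shorter than needle, so it can never hold a full
            # match and nothing is counted twice.
            tail = window[-(len(needle) - 1):] if len(needle) > 1 else ''
    return count
```

Calling count_in_chunks(filename) should then give the same total regardless of the chunk size, at the cost of a small string concatenation per chunk.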

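The problem with the original code is that the counter is incremented only once whenever the item is found in a chunk, no matter how many times it occurs there, rather than once per occurrence.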
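A small illustration of the difference, using a made-up string in place of a real 102400-character chunk:

```python
chunk = 'TIME_DATA a TIME_DATA b TIME_DATA'

# The membership test only answers yes or no, so "+= 1" adds at most 1 per chunk.
found = 'TIME_DATA' in chunk

# str.count returns the number of non-overlapping occurrences in the chunk.
occurrences = chunk.count('TIME_DATA')

print(found, occurrences)  # True 3
```

So a count of 4 is what you would expect if the matches happen to fall in only 4 of the chunks. The for loop does better because each line is tested separately, which gives the right total as long as each line contains the text at most once.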

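It most likely has to do with your if not statement, since that is the only thing that breaks out of the while loop. Because the condition is True, the loop would in theory run forever unless it is broken. There is only one line that can break the while loop, so the issue must be there. As you know, a while loop needs an exit point; personally I would do it like this: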

with open(filename) as f:
    time_data_count = 0
    chars_read = 0  # running total of characters read so far
    while True:
        memcap = f.read(102400)  # read up to 102400 characters
        if not memcap:  # an empty string means the whole file was read
            break
        chars_read += len(memcap)
        if 'TIME_DATA' in memcap:
            time_data_count += 1

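This runs until every byte of the file has been read, and then it breaks out of the while loop. It is similar to what you are doing, only slightly different, and should work if you want to stick with a while loop.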
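For what it's worth, the same "read until nothing comes back" exit condition can also be written without an explicit while True, using the two-argument form of iter. A minimal sketch (the helper name read_chunks is hypothetical, not something from the question):

```python
from functools import partial

def read_chunks(path, chunk_size=102400):
    """Yield successive chunks of at most chunk_size characters until end of file."""
    with open(path) as f:
        # iter(callable, sentinel) calls f.read(chunk_size) repeatedly and
        # stops as soon as it returns the sentinel '', which read() does at EOF.
        yield from iter(partial(f.read, chunk_size), '')
```

You could then write for memcap in read_chunks(filename): and put the counting logic in the loop body; the loop ends on its own when the file is exhausted.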
