I receive many text files in the following format:
100000054896524Textext
30000680235498065464065 texttext
50005065321465406546406 16227322
7000056432586846403546854065354096
50046540632146540665406 16268431
7000066543241564786413468464163156
30065406346840654065486 TEXTETXT
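I need to write the contents of these files to new files based on the first character of each line, so that I end up with one file per distinct first character. For the data above, that gives four new files: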
file1.txt:
100000054896524Textext
file3.txt:
30000680235498065464065 texttext
30065406346840654065486 TEXTETXT
file5.txt:
50005065321465406546406 16227322
50046540632146540665406 16268431
and file7.txt:
7000056432586846403546854065354096
7000066543241564786413468464163156
I can't seem to figure out how to do this. This is what I have tried:
with open('test_file.txt','r') as file_handle:
    file_content = file_handle.read()
with open('file1.txt', 'w') as file_handle:
    for line in file_content:
        if line[0] == '1':
            file_handle.write(line+'\n')
with open('file3.txt', 'w') as file_handle:
    for line in file_content:
        if line[0] == '3':
            file_handle.write(line+'\n')
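and so on for 5 and 7, but this only gives me files filled with a bunch of 1s and 3s, and none of the actual data...
What am I not understanding? Many thanks.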
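Use readlines() instead of read() (on line 2).
file_handle.read() returns the whole file as a single string, so looping over the result iterates character by character.
file_handle.readlines() returns a list of lines, so looping over the result iterates line by line.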
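To make the difference concrete, here is a minimal sketch. It uses an in-memory io.StringIO in place of the real file, which is an assumption made only so the snippet runs on its own; the sample text is two lines taken from the question.

```python
import io

# Two lines from the question, held in memory instead of a real file (assumption)
sample = "100000054896524Textext\n30000680235498065464065 texttext\n"

# read() returns one str; iterating over a str yields single characters
chars = list(io.StringIO(sample).read())

# readlines() returns a list of str; iterating over it yields whole lines
lines = io.StringIO(sample).readlines()

print(chars[:3])   # ['1', '0', '0']
print(len(lines))  # 2
```

This is why the original loop wrote only lone 1s and 3s: each "line" it tested was a single character.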
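Rather than calling open separately for each output file, use a dictionary that groups the lines by their first character. Here is a working example: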
output = {}
with open('testfile.txt') as f:
    for line in f:
        start_char = line[0]
        if start_char not in output:
            output[start_char] = []
        output[start_char].append(line)
for start_char in output.keys():
    with open('file{}.txt'.format(start_char), 'w') as f:
        f.writelines(output[start_char])
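The same grouping idea can be sketched with collections.defaultdict, which removes the "is this key already present" check. The function name split_by_first_char, the out_dir parameter, and the skipping of blank lines are assumptions added for illustration and are not part of the answer above.

```python
from collections import defaultdict
from pathlib import Path


def split_by_first_char(src, out_dir="."):
    """Group the lines of src by first character; write one file per group."""
    groups = defaultdict(list)  # a missing key starts out as an empty list
    with open(src) as f:
        for line in f:
            if line.strip():  # skip blank lines (an assumption, not in the answer)
                groups[line[0]].append(line)
    for start_char, lines in groups.items():
        with open(Path(out_dir) / f"file{start_char}.txt", "w") as out:
            out.writelines(lines)
    return sorted(groups)  # the first characters that were seen
```

Calling split_by_first_char('test_file.txt') on the question's data would be expected to produce file1.txt, file3.txt, file5.txt and file7.txt in the current directory. Like the dictionary version, this holds every line in memory until the input has been read, which may matter for very large files.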
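read() reads the file as a single string. When you iterate over it, you iterate character by character rather than line by line. You can use file_content = file_handle.readlines() to iterate over lines instead of characters.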
Instead of duplicating the code for each file, set up a cache and let the script create the files dynamically.
# will hold open file objects for "file0.txt", ..., "file9.txt"
# as needed
file_cache = [None] * 10
try:
    with open('test_file.txt') as file_handle:
        for line in file_handle:
            num = int(line[0])
            if file_cache[num] is None:
                file_cache[num] = open(f"file{num}.txt", "w")
            file_cache[num].write(line)
# todo: May want to catch exceptions and delete all files on fail
# except:...
finally:
    for fp in file_cache:
        if fp:
            fp.close()
You can open the right output file on demand while reading the input file:
open_files = {}
with open('test_file.txt','r') as file_handle:
    for line in file_handle:
        digit = line[0]
        fname = f'file{digit}.txt'
        if fname in open_files:
            write_file = open_files[fname]
        else:
            open_files[fname] = open(fname, 'w')
            write_file = open_files[fname]
        write_file.write(line)
for write_file in open_files.values():
    write_file.close()
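One possible refinement of the open-on-demand approach is to let contextlib.ExitStack from the standard library own the output files, so they are closed even if an exception interrupts the loop. The sketch below is only an illustration: the function name split_streaming, the out_dir parameter, and the skipping of blank lines are assumptions, not something the answers above specify.

```python
from contextlib import ExitStack
from pathlib import Path


def split_streaming(src, out_dir="."):
    """Write each line of src to file<first char>.txt, opening outputs on demand."""
    outputs = {}  # first character -> open file object
    with ExitStack() as stack, open(src) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines (an assumption)
            key = line[0]
            if key not in outputs:
                # enter_context registers the file so the stack closes it on exit
                outputs[key] = stack.enter_context(
                    open(Path(out_dir) / f"file{key}.txt", "w")
                )
            outputs[key].write(line)
    return sorted(outputs)  # the first characters that were seen
```

Unlike the grouping approach, this writes each line as soon as it is read, so memory use stays small; the trade-off is that one file handle stays open per distinct first character until the input is exhausted.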