Write to multiple files based on a field in the original file

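I need to split a BED file containing coordinates for every chromosome in the genome into separate files, one per chromosome (chr). I tried the approach below, but it doesn't work: it doesn't create any files. Do you know why this happens, or is there another way to solve the problem?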

import sys
def make_out_file(dir_path, chr_name, extension):
    file_name = dir_path + "/" + chr_name + extension
    out_file = open(file_name, "w")
    out_file.close()
    return file_name
def append_output_file(line, out_file):
    with open(out_file, "a") as f:
        f.write(line)
    f.close()
in_name = sys.argv[1]
dir_path = sys.argv[2]
with open(in_name, "r") as in_file:
    file_content = in_file.readlines()
    chr_dict = {}
    out_file_dict = {}
    line_count = 0
    for line in file_content[:0]:
        line_count += 1
        elems = line.split("\t")
        chr_name = elems[0]
        chr_dict[chr_name] += 1
        if chr_dict.get(chr_name) = 1:
            out_file = make_out_file(dir_path, chr_name, ".bed")
            out_file_dict[chr_name] = out_file
            append_output_file(line, out_file)
        elif chr_dict.get(chr_name) > 1:
            out_file = out_file_dict.get(chr_name)
            append_output_file(line, out_file)
        else:
            print "There's been an Error"

in_file.close()

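This line:

for line in file_content[:0]:

says to iterate over an empty list. The empty list comes from the slice [:0], which means "slice from the start of the list up to, but not including, the first element". Here is a demonstration: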

>>> l = ['line 1\n', 'line 2\n', 'line 3\n']
>>> l[:0]
[]
>>> l[:1]
['line 1\n']

Because the list is empty, no iteration takes place, so the code in the body of the for loop is never executed.

To iterate over every line of the file, you don't need the slice:

for line in file_content:

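However, it is better to iterate over the file object directly, because that doesn't require reading the whole file into memory first: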

with open(in_name, "r") as in_file:    
    chr_dict = {}
    out_file_dict = {}
    line_count = 0
    for line in in_file:
        ...

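Next, you can start debugging the code inside the for loop, which has a number of problems, including syntax errors. For example, the if line uses = where == is needed, and chr_dict[chr_name] += 1 raises a KeyError the first time a chromosome name is seen, because the key does not exist yet.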
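
For reference, here is a minimal sketch of how the whole task could look once the loop is repaired. It is not the original poster's code: it is written for Python 3, the function name split_bed_by_chrom is made up for illustration, and it assumes the input is tab-delimited with no header lines (such as "track" or "browser" lines). Instead of reopening an output file for every line, it keeps one open handle per chromosome in a dictionary.

```python
import os
from contextlib import ExitStack


def split_bed_by_chrom(in_name, dir_path, extension=".bed"):
    """Write each line of a BED file to <dir_path>/<chrom><extension>.

    Hypothetical helper for illustration. Returns a dict mapping each
    chromosome name to the number of lines written for it.
    """
    counts = {}
    handles = {}
    # ExitStack closes every output file opened below, even on error.
    with ExitStack() as stack, open(in_name, "r") as in_file:
        for line in in_file:
            if not line.strip():
                continue  # skip blank lines
            # Assumption: the first tab-separated field is the chromosome.
            chr_name = line.split("\t", 1)[0]
            if chr_name not in handles:
                # First time this chromosome is seen: create its file.
                path = os.path.join(dir_path, chr_name + extension)
                handles[chr_name] = stack.enter_context(open(path, "w"))
                counts[chr_name] = 0
            handles[chr_name].write(line)
            counts[chr_name] += 1
    return counts
```

With the same command-line arguments as in the question, it could be called as split_bed_by_chrom(sys.argv[1], sys.argv[2]); the output directory must already exist. Keeping every handle open is simple and fast for a typical set of chromosomes, but an assembly with thousands of contigs could hit the operating system's limit on open files, in which case appending per line (as the question does) or grouping lines first may be the safer choice.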
