Writing to multiple files based on one field of the original file



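I need to split a BED-format file containing coordinates for every chr in the genome into separate files, one per chr. I tried the approach below, but it doesn't work: it doesn't create any files. Do you know why this happens, or is there another way to solve this?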

import sys
def make_out_file(dir_path, chr_name, extension):
    file_name = dir_path + "/" + chr_name + extension
    out_file = open(file_name, "w")
    out_file.close()
    return file_name
def append_output_file(line, out_file):
    with open(out_file, "a") as f:
        f.write(line)
    f.close()
in_name = sys.argv[1]
dir_path = sys.argv[2]
with open(in_name, "r") as in_file:
    file_content = in_file.readlines()
    chr_dict = {}
    out_file_dict = {}
    line_count = 0
    for line in file_content[:0]:
        line_count += 1
        elems = line.split("\t")
        chr_name = elems[0]
        chr_dict[chr_name] += 1
        if chr_dict.get(chr_name) = 1:
            out_file = make_out_file(dir_path, chr_name, ".bed")
            out_file_dict[chr_name] = out_file
            append_output_file(line, out_file)
        elif chr_dict.get(chr_name) > 1:
            out_file = out_file_dict.get(chr_name)
            append_output_file(line, out_file)
        else:
            print "There's been an Error"

in_file.close()

This line:

for line in file_content[:0]:

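iterates over an empty list. The empty list comes from the slice [:0], which takes everything from the start of the list up to, but not including, the first element, i.e. nothing at all. Here is a demonstration: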

>>> l = ['line 1\n', 'line 2\n', 'line 3\n']
>>> l[:0]
[]
>>> l[:1]
['line 1\n']

Because the list is empty, there is nothing to iterate over, so the code in the body of the for loop never runs.
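The same point as a small runnable check; count_iterations is a hypothetical helper (not from the question) that simply counts how many times a for loop body executes.

```python
# Same sample lines as in the interactive demo above.
lines = ['line 1\n', 'line 2\n', 'line 3\n']


def count_iterations(seq):
    """Return how many times a for loop over seq executes its body."""
    n = 0
    for _ in seq:
        n += 1
    return n


print(count_iterations(lines[:0]))  # 0: the slice is empty, the body never runs
print(count_iterations(lines))      # 3: one pass per line
```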

To iterate over every line of the file, you don't need a slice at all:

for line in file_content:

Better still, iterate over the file object itself, since that doesn't require reading the whole file into memory first:

with open(in_name, "r") as in_file:    
    chr_dict = {}
    out_file_dict = {}
    line_count = 0
    for line in in_file:
        ...
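To make the difference concrete, here is a minimal sketch of what the loop sees when it iterates a file object directly. Assumptions: io.StringIO stands in for the opened BED file, and both the three sample records and the chrom_of_each_line generator are hypothetical, not from the question.

```python
import io

# Hypothetical BED records standing in for the real input file.
bed_text = "chr1\t100\t200\nchr2\t50\t80\nchr1\t300\t400\n"


def chrom_of_each_line(in_file):
    """Yield (chrom, line) pairs, reading the file one line at a time."""
    for line in in_file:  # the file object yields lines lazily, newline included
        # The chromosome is the first tab-separated field of a BED line.
        yield line.split("\t", 1)[0], line


with io.StringIO(bed_text) as in_file:
    pairs = list(chrom_of_each_line(in_file))

print(pairs[0])  # ('chr1', 'chr1\t100\t200\n')
```

Because the generator pulls one line at a time, memory use does not grow with the size of the input the way it does with readlines().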

Next, you can start debugging the code inside the for loop, which has a number of problems, including syntax errors.
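For example, chr_dict[chr_name] += 1 raises a KeyError the first time a chromosome is seen, and if chr_dict.get(chr_name) = 1: is a SyntaxError because comparison needs ==. Below is a minimal corrected sketch, assuming Python 3. split_bed_by_chrom is a hypothetical name, and it deliberately swaps the question's reopen-and-append-per-line approach for one open output handle per chromosome, closed at the end. For assemblies with thousands of scaffolds this could hit the operating system's open-file limit, in which case reopening in append mode as in the question is the safer fallback.

```python
import os
import sys


def split_bed_by_chrom(in_name, dir_path, extension=".bed"):
    """Write each BED line to <dir_path>/<chrom><extension>.

    Returns a dict mapping chromosome name -> number of lines written.
    """
    os.makedirs(dir_path, exist_ok=True)
    out_files = {}    # chrom -> open file handle, created on first occurrence
    line_counts = {}  # chrom -> lines written so far
    try:
        with open(in_name, "r") as in_file:
            for line in in_file:  # stream the input instead of readlines()
                # Skip blank lines and UCSC-style header/comment lines.
                if not line.strip() or line.startswith(("#", "track", "browser")):
                    continue
                chr_name = line.split("\t", 1)[0]  # split on a real tab
                if chr_name not in out_files:
                    path = os.path.join(dir_path, chr_name + extension)
                    out_files[chr_name] = open(path, "w")
                out_files[chr_name].write(line)
                # dict.get with a default avoids the KeyError that
                # chr_dict[chr_name] += 1 raises on a new key.
                line_counts[chr_name] = line_counts.get(chr_name, 0) + 1
    finally:
        # Close every per-chromosome output file, even if an error occurred.
        for handle in out_files.values():
            handle.close()
    return line_counts


# Only run as a script when both arguments (input file, output dir) are given.
if __name__ == "__main__" and len(sys.argv) == 3:
    counts = split_bed_by_chrom(sys.argv[1], sys.argv[2])
    for chrom, n in sorted(counts.items()):
        print(chrom, n)
```

Usage: python split_bed.py genome.bed out_dir (the script name is also hypothetical).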
