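I need to split a BED file containing coordinates for every chromosome in the genome into separate files, one per chromosome. I tried the approach below, but it doesn't work: it doesn't create any files. Do you know why this happens, or is there another way to solve the problem?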
import sys

def make_out_file(dir_path, chr_name, extension):
    file_name = dir_path + "/" + chr_name + extension
    out_file = open(file_name, "w")
    out_file.close()
    return file_name

def append_output_file(line, out_file):
    with open(out_file, "a") as f:
        f.write(line)
    f.close()

in_name = sys.argv[1]
dir_path = sys.argv[2]

with open(in_name, "r") as in_file:
    file_content = in_file.readlines()
    chr_dict = {}
    out_file_dict = {}
    line_count = 0
    for line in file_content[:0]:
        line_count += 1
        elems = line.split("\t")
        chr_name = elems[0]
        chr_dict[chr_name] += 1
        if chr_dict.get(chr_name) = 1:
            out_file = make_out_file(dir_path, chr_name, ".bed")
            out_file_dict[chr_name] = out_file
            append_output_file(line, out_file)
        elif chr_dict.get(chr_name) > 1:
            out_file = out_file_dict.get(chr_name)
            append_output_file(line, out_file)
        else:
            print "There's been an Error"
in_file.close()
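This line:
for line in file_content[:0]:
iterates over an empty list. The empty list comes from the slice [:0], which means "slice from the start of the list up to, but not including, the first element". Here is a demonstration: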
>>> l = ['line 1\n', 'line 2\n', 'line 3\n']
>>> l[:0]
[]
>>> l[:1]
['line 1\n']
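Because the list is empty, no iteration takes place, so the body of the for loop is never executed.
To iterate over every line of the file, you don't need the slice:
for line in file_content:
However, it is better to iterate over the file object itself, since that doesn't require reading the whole file into memory first: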
with open(in_name, "r") as in_file:
    chr_dict = {}
    out_file_dict = {}
    line_count = 0
    for line in in_file:
        ...
After that, you can start debugging the code inside the for loop, which has a number of problems, including syntax errors.
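For reference, here is a minimal sketch of one way the loop could look once those problems are addressed. It is not the only fix. It assumes the input is tab-delimited with the chromosome name in the first column and has no header lines. The function name split_bed_by_chrom and its parameters are illustrative and do not appear in the question. It avoids the assignment-in-condition (= instead of ==) and the KeyError from chr_dict[chr_name] += 1 on a chromosome that has not been seen yet, and it keeps one output file open per chromosome instead of reopening a file for every line.

```python
import os


def split_bed_by_chrom(in_path, out_dir, extension=".bed"):
    """Write each line of a BED file to <out_dir>/<chrom><extension>.

    Hypothetical helper: returns a dict mapping each chromosome name
    to the number of lines written for it.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}  # chromosome name -> open output file
    counts = {}   # chromosome name -> number of lines written
    try:
        with open(in_path, "r") as in_file:
            for line in in_file:  # iterate the file object, not a slice
                if not line.strip():
                    continue  # skip blank lines
                # Assumption: tab-delimited, chromosome in column 1.
                chrom = line.split("\t", 1)[0].strip()
                if chrom not in handles:
                    # First line for this chromosome: create its file.
                    out_path = os.path.join(out_dir, chrom + extension)
                    handles[chrom] = open(out_path, "w")
                    counts[chrom] = 0
                handles[chrom].write(line)
                counts[chrom] += 1
    finally:
        # Close every output file, even if an error occurred.
        for handle in handles.values():
            handle.close()
    return counts
```

It could be called as split_bed_by_chrom(sys.argv[1], sys.argv[2]) to match the command-line arguments in the question. Keeping every output file open is simple and fast, but an assembly with thousands of contigs may run into the operating system's limit on open files. In that case, opening each file in append mode per write, as the question does, is a reasonable fallback.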