I need to check whether a file is FASTA, FASTQ, or neither. For the FASTA check I used the `SeqIO` module from Biopython (`Bio`):
```python
from Bio import SeqIO

def is_fasta(filename):
    with open(filename, "r") as handle:
        fasta = SeqIO.parse(handle, "fasta")
        return any(fasta)  # False if the parser yields no records
```
This returns True if the file is FASTA and False if it isn't. But when I use the FASTQ version of the same function:
```python
def is_fastq(filename):
    with open(filename, "r") as handle:
        fastq = SeqIO.parse(handle, "fastq")
        return any(fastq)
```
I get this error:

```
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 74, in __next__
    return next(self.records)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/Bio/SeqIO/QualityIO.py", line 1085, in iterate
    for title_line, seq_string, quality_string in FastqGeneralIterator(handle):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/Bio/SeqIO/QualityIO.py", line 932, in FastqGeneralIterator
    "Records in Fastq files should start with '@' character"
ValueError: Records in Fastq files should start with '@' character
```
Can someone help me understand why FASTA and FASTQ behave differently here? And how can I check whether a file really is FASTQ?
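If a quick first guess is enough, one swapped-in approach that doesn't depend on how either Biopython parser reacts to bad input is to look at the first non-blank line: FASTA records start with `>` and FASTQ records start with `@`. The `sniff_format` helper below is a hypothetical name and only a sketch under that assumption; it does not validate the rest of the file. It deliberately inspects only the first line, because `>` and `@` are also legal FASTQ quality characters and can appear at the start of later lines.

```python
import os
import tempfile
from typing import Optional


def sniff_format(filename: str) -> Optional[str]:
    """Hypothetical helper: guess 'fasta' or 'fastq' from the first non-blank line.

    Returns None for an empty file or one whose first real line matches neither.
    Only the first line is inspected; the rest of the file is NOT validated.
    """
    with open(filename, "r") as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped:
                continue  # skip leading blank lines
            if stripped.startswith(">"):
                return "fasta"
            if stripped.startswith("@"):
                return "fastq"
            return None  # first real line matches neither format
    return None  # empty (or all-blank) file


# Usage: write a few small throwaway files and sniff each one.
samples = {
    "fasta_example": ">seq1\nACGT\n",
    "fasta_with_blank_lines": "\n\n>seq1\nACGT\n",
    "fastq_example": "@seq1\nACGT\n+\nIIII\n",
    "neither_example": "just some text\n",
}
results = {}
for name, text in samples.items():
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(text)
    try:
        results[name] = sniff_format(tmp.name)
    finally:
        os.remove(tmp.name)

print(results)
```

Once you have a likely format, you can still parse the whole file with `SeqIO.parse(filename, fmt)` to validate it; for FASTQ, that means catching the `ValueError` shown in the traceback above.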
Following @Alex's suggestion, here is my attempt:
```python
from Bio import SeqIO

# filename = 'fastq.fastq'
filename = 'fasta.fasta'


def is_fasta(filename):
    with open(filename, "r") as handle:
        fasta = SeqIO.parse(handle, "fasta")
        return any(fasta)


def is_fastq(filename):
    with open(filename, "r") as handle:
        fastq = SeqIO.parse(handle, "fastq")
        try:
            return any(fastq)
        except ValueError as e:
            # The FASTQ parser raises ValueError on a non-FASTQ file
            # instead of just yielding no records.
            print(e)
            return False


print(' is it fasta ? : ', is_fasta(filename))
print(' is it fastq ? : ', is_fastq(filename))
```
This needs two test files, `'fastq.fastq'` and `'fasta.fasta'`; uncomment the `filename` line for the one you want to check.
Result with a valid FASTQ file:

```
is it fasta ? : False
is it fastq ? : True
```

Result with a valid FASTA file:

```
is it fasta ? : True
Records in Fastq files should start with '@' character
is it fastq ? : False
```
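It looks to me as if, when a file can't be read as FASTA, the FASTA parser doesn't raise any error but simply gives you an empty iterator, whereas the FASTQ parser raises an error telling you the file is wrong. @Alex, please correct me if I'm wrong (I'm still learning too).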
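The traceback in the question shows the FASTQ error being raised from `next(self.records)`, i.e. at the moment `any()` asks for the first record. The two toy parsers below are hypothetical stand-ins written only to illustrate the difference described above; they are not Biopython's code. The FASTA one skips every line before the first `>` header (my understanding of what Biopython's FASTA reader does, at least in the versions I have used), while the FASTQ one insists that each record start with `@`.

```python
from typing import Iterable, Iterator, Tuple


def toy_fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Hypothetical FASTA reader: silently ignores everything before the first '>' line."""
    title, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(seq)
            title, seq = line[1:], []
        elif title is not None:
            seq.append(line.strip())
    if title is not None:
        yield title, "".join(seq)


def toy_fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Hypothetical FASTQ reader: strict four-line records that must start with '@'."""
    it = (line.rstrip("\n") for line in lines if line.strip())
    for header in it:
        if not header.startswith("@"):
            raise ValueError("Records in Fastq files should start with '@' character")
        seq = next(it, "")
        plus = next(it, "")
        qual = next(it, "")
        if not plus.startswith("+") or len(qual) != len(seq):
            raise ValueError("Malformed FASTQ record")
        yield header[1:], seq, qual


fasta_text = [">seq1\n", "ACGT\n"]
fastq_text = ["@seq1\n", "ACGT\n", "+\n", "IIII\n"]

# Creating a generator reads nothing; parsing happens when any() pulls the first record.
print(any(toy_fasta_records(fastq_text)))  # no '>' line, so no records: False
print(any(toy_fastq_records(fastq_text)))  # True
try:
    any(toy_fastq_records(fasta_text))
except ValueError as err:
    print("ValueError:", err)
```

The lenient reader turns "wrong format" into "zero records", so `any()` just returns False; the strict reader turns it into an exception, which is why the FASTQ check needs the `try`/`except`.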