How can I check whether a file is really FASTQ (Python)?



I need to check whether a file is FASTA, FASTQ, or neither. For the FASTA check I used the SeqIO module from Bio:

from Bio import SeqIO

def is_fasta(filename):
    with open(filename, "r") as handle:
        fasta = SeqIO.parse(handle, "fasta")
        # any() is True as soon as the parser yields one record
        return any(fasta)

This returns True if the file is FASTA and False if it is not. But when I use the FASTQ version of this function:

def is_fastq(filename):
    with open(filename, "r") as handle:
        fastq = SeqIO.parse(handle, "fastq")
        return any(fastq)

I get an error message:

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 74, in __next__
    return next(self.records)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/Bio/SeqIO/QualityIO.py", line 1085, in iterate
    for title_line, seq_string, quality_string in FastqGeneralIterator(handle):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/Bio/SeqIO/QualityIO.py", line 932, in FastqGeneralIterator
    raise ValueError("Records in Fastq files should start with '@' character")

ValueError: Records in Fastq files should start with '@' character

Can someone help me understand why FASTA and FASTQ behave differently here? And how can I check whether a file is really FASTQ?

Following @Alex's suggestion, here is my attempt:

from Bio import SeqIO

# filename = 'fastq.fastq'
filename = 'fasta.fasta'

def is_fasta(filename):
    with open(filename, "r") as handle:
        fasta = SeqIO.parse(handle, "fasta")
        return any(fasta)

def is_fastq(filename):
    # The try block stays inside the with block: the parser is lazy and
    # only reads from the handle when any() pulls the first record.
    with open(filename, "r") as handle:
        fastq = SeqIO.parse(handle, "fastq")
        try:
            return any(fastq)
        except ValueError as e:
            # The FASTQ parser raises ValueError on malformed input
            print(e)
            return False

print(' is it fasta ? : ', is_fasta(filename))
print(' is it fastq ? : ', is_fastq(filename))

You need two files, `'fastq.fastq'` and `'fasta.fasta'`. Switch between them by uncommenting one of the two `filename` lines.

Result with a valid FASTQ file:

is it fasta ? :  False
is it fastq ? :  True

Result with a valid FASTA file:

is it fasta ? :  True
Records in Fastq files should start with '@' character
is it fastq ? :  False

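As far as I can tell, when the file is not in the expected format the FASTA parser does not raise any error and simply yields an empty iterator, whereas the FASTQ parser raises an error telling you the file is malformed. @Alex, please correct me if I am wrong (I am still learning too).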
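If you want a check that does not depend on how a particular parser reacts to bad input, you could also inspect the file structure yourself. Below is a minimal sketch using only the standard library. It is not Biopython's logic, the function names (`looks_like_fastq`, `looks_like_fasta`, `sniff_format`) are made up for illustration, and it assumes the common layout where every FASTQ record is exactly four lines (no wrapped sequences). It also reads the whole file into memory, so it is only meant for small files.

```python
def looks_like_fastq(lines):
    """Return True if `lines` form well-formed four-line FASTQ records.

    Assumption: each record is exactly four lines (header, sequence,
    '+' separator, quality string); wrapped records are not handled.
    """
    rows = [line.rstrip("\r\n") for line in lines]
    # Ignore trailing blank lines at the end of the file.
    while rows and not rows[-1]:
        rows.pop()
    if not rows or len(rows) % 4 != 0:
        return False
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            return False
        # Sequence and quality strings must have the same length.
        if not seq or len(seq) != len(qual):
            return False
    return True


def looks_like_fasta(lines):
    """Return True if the first non-blank line is a '>' header followed by sequence."""
    rows = [line.strip() for line in lines if line.strip()]
    return len(rows) >= 2 and rows[0].startswith(">") and not rows[1].startswith(">")


def sniff_format(filename):
    """Classify a small text file as 'fastq', 'fasta', or 'unknown'."""
    with open(filename, "r") as handle:
        lines = handle.readlines()
    if looks_like_fastq(lines):
        return "fastq"
    if looks_like_fasta(lines):
        return "fasta"
    return "unknown"
```

Calling `sniff_format('fastq.fastq')` or `sniff_format('fasta.fasta')` then gives a single answer per file instead of two separate boolean checks. This is only a structural heuristic; it does not validate the sequence alphabet or the quality encoding.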
