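I need to calculate the number and percentage of polar/non-polar and aliphatic/aromatic/heterocyclic amino acids in a protein sequence that I obtained from UniProt, using BioJava.
I found out from the BioJava tutorial how to read a FASTA file and implemented the code below, but I don't know how to get from there to the counts.
If you have any ideas, please help me.
Maybe there are some sources I could check.
Here is the code.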
package biojava.biojava_project;

import java.net.URL;
import org.biojava.nbio.core.sequence.ProteinSequence;
import org.biojava.nbio.core.sequence.io.FastaReaderHelper;

public class BioSeq {

    // Fetch the sequence for the given accession from UniProt
    public static ProteinSequence getSequenceForId(String uniProtId) throws Exception {
        // %s is replaced by the accession, so the method works for any ID
        URL uniprotFasta = new URL(String.format("https://rest.uniprot.org/uniprotkb/%s.fasta", uniProtId));
        ProteinSequence seq = FastaReaderHelper.readFastaProteinSequence(uniprotFasta.openStream()).get(uniProtId);
        // Print the ID, the sequence, a line break, and the original FASTA header
        System.out.printf("id : %s %s%s%s", uniProtId, seq, System.getProperty("line.separator"), seq.getOriginalHeader());
        System.out.println();
        return seq;
    }

    public static void main(String[] args) {
        try {
            System.out.println(getSequenceForId("P31574"));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
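I don't know whether BioJava stores these properties anywhere, but it is easy to list the amino acids for each property by hand, then iterate over the sequence and count the residues that have that property. Here is an example for polarity: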
import java.io.InputStream;
import java.net.URL;
import java.util.Set;
import org.biojava.nbio.core.sequence.ProteinSequence;
import org.biojava.nbio.core.sequence.compound.AminoAcidCompound;
import org.biojava.nbio.core.sequence.io.FastaReaderHelper;

public class BioSeq {

    public static void main(String[] args) throws Exception {
        ProteinSequence seq = loadFromUniprot("P31574");
        int polarCount = numberOfOccurrences(seq, /*Polar AAs:*/ Set.of("Y", "S", "T", "N", "Q", "C"));
        // Multiply by 100 so the printed value really is a percentage, not a fraction
        System.out.println("% of polar AAs: " + 100.0 * polarCount / seq.getLength());
    }

    public static ProteinSequence loadFromUniprot(String uniProtId) throws Exception {
        URL uniprotFasta = new URL(String.format("https://rest.uniprot.org/uniprotkb/%s.fasta", uniProtId));
        // try-with-resources closes the stream even if parsing fails
        try (InputStream is = uniprotFasta.openStream()) {
            return FastaReaderHelper.readFastaProteinSequence(is).get(uniProtId);
        }
    }

    // Counts the residues whose one-letter code is in the given set
    private static int numberOfOccurrences(ProteinSequence seq, Set<String> bases) {
        int count = 0;
        for (AminoAcidCompound aminoAcid : seq) {
            if (bases.contains(aminoAcid.getBase())) {
                count++;
            }
        }
        return count;
    }
}
PS: Don't forget to close IO streams after using them. In the example above I used the try-with-resources syntax, which closes the InputStream automatically.
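The same idea extends to the other categories in the question. Below is a minimal sketch in plain Java (no BioJava dependency) that counts several classes in one pass over a one-letter sequence string. The class name `ResidueClassCounter`, the map `CLASSES`, and the sample peptide are made up for illustration. The class memberships are an assumption as well: textbooks differ, for example on whether charged residues count as polar or whether proline is aliphatic, so adjust the sets to whatever definition you are required to use. Note that a residue can belong to more than one class (tryptophan is both aromatic and heterocyclic here), so the percentages need not add up to 100.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: counts residues per property class in a one-letter sequence.
final class ResidueClassCounter {

    // ASSUMPTION: one common textbook grouping; edit these sets to match your definition.
    static final Map<String, Set<Character>> CLASSES = new LinkedHashMap<>();
    static {
        CLASSES.put("polar (uncharged)", Set.of('S', 'T', 'N', 'Q', 'C', 'Y'));
        CLASSES.put("nonpolar", Set.of('G', 'A', 'V', 'L', 'I', 'M', 'P', 'F', 'W'));
        CLASSES.put("aliphatic", Set.of('G', 'A', 'V', 'L', 'I'));
        CLASSES.put("aromatic", Set.of('F', 'Y', 'W'));
        CLASSES.put("heterocyclic", Set.of('H', 'P', 'W'));
    }

    // Counts the residues of the sequence that belong to the given set.
    static int count(String sequence, Set<Character> members) {
        int n = 0;
        for (char c : sequence.toUpperCase().toCharArray()) {
            if (members.contains(c)) {
                n++;
            }
        }
        return n;
    }

    // Converts a count into a percentage of the sequence length.
    static double percent(int count, int length) {
        return length == 0 ? 0.0 : 100.0 * count / length;
    }

    public static void main(String[] args) {
        String sequence = "MSTWYHPGAVLK"; // made-up peptide, for illustration only
        for (Map.Entry<String, Set<Character>> e : CLASSES.entrySet()) {
            int n = count(sequence, e.getValue());
            System.out.printf("%-18s %3d  %6.2f%%%n", e.getKey(), n, percent(n, sequence.length()));
        }
    }
}
```

To use this with the BioJava code above, you could pass `seq.getSequenceAsString()` as the sequence and `seq.getLength()` as the length.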