计算原子坐标之间的距离



我有一个文本文件,如下所示

ATOM    920  CA  GLN A 203      39.292 -13.354  17.416  1.00 55.76           C 
ATOM    929  CA  HIS A 204      38.546 -15.963  14.792  1.00 29.53           C
ATOM    939  CA  ASN A 205      39.443 -17.018  11.206  1.00 54.49           C  
ATOM    947  CA  GLU A 206      41.454 -13.901  10.155  1.00 26.32           C
ATOM    956  CA  VAL A 207      43.664 -14.041  13.279  1.00 40.65           C 
.
.
.
ATOM    963  CA  GLU A 208      45.403 -17.443  13.188  1.00 40.25           C  

我想计算两个α-碳原子之间的距离,即计算第一个和第二个原子之间,然后计算第二个和第三个原子之间的间距,依此类推……两个原子间的距离可以表示为:distance = sqrt((x1-x2)^2+(y1-y2)^2+(z1-z2)^2) .

列7、8和9分别表示x、y和z坐标。我需要打印距离和相应的残差对(第4列),如下所示。(距离值不是真实的)

GLN-HIS   4.5
HIS-ASN   3.2
ASN-GLU   2.5

如何使用perl或python进行此计算?

不要在空白处拆分

这里给出的其他答案做出了一个有缺陷的假设——坐标将是空间定界的。根据ATOM的PDB规范,这是而不是必要的情况:PDB记录值由列索引指定,并且可以相互流入。例如,您的第一条ATOM记录为:

ATOM    920  CA  GLN A 203      39.292 -13.354  17.416  1.00 55.76           C 

但这也是完全有效的:

ATOM    920  CA  GLN A 203      39.292-13.3540  17.416  1.00 55.76           C 

更好的方法

由于列指定的索引以及PDB文件中可能出现的许多其他问题,不应该编写自己的解析器。PDB格式很乱,而且有很多特殊情况和格式不正确的文件需要处理。相反,使用已经为您编写的解析器。

我喜欢Biopyton的PDB.PDBParser。它将为您将结构解析为Python对象,并配有方便的功能。如果您更喜欢Perl,请查看BioPerl。

PDB.Residue对象允许按名称对原子进行键控访问,PDB.Atom对象重载-运算符以返回两个原子之间的距离。我们可以用它来编写干净、简洁的代码:

代码

from Bio import PDB
parser = PDB.PDBParser()
# Parse the structure into a PDB.Structure object
pdb_code = "1exm"
pdb_path = "pdb1exm.ent"
struct = parser.get_structure(pdb_code, pdb_path)
# Grab the first two residues from the structure
residues = struct.get_residues()
res_one = residues.next()
res_two = residues.next()
try:
alpha_dist = res_one['CA'] - res_two['CA']
except KeyError:
print "Alpha carbon missing, computing distance impossible!"

如果数据由空格分隔,那么一个简单的split就可以完成这项工作。对行进行缓冲,以便按顺序相互比较。

use strict;
use warnings;
my @line;
while (<>) {
push @line, $_;            # add line to buffer
next if @line < 2;         # skip unless buffer is full
print proc(@line), "n";   # process and print 
shift @line;               # remove used line 
}
sub proc {
my @a = split ' ', shift;   # line 1
my @b = split ' ', shift;   # line 2
my $x = ($a[6]-$b[6]);      # calculate the diffs
my $y = ($a[7]-$b[7]);
my $z = ($a[8]-$b[8]);
my $dist = sprintf "%.1f",                # format the number
sqrt($x**2+$y**2+$z**2);   # do the calculation
return "$a[3]-$b[3]t$dist"; # return the string for printing
}

输出(带样本数据):

GLN-HIS 3.8
HIS-ASN 3.8
ASN-GLU 3.9
GLU-VAL 3.8

如果数据以制表符分隔,则可以在/t/而不是' '上进行拆分。

假设您的数据在"atoms.txt"中,这将逐行读取数据,并将条目拆分为一个列表:

import csv
with open("atoms.txt") as f:
reader = csv.reader(f)
for line, in reader:
entries = line.split()
print entries

现在,对于每个列表,提取所需的列,并计算距离等(请记住,python中的列表是基于零的)。

理想情况下,您应该将MDAnalysis包用于"切片"原子和分段并计算它们之间的距离度量的Python方法。事实上,MDAnalysis支持多种MD模拟和化学结构格式。

关于一个更详细的例子,请参阅Biostars.org上的以下条目

如果您只对一对感兴趣,bash可以很好地工作。这是我使用的一个脚本,我设置它在最后重新启动(如果你愿意,请关闭它)。它会提示你选择哪个原子。PDB文件可以设置不同的列,因此对于awk行,请确保列匹配。在与新的pdb文件一起使用之前,手动执行一个测试用例。这是微不足道的,但在脚本中将我的pdb文件更改为您的。

#!/usr/bin/env bash
echo " "
echo "Use one letter abbreviations. Case doesn't matter." 
echo "Example: A 17 CA or n 162 cg"
echo " - - - - - - - - - - - - - - - - - -"
#------------First Selection------------
read -e -p "What first atom? " sel1
# echo $sel1
sel1caps=${sel1^^}
# echo "sel1caps="$sel1caps
arr1=($sel1caps)
# echo "arr1[0]= "${arr1[0]}
# echo "arr1[1]= "${arr1[1]}
# echo "arr1[2]= "${arr1[2]}
#To convert one to three letters
if [ ${arr1[0]} = A ] ; then
SF1=ALA
elif [ ${arr1[0]} = H ] ; then
SF1=HIS
elif [ ${arr1[0]} = R ] ; then
SF1=ARG
elif [ ${arr1[0]} = K ] ; then
SF1=LYS
elif [ ${arr1[0]} = I ] ; then
SF1=ILE
elif [ ${arr1[0]} = F ] ; then
SF1=PHE 
elif [ ${arr1[0]} = L ] ; then
SF1=LEU
elif [ ${arr1[0]} = W ] ; then
SF1=TRP
elif [ ${arr1[0]} = M ] ; then
SF1=MET
elif [ ${arr1[0]} = P ] ; then
SF1=PRO 
elif [ ${arr1[0]} = C ] ; then
SF1=CYS 
elif [ ${arr1[0]} = N ] ; then
SF1=ASN
elif [ ${arr1[0]} = V ] ; then
SF1=VAL
elif [ ${arr1[0]} = G ] ; then
SF1=GLY 
elif [ ${arr1[0]} = S ] ; then
SF1=SER 
elif [ ${arr1[0]} = Q ] ; then
SF1=GLN 
elif [ ${arr1[0]} = Y ] ; then
SF1=TYR 
elif [ ${arr1[0]} = D ] ; then
SF1=ASP
elif [ ${arr1[0]} = E ] ; then
SF1=GLU 
elif [ ${arr1[0]} = T ] ; then
SF1=THR 
else
echo "use one letter codes"
echo "exiting"
exit
fi
# echo "SF1 ="$SF1
#If there is nothing printing for line 1, check the expression for your pdb file. The spaces may need adjustment at the end.
line1=$(grep -E "${arr1[2]} *?${SF1}(.*?) ${arr1[1]}     " 1A23.pdb)
# echo $line1
ar_l1=($line1)
# echo "ar_l1[1]="${ar_l1[1]}
echo " - - - - - - - - - - - - - - - - - -"
#------------Second Selection------------
read -e -p "What second atom? " sel2
# echo $sel2
sel2caps=${sel2^^}
# echo "sel2caps="$sel2caps
arr2=($sel2caps)
# echo "arr2[0]= "${arr2[0]}
# echo "arr2[1]= "${arr2[1]}
# echo "arr2[2]= "${arr2[2]}
#To convert one to three letters
if [ ${arr2[0]} = A ] ; then
SF2=ALA
elif [ ${arr2[0]} = H ] ; then
SF2=HIS
elif [ ${arr2[0]} = R ] ; then
SF2=ARG
elif [ ${arr2[0]} = K ] ; then
SF2=LYS
elif [ ${arr2[0]} = I ] ; then
SF2=ILE
elif [ ${arr2[0]} = F ] ; then
SF2=PHE 
elif [ ${arr2[0]} = L ] ; then
SF2=LEU
elif [ ${arr2[0]} = W ] ; then
SF2=TRP
elif [ ${arr2[0]} = M ] ; then
SF2=MET
elif [ ${arr2[0]} = P ] ; then
SF2=PRO 
elif [ ${arr2[0]} = C ] ; then
SF2=CYS 
elif [ ${arr2[0]} = N ] ; then
SF2=ASN
elif [ ${arr2[0]} = V ] ; then
SF2=VAL
elif [ ${arr2[0]} = G ] ; then
SF2=GLY 
elif [ ${arr2[0]} = S ] ; then
SF2=SER 
elif [ ${arr2[0]} = Q ] ; then
SF2=GLN 
elif [ ${arr2[0]} = Y ] ; then
SF2=TYR 
elif [ ${arr2[0]} = D ] ; then
SF2=ASP
elif [ ${arr2[0]} = E ] ; then
SF2=GLU 
elif [ ${arr2[0]} = T ] ; then
SF2=THR 
else
echo "use one letter codes"
echo "exiting"
exit
fi
# echo "SF2 ="$SF2

line2=$(grep -E "${arr2[2]} *?${SF2}(.*?) ${arr2[1]}     " 1A23.pdb)
# echo $line2
ar_l2=($line2)
# echo "ar_l2[1]="${ar_l2[1]}
# echo "ar_l2[1]="${ar_l2[1]}
atom1=${ar_l1[1]}
atom2=${ar_l2[1]}
echo "==========================="
echo ${arr1[0]}${arr1[1]}${arr1[2]}" to "${arr2[0]}${arr2[1]}${arr2[2]}":"
# 6, 7, 8 are column numbers in the pdb file. 
# If there are multiple molecules it should be 7, 8, 9.
awk '$2=='$atom1'{x1=$7;y1=$8;z1=$9}                                 # get the ATOM 1
$2=='$atom2'{x2=$7;y2=$8;z2=$9}                               # get the ATOM 2
END{print sqrt((x1-x2)*(x1-x2) + (y1-y2)*(y1-y2) + (z1-z2)*(z1-z2))}' 1A23.pdb    # calculate the distance.
echo "Angstroms"
echo "==========================="
echo " "
echo "-_-_-_-_Running Script Again_-_-_-_-"
./distance_soln.sh

一个简单的Python代码就可以完成这项工作。我假设您的所有内容都在文件"input.txt"中。

def process(line1, line2):
content1 = line1.split()
content2 = line2.split()
x1, y1, z1 = float(content1[6]), float(content1[7]), float(content1[8])
x2, y2, z2 = float(content2[6]), float(content2[7]), float(content2[8])
distance = math.sqrt(math.pow(x1-x2, 2) + math.pow(y1-y2, 2) + math.pow(z1-z2, 2))
return content1[3]+"-"+content2[3]+" "+ "{0:.2f}".format(distance)
with open("input.txt") as f:
line1 = f.readline()
for line in f:
line2 = line
print(process(line1, line2))
line1 = line2

如果你在使用这个脚本时发现任何差异或问题,请告诉我。

最新更新