用Python计算蛋白质接触顺序

蛋白质接触顺序(CO)是残基间接触的局部性。CO还与蛋白质的折叠速度相关。较高的接触阶数表示较长的折叠时间，而较低的接触阶数来预测潜在的向下折叠，或在没有自由能屏障的情况下发生的蛋白质折叠。

这里有一个内联Web服务器和一个perl脚本来计算CO。

有没有办法计算python中的CO？

您可以利用这样一个事实，即您知道坐标数组的形状始终是(N,3)，因此您可以取消创建数组和调用np.dot，这对于像这里这样的非常小的数组来说效率较低。这样，我就把你的函数重新写成了abs_contact_order2:

from __future__ import print_function, division
import mdtraj as md
import numpy as np
import numba as nb
@nb.njit
def abs_contact_order(xyz, atoms_residue, cutoff_nm=.6):
"""Return the absolute contact order."""
contact_count = 0
seq_distance_sum = 0
cutoff_2 = cutoff_nm*cutoff_nm
N = len(atoms_residue)
for i in xrange(N):
for j in xrange(N):
seq_dist = atoms_residue[j] - atoms_residue[i]
if seq_dist > 0:
d = xyz[j] - xyz[i]
if np.dot(d, d) < cutoff_2:
seq_distance_sum += seq_dist 
contact_count += 1

if contact_count==0.:
return 0.
return seq_distance_sum/float(contact_count)
@nb.njit
def abs_contact_order2(xyz, atoms_residue, cutoff_nm=.6):
"""Return the absolute contact order."""
contact_count = 0
seq_distance_sum = 0
cutoff_2 = cutoff_nm*cutoff_nm
N = len(atoms_residue)
for i in xrange(N):
for j in xrange(N):
seq_dist = atoms_residue[j] - atoms_residue[i]
if seq_dist > 0:
d = 0.0
for k in xrange(3):
d += (xyz[j,k] - xyz[i,k])**2
if d < cutoff_2:
seq_distance_sum += seq_dist 
contact_count += 1

if contact_count==0.:
return 0.
return seq_distance_sum/float(contact_count)

然后计时：

%timeit abs_co = abs_contact_order(traj.xyz[0], seq_atoms, cutoff_nm=0.60)
1 loop, best of 3: 723 ms per loop
%timeit abs_co = abs_contact_order2(traj.xyz[0], seq_atoms, cutoff_nm=0.60)
10 loops, best of 3: 28.2 ms per loop

我已经从函数中删除了print语句，它允许您在nopython模式下运行Numba-jit。如果您真的需要这些信息，我会返回所有必要的数据，并编写一个检查返回值并根据需要打印任何调试信息的瘦包装器。

还要注意，当对Numba函数进行计时时，您应该首先通过在计时循环之外调用函数来"预热"jit。通常，如果你要多次调用该函数，Numba jit该函数所需的时间比一次调用的时间要长，所以你并不能很好地了解代码的速度。但是，如果你只调用一次函数，并且启动时间很重要，那么继续按原样计时。

更新：

你可以进一步加快速度，因为你在(i，j)和(j，i)对上循环，它们是对称的：

@nb.njit
def abs_contact_order3(xyz, atoms_residue, cutoff_nm=.6):
"""Return the absolute contact order."""
contact_count = 0
seq_distance_sum = 0
cutoff_2 = cutoff_nm*cutoff_nm
N = len(atoms_residue)
for i in xrange(N):
for j in xrange(i+1,N):
seq_dist = atoms_residue[j] - atoms_residue[i]
if seq_dist > 0:
d = 0.0
for k in xrange(3):
d += (xyz[j,k] - xyz[i,k])**2
if d < cutoff_2:
seq_distance_sum += 2*seq_dist 
contact_count += 2

if contact_count==0.:
return 0.
return seq_distance_sum/float(contact_count)

和定时：

%timeit abs_co = abs_contact_order3(traj.xyz[0], seq_atoms, cutoff_nm=0.60)
100 loops, best of 3: 15.7 ms per loop

确实存在！使用优秀的MDTraj这不是问题：

import numpy as np
import mdtraj as md
@jit
def abs_contact_order(xyz, atoms_residue, cutoff_nm=.6):
"""Return the absolute contact order."""
contact_count = 0
seq_distance_sum = 0
cutoff_2 = cutoff_nm*cutoff_nm
N = len(atoms_residue)
for i in xrange(N):
for j in xrange(N):
seq_dist = atoms_residue[j] - atoms_residue[i]
if seq_dist > 0:
d = xyz[j] - xyz[i]
if np.dot(d, d) < cutoff_2:
seq_distance_sum += seq_dist 
contact_count += 1

if contact_count==0.:
print("Warning, no contacts found!")
return 0.
print("contact_count: ", contact_count)
print("seq_distance_sum: ", seq_distance_sum)
return seq_distance_sum/float(contact_count)

运行使用：

traj = md.load("test.pdb")
print("Atoms: ", traj.n_atoms)
seq_atoms = np.array([a.residue.resSeq for a in traj.top.atoms], dtype=int)
abs_co = abs_contact_order(traj.xyz[0], seq_atoms, cutoff_nm=0.60)
print("Absolute Contact Order: ", abs_co)
print("Relative Contact Order: ", abs_co/traj.n_residues)

如果没有numba，这需要16秒，只需添加一个@jit，时间就会减少到~1s。

原始的perl脚本需要大约8s

perl contactOrder.pl -r -a -c 6 test.pdb

测试文件以及jupyter笔记本都在这里。

相关内容

最新更新

热门标签：