How to change the representation of dummy atoms in SMILES

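Hi, I want to group identical molecular structures by their SMILES strings.

However, molecules with the same structure are hard to group because their dummy atoms are represented differently (they carry different isotope labels, such as [1*] and [13*]).

I am using RDKit (version 2022.3.4). I have tried changing several options but have not found a solution yet, so I would appreciate your help.

Example SMILES (same structure, different SMILES -> desired format):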

  1. [1*]C(=O)OC, [13*]C(=O)OC -> *C(=O)OC
  2. [31*]C1=CC=CC2=C1C=CC=N2, [5*]C1=CC=CC2=C1C=CC=N2 -> *C1=CC=CC2=C1C=CC=N2
  3. [45*]C(N)=O, [5*]C(N)=O, [19*]C(N)=O, [16*]C(N)=O -> *C(N)=O
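When the only difference between the strings is the isotope label on the dummy atom, a plain text substitution already produces the desired format. The sketch below is a pure-Python illustration, not an RDKit feature: the helper name strip_dummy_labels is hypothetical, and the regular expression assumes labels are always written as [<digits>*], so it leaves other bracket forms (for example atom-mapped dummies like [*:1]) untouched and does not canonicalize the rest of the SMILES.

```python
import re

# Matches an isotope-labelled dummy atom such as [1*] or [13*].
# Assumption: labels always look like [<digits>*]; any other bracket atom is left as-is.
_LABELLED_DUMMY = re.compile(r"\[\d+\*\]")


def strip_dummy_labels(smiles: str) -> str:
    """Replace every [n*] in a SMILES string with a bare * (hypothetical helper)."""
    return _LABELLED_DUMMY.sub("*", smiles)


examples = ["[1*]C(=O)OC", "[13*]C(=O)OC", "[31*]C1=CC=CC2=C1C=CC=N2", "[45*]C(N)=O"]
for s in examples:
    print(s, "-->", strip_dummy_labels(s))
```

Because this works only on text, two SMILES that describe the same molecule with different atom orderings would still differ after the substitution. Working on the parsed molecule and writing it back out with Chem.MolToSmiles (which canonicalizes by default) avoids that.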

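It sounds a bit odd, but you can replace AnyAtom with AnyAtom: search for the dummy atoms (labelled or not) and substitute a plain, unlabelled dummy atom.

You can do this with Chem.ReplaceSubstructs():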

from rdkit import Chem

smiles = ['[1*]C(=O)OC', '[13*]C(=O)OC',
          '[31*]C1=CC=CC2=C1C=CC=N2', '[5*]C1=CC=CC2=C1C=CC=N2',
          '[45*]C(N)=O', '[5*]C(N)=O', '[19*]C(N)=O', '[16*]C(N)=O']
search_patt = Chem.MolFromSmiles('*')  # a bare * matches AnyAtom with or without a number
sub_patt = Chem.MolFromSmiles('*')     # replacement: AnyAtom without a number
for s in smiles:
    # sanitize=False skips sanitization (including aromaticity perception), so bonds stay as written
    m = Chem.MolFromSmiles(s, sanitize=False)
    # ReplaceSubstructs returns a tuple; with replaceAll=True it holds a single molecule
    new_m = Chem.ReplaceSubstructs(m, search_patt, sub_patt, replaceAll=True)
    print(s, '-->', Chem.MolToSmiles(new_m[0], kekuleSmiles=True))

Output:

[1*]C(=O)OC --> *C(=O)OC
[13*]C(=O)OC --> *C(=O)OC
[31*]C1=CC=CC2=C1C=CC=N2 --> *C1=CC=CC2=C1C=CC=N2
[5*]C1=CC=CC2=C1C=CC=N2 --> *C1=CC=CC2=C1C=CC=N2
[45*]C(N)=O --> *C(N)=O
[5*]C(N)=O --> *C(N)=O
[19*]C(N)=O --> *C(N)=O
[16*]C(N)=O --> *C(N)=O
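To finish the original task, the normalized strings can serve as dictionary keys, so every molecule with the same normalized SMILES lands in the same group. This is a minimal sketch using collections.defaultdict; the pairs are copied from the output above as literal data so it runs without RDKit, and the variable names are illustrative. In a real run the second element of each pair would come from Chem.MolToSmiles(new_m[0], kekuleSmiles=True).

```python
from collections import defaultdict

# (original SMILES, normalized SMILES) pairs, taken from the output above.
pairs = [
    ("[1*]C(=O)OC", "*C(=O)OC"),
    ("[13*]C(=O)OC", "*C(=O)OC"),
    ("[31*]C1=CC=CC2=C1C=CC=N2", "*C1=CC=CC2=C1C=CC=N2"),
    ("[5*]C1=CC=CC2=C1C=CC=N2", "*C1=CC=CC2=C1C=CC=N2"),
    ("[45*]C(N)=O", "*C(N)=O"),
    ("[5*]C(N)=O", "*C(N)=O"),
    ("[19*]C(N)=O", "*C(N)=O"),
    ("[16*]C(N)=O", "*C(N)=O"),
]

# Group the original SMILES by their normalized form.
groups = defaultdict(list)
for original, normalized in pairs:
    groups[normalized].append(original)

for key, members in groups.items():
    print(key, members)
```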
