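Hi, I want to group identical molecular structures using their SMILES codes.
However, even when the structures are the same, grouping them is difficult because the dummy atoms are represented differently.
I am using RDKit and have tried changing several options, but I have not found a solution yet. Any help would be appreciated. (RDKit version 2022.3.4)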
Example SMILES (same structure, different SMILES codes -> desired code format):
- [1*]C(=O)OC, [13*]C(=O)OC -> *C(=O)OC
- [31*]C1=CC=CC2=C1C=CC=N2, [5*]C1=CC=CC2=C1C=CC=N2 -> *C1=CC=CC2=C1C=CC=N2
- [45*]C(N)=O, [5*]C(N)=O, [19*]C(N)=O, [16*]C(N)=O -> *C(N)=O
It sounds a bit strange, but you can replace AnyAtom (with or without a number) with AnyAtom (without a number).
You can use ReplaceSubstructs() for this.
from rdkit import Chem

smiles = ['[1*]C(=O)OC', '[13*]C(=O)OC',
          '[31*]C1=CC=CC2=C1C=CC=N2', '[5*]C1=CC=CC2=C1C=CC=N2',
          '[45*]C(N)=O', '[5*]C(N)=O', '[19*]C(N)=O', '[16*]C(N)=O']

search_patt = Chem.MolFromSmiles('*')  # finds AnyAtom with or without numbers
sub_patt = Chem.MolFromSmiles('*')     # AnyAtom without numbers

for s in smiles:
    # sanitize=False keeps the input as written (Kekule form)
    m = Chem.MolFromSmiles(s, sanitize=False)
    # ReplaceSubstructs returns a tuple of molecules; take the first one
    new_m = Chem.ReplaceSubstructs(m, search_patt, sub_patt, replaceAll=True)
    print(s, '-->', Chem.MolToSmiles(new_m[0], kekuleSmiles=True))
Output:
[1*]C(=O)OC --> *C(=O)OC
[13*]C(=O)OC --> *C(=O)OC
[31*]C1=CC=CC2=C1C=CC=N2 --> *C1=CC=CC2=C1C=CC=N2
[5*]C1=CC=CC2=C1C=CC=N2 --> *C1=CC=CC2=C1C=CC=N2
[45*]C(N)=O --> *C(N)=O
[5*]C(N)=O --> *C(N)=O
[19*]C(N)=O --> *C(N)=O
[16*]C(N)=O --> *C(N)=O
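
Since the original goal was grouping, here is a minimal sketch of one possible way to do that step. It is an alternative to the ReplaceSubstructs() approach above, not part of it: it assumes the numbers on the dummy atoms are stored as isotope labels, clears them with SetIsotope(0), and uses the canonical SMILES as the group key. The helper names strip_dummy_labels and group_by_structure are made up for this illustration.

```python
from collections import defaultdict

from rdkit import Chem


def strip_dummy_labels(mol):
    """Return a copy of mol with isotope labels cleared on dummy atoms."""
    mol = Chem.Mol(mol)  # copy, so the caller's molecule is left untouched
    for atom in mol.GetAtoms():
        if atom.GetAtomicNum() == 0:  # dummy atom such as [13*]
            atom.SetIsotope(0)        # assumption: the number is the isotope label
    return mol


def group_by_structure(smiles_list):
    """Group input SMILES by the canonical SMILES of the unlabelled structure."""
    groups = defaultdict(list)
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # skip strings RDKit cannot parse
            continue
        key = Chem.MolToSmiles(strip_dummy_labels(mol))
        groups[key].append(smi)
    return dict(groups)


smiles = ['[1*]C(=O)OC', '[13*]C(=O)OC',
          '[31*]C1=CC=CC2=C1C=CC=N2', '[5*]C1=CC=CC2=C1C=CC=N2',
          '[45*]C(N)=O', '[5*]C(N)=O', '[19*]C(N)=O', '[16*]C(N)=O']

groups = group_by_structure(smiles)
for key, members in groups.items():
    print(key, '<--', members)
```

Unlike the snippet above, this sketch lets RDKit sanitize the molecules, so ring systems may be printed in aromatic form; the group keys can therefore look different from the Kekule strings shown in the output, even though the grouping itself is the same.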