RDKit: "TypeError: 'Mol' object is not iterable" when trying to enumerate in a loop



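I am trying to use RDKit to enumerate a large compound library and write the results to a CSV file as a single column of SMILES strings. The following code works: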

import os
os.chdir('xxx')
from rdkit import Chem
from rdkit.Chem import rdChemReactions
from rdkit.Chem import AllChem
rxn = rdChemReactions.ReactionFromSmarts('xxx')
rct1 = Chem.SDMolSupplier('reactants_1.sdf')
rct2 = Chem.SDMolSupplier('reactants_2.sdf')
prods = AllChem.EnumerateLibraryFromReaction(rxn,[rct1,rct2])
prods2 = [Chem.MolToSmiles(x[0]) for x in list(prods)]
import csv
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for item in prods2:
        writer.writerow([item])

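However, memory usage is very high. To reduce it, I tried to enumerate iteratively: take one molecule at a time from "reactants_1", react it with every molecule in "reactants_2", write the resulting compounds to the CSV file, and then move on to the next molecule: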

import os
import csv
os.chdir('xxx')
from rdkit import Chem
from rdkit.Chem import rdChemReactions
from rdkit.Chem import AllChem
rxn = rdChemReactions.ReactionFromSmarts('xxx')
rct1 = Chem.SDMolSupplier('reactants_1.sdf')
rct2 = Chem.SDMolSupplier('reactants_2.sdf')
with open('output.csv', 'w', newline='') as f:
    for compound in rct1:
        prods = AllChem.EnumerateLibraryFromReaction(rxn,[compound,rct2])
        prods2 = [Chem.MolToSmiles(x[0]) for x in list(prods)]
        writer = csv.writer(f)
        for item in prods2:
            writer.writerow([item])

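In this case, however, the line "prods2 = [Chem.MolToSmiles(x[0]) for x in list(prods)]" fails with "TypeError: 'Mol' object is not iterable". In the first version I could iterate over the 'Mol' objects without any problem. Any ideas on how to fix this, or on any other way to substantially reduce RAM usage when enumerating large compound sets?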

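EnumerateLibraryFromReaction expects a list of molecules for each reactant position. In the loop, compound is a single Mol rather than a list, which is why the error appears.

So this should work: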

import os
import csv
os.chdir('xxx')
from rdkit import Chem
from rdkit.Chem import rdChemReactions
from rdkit.Chem import AllChem
rxn = rdChemReactions.ReactionFromSmarts('xxx')
rct1 = Chem.SDMolSupplier('reactants_1.sdf')
rct2 = Chem.SDMolSupplier('reactants_2.sdf')
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)  # create the writer once, outside the loop
    for compound in rct1:
        compound = [compound] # put the mol into a list
        prods = AllChem.EnumerateLibraryFromReaction(rxn,[compound,rct2])
        prods2 = [Chem.MolToSmiles(x[0]) for x in list(prods)]
        for item in prods2:
            writer.writerow([item])
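
On the memory side of the question, here is a minimal sketch of a streaming variant. It assumes EnumerateLibraryFromReaction behaves as a lazy generator, which would also explain why the TypeError shows up on the line that consumes prods rather than on the line that creates it. The amide-coupling SMARTS, the toy reactants, the in-memory buffer and the helper name write_products are all illustrative assumptions, not part of the original code. Each SMILES is written as soon as it is produced, so no intermediate list is built. How much this lowers peak memory in practice will depend on the library.

```python
import csv
import io

from rdkit import Chem
from rdkit.Chem import AllChem

# Illustrative amide-coupling SMARTS and toy reactants (assumptions, not the asker's data).
rxn = AllChem.ReactionFromSmarts('[C:1](=[O:2])-[OD1].[N!H0:3]>>[C:1](=[O:2])[N:3]')
acids = [Chem.MolFromSmiles(s) for s in ('CC(=O)O', 'OC(=O)c1ccccc1')]
amines = [Chem.MolFromSmiles(s) for s in ('NC', 'NCC', 'NC1CC1')]

# Passing a bare Mol: creating the generator does not fail,
# the TypeError only appears once the generator is consumed.
lazy = AllChem.EnumerateLibraryFromReaction(rxn, [acids[0], amines])
try:
    next(lazy)
    bare_mol_error = None
except TypeError as exc:
    bare_mol_error = str(exc)


def write_products(rxn, first_reactants, second_reactants, handle):
    """Write one SMILES per row to handle and return the number of rows written."""
    writer = csv.writer(handle)
    count = 0
    for mol in first_reactants:
        if mol is None:  # suppliers yield None for records they cannot parse
            continue
        # Wrap the single Mol in a list so that each reactant position is iterable.
        products = AllChem.EnumerateLibraryFromReaction(rxn, [[mol], second_reactants])
        for product_set in products:  # consume lazily, without list(...)
            writer.writerow([Chem.MolToSmiles(product_set[0])])
            count += 1
    return count


buffer = io.StringIO()  # stands in for open('output.csv', 'w', newline='')
n_written = write_products(rxn, acids, amines, buffer)
```

With the real data, first_reactants and second_reactants would be the two SDMolSupplier objects and handle the open CSV file.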
