MATLAB: How to extract molecules from an SDF file into a new SDF file using IDs from another text file



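I have an SDF file containing several thousand molecules, and several text files of IDs, each grouping molecules that share some characteristic. At the moment I have a script that loads a CSV database of molecular features and generates the ID text files by classifying on those features. I want to use these text files to parse the SDF file and obtain new SDF files containing the corresponding molecules. I also want to do this in MATLAB.

For example, here are some molecules from the original SDF file: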

NCGC00178831-03
Marvin  07111412562D          
34 37  0  0  0  0            999 V2000
4.8814   -2.7443    0.0000 Cl  0  5  0  0  0  0  0  0  0  0  0  0
2.8647   -2.4751    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.8647   -1.6501    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
3.5808   -1.2318    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.2970   -1.6501    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.0017   -1.2318    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.7179   -1.6501    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
5.0017   -0.4068    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.2970    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
3.5808   -0.4068    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.8647    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.1485   -0.4068    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.1485   -1.2318    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1.4324   -1.6501    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
0.7162   -1.2318    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
0.0000   -1.6501    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
0.7162   -0.4068    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1.4324    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.8761   -3.5407    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
3.5923   -3.9590    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.3084   -3.5407    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.0132   -3.9590    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.7293   -3.5407    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
5.0132   -4.7840    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.3084   -5.1908    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
3.5923   -4.7840    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.8761   -5.1908    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.1599   -4.7840    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.1599   -3.9590    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1.4438   -3.5407    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
0.7276   -3.9590    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
0.0115   -3.5407    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
0.7276   -4.7840    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1.4438   -5.1908    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2  3  1  0  0  0  0
3  4  2  0  0  0  0
3 13  1  0  0  0  0
4  5  1  0  0  0  0
4 10  1  0  0  0  0
5  6  2  0  0  0  0
6  7  1  0  0  0  0
6  8  1  0  0  0  0
8  9  2  0  0  0  0
9 10  1  0  0  0  0
10 11  2  0  0  0  0
11 12  1  0  0  0  0
12 13  2  0  0  0  0
12 18  1  0  0  0  0
13 14  1  0  0  0  0
14 15  2  0  0  0  0
15 16  1  0  0  0  0
15 17  1  0  0  0  0
17 18  2  0  0  0  0
19 20  2  0  0  0  0
19 29  1  0  0  0  0
20 21  1  0  0  0  0
20 26  1  0  0  0  0
21 22  2  0  0  0  0
22 23  1  0  0  0  0
22 24  1  0  0  0  0
24 25  2  0  0  0  0
25 26  1  0  0  0  0
26 27  2  0  0  0  0
27 28  1  0  0  0  0
28 29  2  0  0  0  0
28 34  1  0  0  0  0
29 30  1  0  0  0  0
30 31  2  0  0  0  0
31 32  1  0  0  0  0
31 33  1  0  0  0  0
33 34  2  0  0  0  0
M  CHG  2   1  -1   3   1
M  END
>  <Formula>
C27H25ClN6
>  <FW>
468.9806 (35.4535+224.2805+209.2465)
>  <DSSTox_CID>
25848
>  <SR-HSE>
0
$$$$
NCGC00166114-03
Marvin  07111412562D          
31 32  0  0  0  0            999 V2000
4.9884   -1.2417    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.9884   -2.0696    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.2748   -2.4764    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.2748   -3.7038    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.9884   -4.1178    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.7021   -3.7038    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
6.4157   -4.1178    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
5.7021   -2.8760    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
4.9884   -4.9385    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.2748   -5.3524    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
3.5612   -4.9385    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
3.5612   -4.1178    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
3.5612   -2.0696    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
3.5612   -1.2417    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.2748   -0.8279    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
2.8403   -0.8279    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.1267   -1.2417    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.1267   -2.0696    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.8403   -2.4764    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1.4202   -2.4764    0.0000 Br  0  0  0  0  0  0  0  0  0  0  0  0
1.4202   -0.8279    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
2.8403    0.0000    0.0000 Br  0  0  0  0  0  0  0  0  0  0  0  0
5.7021   -2.4764    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
6.4229   -2.0696    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
6.4229   -1.2417    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.7021   -0.8279    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.7021    0.0000    0.0000 Br  0  0  0  0  0  0  0  0  0  0  0  0
7.1366   -0.8279    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
7.1366   -2.4764    0.0000 Br  0  0  0  0  0  0  0  0  0  0  0  0
7.0866   -4.1963    0.0000 Na  0  3  0  0  0  0  0  0  0  0  0  0
0.0000   -0.7708    0.0000 Na  0  3  0  0  0  0  0  0  0  0  0  0
1  2  1  0  0  0  0
1 15  1  0  0  0  0
1 26  2  0  0  0  0
2  3  2  0  0  0  0
2 23  1  0  0  0  0
3  4  1  0  0  0  0
3 13  1  0  0  0  0
4  5  2  0  0  0  0
4 12  1  0  0  0  0
5  6  1  0  0  0  0
5  9  1  0  0  0  0
6  7  1  0  0  0  0
6  8  2  0  0  0  0
9 10  2  0  0  0  0
10 11  1  0  0  0  0
11 12  2  0  0  0  0
13 14  2  0  0  0  0
13 19  1  0  0  0  0
14 15  1  0  0  0  0
14 16  1  0  0  0  0
16 17  2  0  0  0  0
16 22  1  0  0  0  0
17 18  1  0  0  0  0
17 21  1  0  0  0  0
18 19  2  0  0  0  0
18 20  1  0  0  0  0
23 24  2  0  0  0  0
24 25  1  0  0  0  0
24 29  1  0  0  0  0
25 26  1  0  0  0  0
25 28  2  0  0  0  0
26 27  1  0  0  0  0
M  CHG  4   7  -1  21  -1  30   1  31   1
M  END
>  <Formula>
C20H6Br4Na2O5
>  <FW>
691.8542 (645.8757+22.9892+22.9892)
>  <DSSTox_CID>
5234
>  <SR-HSE>
0
$$$$
NCGC00263563-01
Marvin  07111412562D          
71 76  0  0  1  0            999 V2000
2.1953   -4.9878    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
3.6803   -4.9878    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
2.9701   -5.4074    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
6.5858   -4.9878    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
5.1008   -4.9878    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
2.1953   -4.1484    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
11.8157   -5.6335    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
14.1239   -5.8755    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
11.0893   -5.1008    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
3.6803   -4.1484    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
10.2015   -5.1008    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
12.5905   -5.1653    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
14.9633   -5.8755    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
4.3905   -5.4074    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
5.8755   -5.4074    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
2.9701   -3.6803    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
11.4606   -4.3905    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
13.6558   -5.1653    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
9.5559   -5.5043    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
7.2476   -5.5043    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
5.1008   -4.1484    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
1.4850   -5.4074    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
11.8157   -2.4858    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
7.9578   -4.9878    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
6.5858   -4.1484    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
12.5905   -2.9055    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
12.3483   -4.3905    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
11.8157   -1.6626    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.8755   -3.6803    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
13.3008   -1.6626    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
12.5905   -1.2429    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
13.3008   -2.4858    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
8.8457   -4.9878    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
11.4606   -3.1961    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
14.1239   -4.5035    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
0.7748   -4.9878    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
15.4314   -5.2137    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
14.9633   -4.5035    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
9.9756   -4.2776    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
0.0000   -5.4074    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
7.6673   -4.2776    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.1953   -5.7464    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
6.8764   -4.2776    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
9.0877   -4.2776    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
0.7748   -4.1484    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
14.5437   -6.4567    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
3.6803   -3.3736    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
2.9701   -2.9055    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
5.8755   -2.9055    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
14.0110   -1.2429    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
12.5905   -0.4197    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
1.4850   -3.6803    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
15.5444   -6.4082    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
10.5566   -4.3905    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.3905   -6.1177    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.5035   -3.7933    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
8.1838   -4.2776    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
14.0110   -2.9055    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
13.6558   -3.7449    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
16.1416   -5.2137    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.2130   -2.9701    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.1953   -2.3729    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
14.7858   -1.6626    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
13.3008    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
11.0893   -5.8755    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
12.5905   -5.9885    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
8.8941   -5.7464    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
3.6803   -5.7464    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
5.1008   -5.7464    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
13.6558   -5.9885    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
0.4681   -6.7634    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
1  3  1  0  0  0  0
1  6  1  0  0  0  0
1 22  1  6  0  0  0
1 42  1  1  0  0  0
2  3  1  0  0  0  0
2 14  1  0  0  0  0
2 68  1  1  0  0  0
2 10  1  0  0  0  0
4 15  1  0  0  0  0
4 20  1  1  0  0  0
4 43  1  0  0  0  0
4 25  1  0  0  0  0
5 14  1  0  0  0  0
5 15  1  0  0  0  0
5 21  1  0  0  0  0
5 69  1  1  0  0  0
6 16  1  0  0  0  0
6 52  1  1  0  0  0
7  9  1  0  0  0  0
7 12  1  0  0  0  0
8 18  1  0  0  0  0
8 13  1  0  0  0  0
9 11  1  0  0  0  0
9 17  1  0  0  0  0
9 65  1  6  0  0  0
10 16  1  0  0  0  0
10 47  1  1  0  0  0
11 19  1  0  0  0  0
11 54  1  6  0  0  0
11 39  1  0  0  0  0
12 18  1  0  0  0  0
12 66  1  1  0  0  0
12 27  1  0  0  0  0
13 46  1  1  0  0  0
13 53  1  6  0  0  0
13 37  1  0  0  0  0
14 55  1  1  0  0  0
16 48  1  6  0  0  0
17 27  1  0  0  0  0
17 34  1  1  0  0  0
18 35  1  0  0  0  0
18 70  1  1  0  0  0
19 33  1  0  0  0  0
20 24  1  0  0  0  0
21 29  1  0  0  0  0
21 56  1  6  0  0  0
22 36  1  0  0  0  0
23 34  1  0  0  0  0
23 26  1  0  0  0  0
23 28  1  0  0  0  0
24 33  1  0  0  0  0
24 57  1  6  0  0  0
24 41  1  0  0  0  0
25 29  1  0  0  0  0
26 32  1  0  0  0  0
28 31  1  0  0  0  0
29 49  1  1  0  0  0
30 31  1  0  0  0  0
30 50  1  1  0  0  0
30 32  1  0  0  0  0
31 51  1  6  0  0  0
32 58  1  6  0  0  0
33 44  1  0  0  0  0
33 67  1  6  0  0  0
35 38  1  0  0  0  0
35 59  1  1  0  0  0
36 40  1  0  0  0  0
36 45  2  0  0  0  0
37 38  1  0  0  0  0
37 60  1  1  0  0  0
39 44  1  0  0  0  0
41 43  1  0  0  0  0
47 61  1  0  0  0  0
48 62  1  0  0  0  0
50 63  1  0  0  0  0
51 64  1  0  0  0  0
M  CHG  2  40  -1  71   1
M  END
>  <Formula>
C47H83NO17
>  <FW>
934.1584 (916.1205+18.0379)
>  <DSSTox_CID>
28909
>  <SR-HSE>
0
$$$$

Here are some IDs from one of the text files:

NCGC00015959-03
NCGC00168261-01
NCGC00257010-01
NCGC00254654-01
NCGC00254471-01

The resulting SDF file should start like this:

NCGC00015959-03
Marvin  07111412562D          
25 30  0  0  0  0            999 V2000
3.4098   -1.3130    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
4.8329   -1.3130    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
3.4098   -2.1380    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.1248   -2.5436    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.6948   -2.5436    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.8329   -2.1380    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.1248   -0.8937    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.5547   -0.8937    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1.9799   -2.1380    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.6948   -3.3548    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1.2718   -2.5436    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1.2718   -3.3548    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.1248   -3.3548    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1.9799   -3.7741    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.5547   -2.5436    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
6.2765   -1.3130    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.7128   -0.0894    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
0.4881   -2.2755    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
0.4881   -3.6160    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
6.8746   -0.7562    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
6.5378    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
0.0000   -2.9423    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
3.4098   -3.7741    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
6.2765   -2.1380    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.6948   -0.8937    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1  3  1  0  0  0  0
1  7  2  0  0  0  0
1 25  1  0  0  0  0
2  7  1  0  0  0  0
2  6  2  0  0  0  0
2  8  1  0  0  0  0
3  4  2  0  0  0  0
3  5  1  0  0  0  0
4 13  1  0  0  0  0
4  6  1  0  0  0  0
5  9  1  0  0  0  0
5 10  2  0  0  0  0
6 15  1  0  0  0  0
8 16  2  0  0  0  0
8 17  1  0  0  0  0
9 11  2  0  0  0  0
10 14  1  0  0  0  0
10 23  1  0  0  0  0
11 18  1  0  0  0  0
11 12  1  0  0  0  0
12 14  2  0  0  0  0
12 19  1  0  0  0  0
13 23  2  0  0  0  0
15 24  2  0  0  0  0
16 20  1  0  0  0  0
16 24  1  0  0  0  0
17 21  1  0  0  0  0
18 22  1  0  0  0  0
19 22  1  0  0  0  0
20 21  1  0  0  0  0
M  CHG  1   1   1
M  END
>  <Formula>
C20H14NO4
>  <FW>
332.3289
>  <DSSTox_CID>
25204
>  <NR-AR>
0
>  <NR-ER-LBD>
1
>  <NR-AhR>
1
$$$$
NCGC00168261-01
Marvin  07111412562D          
23 25  0  0  0  0            999 V2000
2.1236   -2.4895    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1.4205   -2.0662    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.1236   -3.3074    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1.4205   -3.7235    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
0.7174   -2.4895    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
0.7174   -3.3074    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.8554   -2.0662    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
0.0000   -2.0662    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1.4205   -1.2412    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.8554   -3.7235    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
3.5656   -2.4895    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
3.5656   -3.3074    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.8554   -1.2412    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
0.7174   -0.8251    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
0.0000   -1.2412    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
0.0430   -2.8984    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
0.7174   -4.1324    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.2902   -3.7378    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
0.7174    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.0292   -3.3145    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
6.4569   -3.3360    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.7538   -3.7378    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
7.1743   -3.7378    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1  2  1  0  0  0  0
1  3  2  0  0  0  0
1  7  1  0  0  0  0
2  5  2  0  0  0  0
2  9  1  0  0  0  0
3  4  1  0  0  0  0
3 10  1  0  0  0  0
4  6  1  0  0  0  0
5  8  1  0  0  0  0
5  6  1  0  0  0  0
6 16  1  0  0  0  0
6 17  1  0  0  0  0
7 11  2  0  0  0  0
7 13  1  0  0  0  0
8 15  2  0  0  0  0
9 14  2  0  0  0  0
10 12  2  0  0  0  0
11 12  1  0  0  0  0
12 18  1  0  0  0  0
14 15  1  0  0  0  0
14 19  1  0  0  0  0
18 20  1  0  0  0  0
20 22  1  0  0  0  0
21 22  1  0  0  0  0
21 23  1  0  0  0  0
M  END
>  <Formula>
C21H26O2
>  <FW>
310.4299
>  <DSSTox_CID>
28922
>  <NR-AR>
0
>  <NR-AhR>
1
>  <SR-MMP>
1
$$$$
NCGC00257010-01
Marvin  07111412562D          
35 37  0  0  0  0            999 V2000
2.0286   -3.5779    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
7.0019   -7.8578    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
7.0019   -0.7019    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.8589   -3.5779    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
1.6092   -2.8589    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
1.6092   -4.2799    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
3.2784   -4.2799    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
6.5825   -7.1217    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
6.5825   -1.4381    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
0.3681   -3.5779    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.5024   -3.5779    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.5024   -4.9989    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
4.0915   -4.2799    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.3412   -3.5779    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.3412   -4.9989    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
0.7704   -4.2799    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
0.7704   -2.8589    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
7.7294   -1.1385    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
6.2829   -0.2996    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
7.7294   -7.4213    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
7.4384   -8.5597    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
6.2829   -8.2601    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
7.4384    0.0000    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
7.0019   -2.1485    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
7.0019   -6.4112    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.7607   -1.4381    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.7607   -7.1217    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.7607   -5.7008    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.7607   -2.8589    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
6.5825   -5.7008    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
6.5825   -2.8589    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.3412   -6.4112    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
5.3412   -2.1485    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
0.0000   -2.9103    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
0.0086   -4.2542    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1  4  2  0  0  0  0
1  5  1  0  0  0  0
1  6  1  0  0  0  0
2  8  1  0  0  0  0
2 20  1  0  0  0  0
2 21  1  0  0  0  0
2 22  1  0  0  0  0
3  9  1  0  0  0  0
3 18  1  0  0  0  0
3 19  1  0  0  0  0
3 23  1  0  0  0  0
4  7  1  0  0  0  0
5 17  1  0  0  0  0
6 16  1  0  0  0  0
7 13  2  0  0  0  0
8 27  1  0  0  0  0
8 25  2  0  0  0  0
9 26  2  0  0  0  0
9 24  1  0  0  0  0
10 16  1  0  0  0  0
10 34  1  0  0  0  0
10 35  1  0  0  0  0
10 17  1  0  0  0  0
11 13  1  0  0  0  0
11 14  2  0  0  0  0
12 13  1  0  0  0  0
12 15  2  0  0  0  0
14 29  1  0  0  0  0
15 28  1  0  0  0  0
24 31  2  0  0  0  0
25 30  1  0  0  0  0
26 33  1  0  0  0  0
27 32  2  0  0  0  0
28 30  2  0  0  0  0
28 32  1  0  0  0  0
29 31  1  0  0  0  0
29 33  2  0  0  0  0
M  END
>  <Formula>
C25H24F6N4
>  <FW>
494.4753
>  <DSSTox_CID>
3868
>  <NR-AR>
0
>  <NR-ER>
1
>  <NR-AhR>
1
$$$$

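I have seen a post on extracting molecules from an SDF file, in order, according to IDs given in another file; it provides a solution for Unix. I used that solution on the command line: awk 'BEGIN{ORS="$$$$"}NR==FNR{a[$1]=$0;next}$1 in a' ids.txt RS="$" molecules.sdf > molecules_by_ids.sdf, and it got me most of what I wanted. However, even with this command I cannot extract 100% of the molecules from the SDF file. For example, one of the features has 981 positive molecules and its text file contains 981 IDs, but the command gives me only 950 molecules in the output SDF file.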
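One way to narrow down a discrepancy like 981 versus 950 is to compare the ID list directly with the title lines of the SDF file. The sketch below is only a diagnostic, written under a few assumptions: check_ids is a hypothetical helper name, every record starts with its ID alone on the title line, and every record ends with a line containing only $$$$. It reports duplicate IDs and IDs that never appear as a title, which are two possible (unconfirmed) reasons for molecules going missing.

```matlab
function report = check_ids(id_file, sdf_file)
% CHECK_IDS  Hypothetical diagnostic helper, not part of the original post.
% Compares the IDs in id_file with the record titles found in sdf_file.

    % Read the IDs, trimming whitespace and dropping blank lines.
    ids = strtrim(splitlines(string(fileread(id_file))));
    ids = ids(ids ~= "");

    % Read the SDF file once and split it into lines.
    sdf_lines = splitlines(string(fileread(sdf_file)));

    % Assumption: each record ends with a line that contains only "$$$$".
    end_idx = find(strtrim(sdf_lines) == "$$$$");

    % A title line is the first line of the file or the line after "$$$$".
    start_idx = [1; end_idx(1:end-1) + 1];
    titles = strtrim(sdf_lines(start_idx));

    report.n_ids = numel(ids);                               % lines in the ID file
    report.n_unique_ids = numel(unique(ids));                % after removing duplicates
    report.ids_not_in_sdf = setdiff(ids, titles);            % IDs with no matching title
    report.n_matching_records = nnz(ismember(titles, ids));  % records that would be kept
end
```

If n_unique_ids is smaller than n_ids, the ID file contains duplicates; if ids_not_in_sdf is non-empty, those IDs are not present as title lines in the SDF file at all.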

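What I really want is a MATLAB solution that does not miss any molecules in the generated file. I appreciate any effort toward solving the problem. Thanks.

One workaround I found in MATLAB is the function below, where "id" is the name of the ID TXT file, "sdfs" is the SDF database, and "sdf_name" is the name of the new SDF file that receives the molecules extracted by ID: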

function write_sdf(id, sdfs, sdf_name)
    % Open the text file of IDs.
    fid = fopen(id);
    % Read the whole SDF file into a character array.
    data = fileread(sdfs);
    % For each ID, copy the portion of the SDF file that belongs
    % to that molecule.
    while true
        mol_id = fgetl(fid);
        % fgetl returns the number -1 (not text) at the end of the file.
        if ~ischar(mol_id)
            % We're done with the ID file.
            fclose(fid);
            break;
        end
        % Take the text from the first occurrence of the ID up to the
        % next record delimiter.
        mol_after = extractAfter(data, mol_id);
        mol_between = extractBefore(mol_after, '$$$$');
        mol_full = [char(mol_id) char(mol_between) '$$$$'];
        % Append the molecule to the new SDF file.
        writelines(mol_full, sdf_name, WriteMode='append');
    end
end

The problem with this solution is that it is very slow. If anyone knows a faster way, please let me know! For now, I will use this.
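A likely reason for the slowness is that the function rescans and copies the whole SDF text for every ID and reopens the output file for every molecule. A possible alternative, sketched here under assumptions rather than offered as a tested answer, is to split the SDF file into records once, look up all IDs in a single ismember call, and write the output in one go. The name extract_sdf_by_ids is hypothetical, and the sketch assumes that each record's first line is exactly the ID and that each record ends with a line containing only $$$$.

```matlab
function missing = extract_sdf_by_ids(id_file, sdf_file, out_file)
% EXTRACT_SDF_BY_IDS  Hypothetical helper, not part of the original post.
% Copies the records whose title line matches an ID, in ID-file order,
% and returns the IDs that were not found.

    % Read the IDs, trimming whitespace and dropping blank lines.
    ids = strtrim(splitlines(string(fileread(id_file))));
    ids = ids(ids ~= "");

    % Read the SDF file once and split it into lines.
    sdf_lines = splitlines(string(fileread(sdf_file)));

    % Assumption: each record ends with a line that contains only "$$$$".
    end_idx = find(strtrim(sdf_lines) == "$$$$");
    if isempty(end_idx)
        error('No "$$$$" record delimiters found in %s.', sdf_file);
    end
    start_idx = [1; end_idx(1:end-1) + 1];
    titles = strtrim(sdf_lines(start_idx));

    % One lookup for all IDs; loc is the first record with that title.
    [found, loc] = ismember(ids, titles);
    missing = ids(~found);
    sel = loc(found);

    % Collect the line numbers of every selected record, in ID order.
    idx = arrayfun(@(r) (start_idx(r):end_idx(r))', sel, ...
                   'UniformOutput', false);
    idx = vertcat(idx{:});

    % Write all selected lines in a single call.
    writelines(sdf_lines(idx), out_file);
end
```

A call such as missing = extract_sdf_by_ids('ids.txt', 'molecules.sdf', 'molecules_by_ids.sdf') would then also show which IDs had no matching record. Matching on the whole title line, instead of searching for the ID anywhere in the file, avoids picking up an ID that happens to occur inside another record.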
