I have several hundred data files with the following structure:
ATOM 1 CG TYR C 58 121.612 160.894 112.763 1.00 0.00 C
ATOM 2 CD1 TYR C 58 120.943 162.067 113.040 1.00 0.00 C
ATOM 3 CD2 TYR C 58 121.188 159.746 113.389 1.00 0.00 C
ATOM 4 CE1 TYR C 58 119.873 162.092 113.912 1.00 0.00 C
ATOM 5 CE2 TYR C 58 120.129 159.760 114.258 1.00 0.00 C
ATOM 6 CZ TYR C 58 119.475 160.934 114.519 1.00 0.00 C
ATOM 7 OH TYR C 58 118.415 160.939 115.392 1.00 0.00 O
ATOM 8 OD1 ASN C 60 119.864 156.037 117.108 1.00 0.00 O
ATOM 9 CG PHE C 77 122.548 156.511 110.481 1.00 0.00 C
ATOM 10 CD1 PHE C 77 122.075 155.486 109.711 1.00 0.00 C
ATOM 11 CD2 PHE C 77 122.223 156.541 111.807 1.00 0.00 C
ATOM 12 CE1 PHE C 77 121.216 154.566 110.224 1.00 0.00 C
ATOM 13 CE2 PHE C 77 121.377 155.605 112.335 1.00 0.00 C
ATOM 14 CZ PHE C 77 120.877 154.618 111.540 1.00 0.00 C
ATOM 15 NZ LYS D 156 112.602 154.253 117.823 1.00 0.00 N
ATOM 16 O ILE D 202 108.373 159.140 111.337 1.00 0.00 O
ATOM 17 N VAL D 203 109.786 157.858 110.154 1.00 0.00 N
ATOM 18 CA VAL D 203 110.994 158.530 110.614 1.00 0.00 C
ATOM 19 C VAL D 203 111.459 159.524 109.568 1.00 0.00 C
ATOM 20 CB VAL D 203 112.099 157.518 110.929 1.00 0.00 C
ATOM 21 CG1 VAL D 203 113.424 158.213 111.097 1.00 0.00 C
ATOM 22 CG2 VAL D 203 111.757 156.818 112.216 1.00 0.00 C
ATOM 23 N GLN D 204 111.583 160.788 109.970 1.00 0.00 N
ATOM 24 O GLN D 204 114.017 162.417 110.404 1.00 0.00 O
ATOM 25 CA SER D 205 115.779 162.096 108.277 1.00 0.00 C
ATOM 26 CB SER D 205 116.596 160.967 107.666 1.00 0.00 C
ATOM 27 OG SER D 205 117.961 161.337 107.661 1.00 0.00 O
ATOM 28 C UNL X 1 111.662 159.873 113.972 1.00 0.00 C
ATOM 29 N UNL X 1 113.085 160.155 114.126 1.00 0.00 N
ATOM 30 C UNL X 1 113.499 161.458 113.812 1.00 0.00 C
ATOM 31 O UNL X 1 112.732 162.299 113.334 1.00 0.00 O
ATOM 32 C UNL X 1 114.928 161.844 114.171 1.00 0.00 C
ATOM 33 N UNL X 1 115.842 161.124 113.296 1.00 0.00 N
ATOM 34 C UNL X 1 116.000 159.854 113.475 1.00 0.00 C
ATOM 35 C UNL X 1 115.326 159.120 114.591 1.00 0.00 C
ATOM 36 C UNL X 1 116.110 158.332 115.447 1.00 0.00 C
ATOM 37 C UNL X 1 115.508 157.476 116.361 1.00 0.00 C
ATOM 38 CL UNL X 1 116.480 156.444 117.332 1.00 0.00 CL
ATOM 39 C UNL X 1 114.125 157.429 116.470 1.00 0.00 C
ATOM 40 C UNL X 1 113.345 158.291 115.696 1.00 0.00 C
ATOM 41 C UNL X 1 113.925 159.189 114.776 1.00 0.00 C
ATOM 42 C UNL X 1 116.862 159.133 112.452 1.00 0.00 C
ATOM 43 C UNL X 1 116.961 157.743 112.314 1.00 0.00 C
ATOM 44 C UNL X 1 117.837 157.173 111.379 1.00 0.00 C
ATOM 45 C UNL X 1 118.592 157.982 110.536 1.00 0.00 C
ATOM 46 C UNL X 1 118.471 159.363 110.623 1.00 0.00 C
ATOM 47 C UNL X 1 117.619 159.931 111.575 1.00 0.00 C
ATOM 48 H UNL X 1 111.189 160.691 113.471 1.00 0.00 H
ATOM 49 H UNL X 1 111.218 159.741 114.937 1.00 0.00 H
ATOM 50 H UNL X 1 111.536 158.980 113.396 1.00 0.00 H
END
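My goal is to move the lines containing UNL X to the beginning of the file and remove them from their original position.
However, what I end up with starts at ATOM 50, then counts down to ATOM 49, then ATOM 48, and so on. Basically, my code prepends the lines to the file in reverse order and does not remove the UNL X lines from their original position, so the final result looks like this: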
ATOM 50 H UNL X 1 111.536 158.980 113.396 1.00 0.00 H
ATOM 49 H UNL X 1 111.218 159.741 114.937 1.00 0.00 H
ATOM 48 H UNL X 1 111.189 160.691 113.471 1.00 0.00 H
ATOM 47 C UNL X 1 117.619 159.931 111.575 1.00 0.00 C
ATOM 46 C UNL X 1 118.471 159.363 110.623 1.00 0.00 C
ATOM 45 C UNL X 1 118.592 157.982 110.536 1.00 0.00 C
ATOM 44 C UNL X 1 117.837 157.173 111.379 1.00 0.00 C
ATOM 43 C UNL X 1 116.961 157.743 112.314 1.00 0.00 C
ATOM 42 C UNL X 1 116.862 159.133 112.452 1.00 0.00 C
ATOM 41 C UNL X 1 113.925 159.189 114.776 1.00 0.00 C
ATOM 40 C UNL X 1 113.345 158.291 115.696 1.00 0.00 C
ATOM 39 C UNL X 1 114.125 157.429 116.470 1.00 0.00 C
ATOM 38 CL UNL X 1 116.480 156.444 117.332 1.00 0.00 CL
ATOM 37 C UNL X 1 115.508 157.476 116.361 1.00 0.00 C
ATOM 36 C UNL X 1 116.110 158.332 115.447 1.00 0.00 C
ATOM 35 C UNL X 1 115.326 159.120 114.591 1.00 0.00 C
ATOM 34 C UNL X 1 116.000 159.854 113.475 1.00 0.00 C
ATOM 33 N UNL X 1 115.842 161.124 113.296 1.00 0.00 N
ATOM 32 C UNL X 1 114.928 161.844 114.171 1.00 0.00 C
ATOM 31 O UNL X 1 112.732 162.299 113.334 1.00 0.00 O
ATOM 30 C UNL X 1 113.499 161.458 113.812 1.00 0.00 C
ATOM 29 N UNL X 1 113.085 160.155 114.126 1.00 0.00 N
ATOM 28 C UNL X 1 111.662 159.873 113.972 1.00 0.00 C
ATOM 1 CG TYR C 58 121.612 160.894 112.763 1.00 0.00 C
ATOM 2 CD1 TYR C 58 120.943 162.067 113.040 1.00 0.00 C
ATOM 3 CD2 TYR C 58 121.188 159.746 113.389 1.00 0.00 C
ATOM 4 CE1 TYR C 58 119.873 162.092 113.912 1.00 0.00 C
ATOM 5 CE2 TYR C 58 120.129 159.760 114.258 1.00 0.00 C
ATOM 6 CZ TYR C 58 119.475 160.934 114.519 1.00 0.00 C
ATOM 7 OH TYR C 58 118.415 160.939 115.392 1.00 0.00 O
ATOM 8 OD1 ASN C 60 119.864 156.037 117.108 1.00 0.00 O
ATOM 9 CG PHE C 77 122.548 156.511 110.481 1.00 0.00 C
ATOM 10 CD1 PHE C 77 122.075 155.486 109.711 1.00 0.00 C
ATOM 11 CD2 PHE C 77 122.223 156.541 111.807 1.00 0.00 C
ATOM 12 CE1 PHE C 77 121.216 154.566 110.224 1.00 0.00 C
ATOM 13 CE2 PHE C 77 121.377 155.605 112.335 1.00 0.00 C
ATOM 14 CZ PHE C 77 120.877 154.618 111.540 1.00 0.00 C
ATOM 15 NZ LYS D 156 112.602 154.253 117.823 1.00 0.00 N
ATOM 16 O ILE D 202 108.373 159.140 111.337 1.00 0.00 O
ATOM 17 N VAL D 203 109.786 157.858 110.154 1.00 0.00 N
ATOM 18 CA VAL D 203 110.994 158.530 110.614 1.00 0.00 C
ATOM 19 C VAL D 203 111.459 159.524 109.568 1.00 0.00 C
ATOM 20 CB VAL D 203 112.099 157.518 110.929 1.00 0.00 C
ATOM 21 CG1 VAL D 203 113.424 158.213 111.097 1.00 0.00 C
ATOM 22 CG2 VAL D 203 111.757 156.818 112.216 1.00 0.00 C
ATOM 23 N GLN D 204 111.583 160.788 109.970 1.00 0.00 N
ATOM 24 O GLN D 204 114.017 162.417 110.404 1.00 0.00 O
ATOM 25 CA SER D 205 115.779 162.096 108.277 1.00 0.00 C
ATOM 26 CB SER D 205 116.596 160.967 107.666 1.00 0.00 C
ATOM 27 OG SER D 205 117.961 161.337 107.661 1.00 0.00 O
ATOM 28 C UNL X 1 111.662 159.873 113.972 1.00 0.00 C
ATOM 29 N UNL X 1 113.085 160.155 114.126 1.00 0.00 N
ATOM 30 C UNL X 1 113.499 161.458 113.812 1.00 0.00 C
ATOM 31 O UNL X 1 112.732 162.299 113.334 1.00 0.00 O
ATOM 32 C UNL X 1 114.928 161.844 114.171 1.00 0.00 C
ATOM 33 N UNL X 1 115.842 161.124 113.296 1.00 0.00 N
ATOM 34 C UNL X 1 116.000 159.854 113.475 1.00 0.00 C
ATOM 35 C UNL X 1 115.326 159.120 114.591 1.00 0.00 C
ATOM 36 C UNL X 1 116.110 158.332 115.447 1.00 0.00 C
ATOM 37 C UNL X 1 115.508 157.476 116.361 1.00 0.00 C
ATOM 38 CL UNL X 1 116.480 156.444 117.332 1.00 0.00 CL
ATOM 39 C UNL X 1 114.125 157.429 116.470 1.00 0.00 C
ATOM 40 C UNL X 1 113.345 158.291 115.696 1.00 0.00 C
ATOM 41 C UNL X 1 113.925 159.189 114.776 1.00 0.00 C
ATOM 42 C UNL X 1 116.862 159.133 112.452 1.00 0.00 C
ATOM 43 C UNL X 1 116.961 157.743 112.314 1.00 0.00 C
ATOM 44 C UNL X 1 117.837 157.173 111.379 1.00 0.00 C
ATOM 45 C UNL X 1 118.592 157.982 110.536 1.00 0.00 C
ATOM 46 C UNL X 1 118.471 159.363 110.623 1.00 0.00 C
ATOM 47 C UNL X 1 117.619 159.931 111.575 1.00 0.00 C
ATOM 48 H UNL X 1 111.189 160.691 113.471 1.00 0.00 H
ATOM 49 H UNL X 1 111.218 159.741 114.937 1.00 0.00 H
ATOM 50 H UNL X 1 111.536 158.980 113.396 1.00 0.00 H
END
Here is what I have tried so far:
import os

def prepend_line(file_name, line):
    with open(file_name, "r+") as f:
        s = f.read()
        f.seek(0)
        f.write(line + s)

pathway = r'C:UsersFamilyDesktopGABA ProjectGABA StructuresNew Ligands With HydrogensSimilar To ValiumMcule 6HUP Entire ECD Diazepam RENUMBERING TEST' # first define the subdirectory
pathway_tree = os.walk(pathway)
os.chdir(pathway)
for subdir, dirs, files_in_dirs in pathway_tree:
    #print(f"dirs! {dirs}")
    pass
    for file_names in files_in_dirs:
        try:
            if "Partial Pocket" in file_names and ".pdb" in file_names:
                os.chdir(subdir) # changes to the specific sub directory using the great filter
                with open(file_names, "r") as input:
                    for input_file_line in input:
                        # captures each line in the file as an item in an array
                        # further splits the line in the file as its own array with each item being a string
                        array_of_words_in_line = input_file_line.split()
                        three_letter_code = array_of_words_in_line[3] # ie, UNK
                        if three_letter_code == "UNL" or three_letter_code == "UNK" or three_letter_code == "LIG":
                            prepend_line(file_names, input_file_line)
        except IndexError:
            pass
        except NameError:
            pass
You can build a completely new list of lines and then write those lines to the output file:
i = 0
new_text = []
with open(filename, "r") as fi:  # filename: path to one data file
    for line in fi:
        fields = line.split()
        # lines such as "END" have fewer than four fields
        if len(fields) > 3 and fields[3] in {"UNL", "UNK", "LIG"}:
            new_text.insert(i, line)  # insert right after the ligand lines already moved
            i += 1
        else:
            new_text.append(line)  # append to the end of the list
new_text = "".join(new_text)
with open(filename, "w") as fo:
    fo.write(new_text)
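The reversal in the question comes from prepending each matching line in turn, so the last ligand line ends up first. Collecting the lines into two lists and concatenating them avoids that. The following is only a sketch of that idea: the function name `ligand_first` and the sample list are made up for illustration, and it assumes the residue name is always the fourth whitespace-separated field, as in the data shown.

```python
LIGAND_CODES = {"UNL", "UNK", "LIG"}  # residue names taken from the question


def ligand_first(lines, codes=LIGAND_CODES):
    """Return the lines with ligand records moved to the front, order preserved."""
    ligand, rest = [], []
    for line in lines:
        fields = line.split()
        # Short lines such as "END" have no fourth field and count as non-ligand.
        is_ligand = len(fields) > 3 and fields[3] in codes
        (ligand if is_ligand else rest).append(line)
    return ligand + rest


# Illustrative sample built from a few lines of the data in the question.
sample = [
    "ATOM 27 OG SER D 205 117.961 161.337 107.661 1.00 0.00 O\n",
    "ATOM 28 C UNL X 1 111.662 159.873 113.972 1.00 0.00 C\n",
    "ATOM 29 N UNL X 1 113.085 160.155 114.126 1.00 0.00 N\n",
    "END\n",
]
reordered = ligand_first(sample)
print("".join(reordered), end="")
```

Because each list is filled in reading order, ATOM 28 stays ahead of ATOM 29, and the END line remains last.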
This kind of text-based data processing is basically what the Unix command-line tools are for. Here is a one-liner using sed:
$ (sed -n /UNL/p data.txt; sed /UNL/d data.txt) > processed_data.txt
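Explanation: the first sed call prints all lines containing UNL. The second call deletes all lines containing UNL and prints everything else. The combined output is redirected to processed_data.txt.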
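If sed is not available (the path in the question suggests Windows), the same two-pass idea can be sketched in Python and applied to a whole directory tree. This is a rough sketch, not a tested tool: `reorder_file` and `reorder_tree` are hypothetical names, the "Partial Pocket" filter is copied from the question, and unlike `/UNL/`, which matches anywhere in the line, it checks only the fourth whitespace-separated field.

```python
from pathlib import Path

LIGAND_CODES = {"UNL", "UNK", "LIG"}  # residue names from the question


def is_ligand(line):
    """True if the fourth whitespace-separated field is a ligand residue name."""
    fields = line.split()
    return len(fields) > 3 and fields[3] in LIGAND_CODES


def reorder_file(path):
    """Rewrite one file in place with ligand lines first; order is otherwise kept."""
    lines = Path(path).read_text().splitlines(keepends=True)
    ligand = [ln for ln in lines if is_ligand(ln)]    # like: sed -n /UNL/p
    rest = [ln for ln in lines if not is_ligand(ln)]  # like: sed /UNL/d
    Path(path).write_text("".join(ligand + rest))


def reorder_tree(root):
    """Apply reorder_file to every 'Partial Pocket' .pdb file below root."""
    changed = []
    for path in sorted(Path(root).rglob("*.pdb")):
        if "Partial Pocket" in path.name:
            reorder_file(path)
            changed.append(path)
    return changed
```

Calling `reorder_tree(pathway)` with the top-level folder would then process all matching files. Since the files are overwritten in place, it is worth trying it on a copy of the data first.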