Move the last few lines of a file to the beginning of the file



I have several hundred data files with the following structure:

ATOM      1  CG  TYR C  58     121.612 160.894 112.763  1.00  0.00           C
ATOM      2  CD1 TYR C  58     120.943 162.067 113.040  1.00  0.00           C
ATOM      3  CD2 TYR C  58     121.188 159.746 113.389  1.00  0.00           C
ATOM      4  CE1 TYR C  58     119.873 162.092 113.912  1.00  0.00           C
ATOM      5  CE2 TYR C  58     120.129 159.760 114.258  1.00  0.00           C
ATOM      6  CZ  TYR C  58     119.475 160.934 114.519  1.00  0.00           C
ATOM      7  OH  TYR C  58     118.415 160.939 115.392  1.00  0.00           O
ATOM      8  OD1 ASN C  60     119.864 156.037 117.108  1.00  0.00           O
ATOM      9  CG  PHE C  77     122.548 156.511 110.481  1.00  0.00           C
ATOM     10  CD1 PHE C  77     122.075 155.486 109.711  1.00  0.00           C
ATOM     11  CD2 PHE C  77     122.223 156.541 111.807  1.00  0.00           C
ATOM     12  CE1 PHE C  77     121.216 154.566 110.224  1.00  0.00           C
ATOM     13  CE2 PHE C  77     121.377 155.605 112.335  1.00  0.00           C
ATOM     14  CZ  PHE C  77     120.877 154.618 111.540  1.00  0.00           C
ATOM     15  NZ  LYS D 156     112.602 154.253 117.823  1.00  0.00           N
ATOM     16  O   ILE D 202     108.373 159.140 111.337  1.00  0.00           O
ATOM     17  N   VAL D 203     109.786 157.858 110.154  1.00  0.00           N
ATOM     18  CA  VAL D 203     110.994 158.530 110.614  1.00  0.00           C
ATOM     19  C   VAL D 203     111.459 159.524 109.568  1.00  0.00           C
ATOM     20  CB  VAL D 203     112.099 157.518 110.929  1.00  0.00           C
ATOM     21  CG1 VAL D 203     113.424 158.213 111.097  1.00  0.00           C
ATOM     22  CG2 VAL D 203     111.757 156.818 112.216  1.00  0.00           C
ATOM     23  N   GLN D 204     111.583 160.788 109.970  1.00  0.00           N
ATOM     24  O   GLN D 204     114.017 162.417 110.404  1.00  0.00           O
ATOM     25  CA  SER D 205     115.779 162.096 108.277  1.00  0.00           C
ATOM     26  CB  SER D 205     116.596 160.967 107.666  1.00  0.00           C
ATOM     27  OG  SER D 205     117.961 161.337 107.661  1.00  0.00           O
ATOM     28  C   UNL X   1     111.662 159.873 113.972  1.00  0.00           C
ATOM     29  N   UNL X   1     113.085 160.155 114.126  1.00  0.00           N
ATOM     30  C   UNL X   1     113.499 161.458 113.812  1.00  0.00           C
ATOM     31  O   UNL X   1     112.732 162.299 113.334  1.00  0.00           O
ATOM     32  C   UNL X   1     114.928 161.844 114.171  1.00  0.00           C
ATOM     33  N   UNL X   1     115.842 161.124 113.296  1.00  0.00           N
ATOM     34  C   UNL X   1     116.000 159.854 113.475  1.00  0.00           C
ATOM     35  C   UNL X   1     115.326 159.120 114.591  1.00  0.00           C
ATOM     36  C   UNL X   1     116.110 158.332 115.447  1.00  0.00           C
ATOM     37  C   UNL X   1     115.508 157.476 116.361  1.00  0.00           C
ATOM     38  CL  UNL X   1     116.480 156.444 117.332  1.00  0.00          CL
ATOM     39  C   UNL X   1     114.125 157.429 116.470  1.00  0.00           C
ATOM     40  C   UNL X   1     113.345 158.291 115.696  1.00  0.00           C
ATOM     41  C   UNL X   1     113.925 159.189 114.776  1.00  0.00           C
ATOM     42  C   UNL X   1     116.862 159.133 112.452  1.00  0.00           C
ATOM     43  C   UNL X   1     116.961 157.743 112.314  1.00  0.00           C
ATOM     44  C   UNL X   1     117.837 157.173 111.379  1.00  0.00           C
ATOM     45  C   UNL X   1     118.592 157.982 110.536  1.00  0.00           C
ATOM     46  C   UNL X   1     118.471 159.363 110.623  1.00  0.00           C
ATOM     47  C   UNL X   1     117.619 159.931 111.575  1.00  0.00           C
ATOM     48  H   UNL X   1     111.189 160.691 113.471  1.00  0.00           H
ATOM     49  H   UNL X   1     111.218 159.741 114.937  1.00  0.00           H
ATOM     50  H   UNL X   1     111.536 158.980 113.396  1.00  0.00           H
END

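My goal is to move the lines containing UNL X to the beginning of the file and remove them from their original position.

However, my code ends up writing ATOM 50 first, then ATOM 49, then ATOM 48, and so on. Basically, it prepends the lines to the file in reverse order and does not remove the UNL X lines from their original position, so the final result looks like this: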

ATOM     50  H   UNL X   1     111.536 158.980 113.396  1.00  0.00           H
ATOM     49  H   UNL X   1     111.218 159.741 114.937  1.00  0.00           H
ATOM     48  H   UNL X   1     111.189 160.691 113.471  1.00  0.00           H
ATOM     47  C   UNL X   1     117.619 159.931 111.575  1.00  0.00           C
ATOM     46  C   UNL X   1     118.471 159.363 110.623  1.00  0.00           C
ATOM     45  C   UNL X   1     118.592 157.982 110.536  1.00  0.00           C
ATOM     44  C   UNL X   1     117.837 157.173 111.379  1.00  0.00           C
ATOM     43  C   UNL X   1     116.961 157.743 112.314  1.00  0.00           C
ATOM     42  C   UNL X   1     116.862 159.133 112.452  1.00  0.00           C
ATOM     41  C   UNL X   1     113.925 159.189 114.776  1.00  0.00           C
ATOM     40  C   UNL X   1     113.345 158.291 115.696  1.00  0.00           C
ATOM     39  C   UNL X   1     114.125 157.429 116.470  1.00  0.00           C
ATOM     38  CL  UNL X   1     116.480 156.444 117.332  1.00  0.00          CL
ATOM     37  C   UNL X   1     115.508 157.476 116.361  1.00  0.00           C
ATOM     36  C   UNL X   1     116.110 158.332 115.447  1.00  0.00           C
ATOM     35  C   UNL X   1     115.326 159.120 114.591  1.00  0.00           C
ATOM     34  C   UNL X   1     116.000 159.854 113.475  1.00  0.00           C
ATOM     33  N   UNL X   1     115.842 161.124 113.296  1.00  0.00           N
ATOM     32  C   UNL X   1     114.928 161.844 114.171  1.00  0.00           C
ATOM     31  O   UNL X   1     112.732 162.299 113.334  1.00  0.00           O
ATOM     30  C   UNL X   1     113.499 161.458 113.812  1.00  0.00           C
ATOM     29  N   UNL X   1     113.085 160.155 114.126  1.00  0.00           N
ATOM     28  C   UNL X   1     111.662 159.873 113.972  1.00  0.00           C
ATOM      1  CG  TYR C  58     121.612 160.894 112.763  1.00  0.00           C
ATOM      2  CD1 TYR C  58     120.943 162.067 113.040  1.00  0.00           C
ATOM      3  CD2 TYR C  58     121.188 159.746 113.389  1.00  0.00           C
ATOM      4  CE1 TYR C  58     119.873 162.092 113.912  1.00  0.00           C
ATOM      5  CE2 TYR C  58     120.129 159.760 114.258  1.00  0.00           C
ATOM      6  CZ  TYR C  58     119.475 160.934 114.519  1.00  0.00           C
ATOM      7  OH  TYR C  58     118.415 160.939 115.392  1.00  0.00           O
ATOM      8  OD1 ASN C  60     119.864 156.037 117.108  1.00  0.00           O
ATOM      9  CG  PHE C  77     122.548 156.511 110.481  1.00  0.00           C
ATOM     10  CD1 PHE C  77     122.075 155.486 109.711  1.00  0.00           C
ATOM     11  CD2 PHE C  77     122.223 156.541 111.807  1.00  0.00           C
ATOM     12  CE1 PHE C  77     121.216 154.566 110.224  1.00  0.00           C
ATOM     13  CE2 PHE C  77     121.377 155.605 112.335  1.00  0.00           C
ATOM     14  CZ  PHE C  77     120.877 154.618 111.540  1.00  0.00           C
ATOM     15  NZ  LYS D 156     112.602 154.253 117.823  1.00  0.00           N
ATOM     16  O   ILE D 202     108.373 159.140 111.337  1.00  0.00           O
ATOM     17  N   VAL D 203     109.786 157.858 110.154  1.00  0.00           N
ATOM     18  CA  VAL D 203     110.994 158.530 110.614  1.00  0.00           C
ATOM     19  C   VAL D 203     111.459 159.524 109.568  1.00  0.00           C
ATOM     20  CB  VAL D 203     112.099 157.518 110.929  1.00  0.00           C
ATOM     21  CG1 VAL D 203     113.424 158.213 111.097  1.00  0.00           C
ATOM     22  CG2 VAL D 203     111.757 156.818 112.216  1.00  0.00           C
ATOM     23  N   GLN D 204     111.583 160.788 109.970  1.00  0.00           N
ATOM     24  O   GLN D 204     114.017 162.417 110.404  1.00  0.00           O
ATOM     25  CA  SER D 205     115.779 162.096 108.277  1.00  0.00           C
ATOM     26  CB  SER D 205     116.596 160.967 107.666  1.00  0.00           C
ATOM     27  OG  SER D 205     117.961 161.337 107.661  1.00  0.00           O
ATOM     28  C   UNL X   1     111.662 159.873 113.972  1.00  0.00           C
ATOM     29  N   UNL X   1     113.085 160.155 114.126  1.00  0.00           N
ATOM     30  C   UNL X   1     113.499 161.458 113.812  1.00  0.00           C
ATOM     31  O   UNL X   1     112.732 162.299 113.334  1.00  0.00           O
ATOM     32  C   UNL X   1     114.928 161.844 114.171  1.00  0.00           C
ATOM     33  N   UNL X   1     115.842 161.124 113.296  1.00  0.00           N
ATOM     34  C   UNL X   1     116.000 159.854 113.475  1.00  0.00           C
ATOM     35  C   UNL X   1     115.326 159.120 114.591  1.00  0.00           C
ATOM     36  C   UNL X   1     116.110 158.332 115.447  1.00  0.00           C
ATOM     37  C   UNL X   1     115.508 157.476 116.361  1.00  0.00           C
ATOM     38  CL  UNL X   1     116.480 156.444 117.332  1.00  0.00          CL
ATOM     39  C   UNL X   1     114.125 157.429 116.470  1.00  0.00           C
ATOM     40  C   UNL X   1     113.345 158.291 115.696  1.00  0.00           C
ATOM     41  C   UNL X   1     113.925 159.189 114.776  1.00  0.00           C
ATOM     42  C   UNL X   1     116.862 159.133 112.452  1.00  0.00           C
ATOM     43  C   UNL X   1     116.961 157.743 112.314  1.00  0.00           C
ATOM     44  C   UNL X   1     117.837 157.173 111.379  1.00  0.00           C
ATOM     45  C   UNL X   1     118.592 157.982 110.536  1.00  0.00           C
ATOM     46  C   UNL X   1     118.471 159.363 110.623  1.00  0.00           C
ATOM     47  C   UNL X   1     117.619 159.931 111.575  1.00  0.00           C
ATOM     48  H   UNL X   1     111.189 160.691 113.471  1.00  0.00           H
ATOM     49  H   UNL X   1     111.218 159.741 114.937  1.00  0.00           H
ATOM     50  H   UNL X   1     111.536 158.980 113.396  1.00  0.00           H
END

Here is what I have tried so far:

import os

def prepend_line(file_name, line):
    with open(file_name, "r+") as f:
        s = f.read()
        f.seek(0)
        f.write(line + s)

pathway = r'C:UsersFamilyDesktopGABA ProjectGABA StructuresNew Ligands With HydrogensSimilar To ValiumMcule 6HUP Entire ECD Diazepam RENUMBERING TEST'  # first define the subdirectory
pathway_tree = os.walk(pathway)
os.chdir(pathway)
for subdir, dirs, files_in_dirs in pathway_tree:
    #print(f"dirs! {dirs}")
    pass
    for file_names in files_in_dirs:
        try:
            if "Partial Pocket" in file_names and ".pdb" in file_names:

                os.chdir(subdir)  # changes to the specific sub directory using the great filter
                with open(file_names, "r") as input:
                    for input_file_line in input:
                        # captures each line in the file as an item in an array,
                        # then further splits the line into its own array with each item being a string
                        array_of_words_in_line = input_file_line.split()

                        three_letter_code = array_of_words_in_line[3]  # ie, UNK
                        if three_letter_code == "UNL" or three_letter_code == "UNK" or three_letter_code == "LIG":
                            prepend_line(file_names, input_file_line)
        except IndexError:
            pass
        except NameError:
            pass

You can build a brand-new list of lines and then write those lines back to the file:

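i = 0  # number of ligand lines already placed at the front
new_text = []
with open(filename, "r") as fi:
    for line in fi:
        fields = line.split()
        # short lines such as "END" have no fourth field, so guard the index
        if len(fields) > 3 and fields[3] in {"UNL", "UNK", "LIG"}:
            new_text.insert(i, line)  # insert after the ligand lines already at the start
            i += 1
        else:
            new_text.append(line)  # append to the end of the list
new_text = "".join(new_text)
with open(filename, "w") as fo:
    fo.write(new_text)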
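
Since the question mentions several hundred files, the same idea can be wrapped in a function and applied while walking the directory tree. The following is only a sketch under a few assumptions: the names `ligand_lines_first` and `rewrite_tree` are made up for illustration, the file filter ("Partial Pocket" in the name, `.pdb` extension at the end) is taken from the question's code, and files are rewritten in place, so it may be wise to try it on a copy first.

```python
import os

LIGAND_CODES = {"UNL", "UNK", "LIG"}  # residue names taken from the question


def ligand_lines_first(lines):
    """Return the lines with ligand records first; order is kept within each group."""
    ligand, other = [], []
    for line in lines:
        fields = line.split()
        # Lines such as "END" have fewer than four fields and are never ligand records.
        if len(fields) > 3 and fields[3] in LIGAND_CODES:
            ligand.append(line)
        else:
            other.append(line)
    return ligand + other


def rewrite_tree(root):
    """Rewrite every matching .pdb file under root in place (hypothetical helper)."""
    for subdir, _dirs, files in os.walk(root):
        for name in files:
            # Assumption: the extension sits at the end of the file name.
            if "Partial Pocket" in name and name.endswith(".pdb"):
                path = os.path.join(subdir, name)  # full path, so no os.chdir is needed
                with open(path, "r") as fi:
                    lines = fi.readlines()
                with open(path, "w") as fo:
                    fo.writelines(ligand_lines_first(lines))
```

Reading the whole file before writing anything avoids the problem in the question, where the file was modified while it was still being iterated over, and collecting the ligand lines in a separate list keeps them in ascending order instead of reversing them.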

This kind of text-based data processing is basically what Unix command-line tools are for. Here is a one-liner using sed:

$ (sed -n /UNL/p data.txt; sed /UNL/d data.txt) > processed_data.txt

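Explanation: the first sed call prints only the lines containing UNL. The second call deletes the lines containing UNL and prints everything else. The combined output, in that order, is redirected to processed_data.txt.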
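
For readers who do not have sed available, the two passes can be mirrored in Python. This is a rough sketch, not a drop-in replacement: `unl_lines_first` and `process_file` are hypothetical names, and the match is a plain substring test on the whole line, which behaves like the literal `/UNL/` pattern above.

```python
def unl_lines_first(text, pattern="UNL"):
    """Return text with all lines containing pattern moved to the front."""
    lines = text.splitlines(keepends=True)
    matching = [ln for ln in lines if pattern in ln]       # like: sed -n /UNL/p
    remaining = [ln for ln in lines if pattern not in ln]  # like: sed /UNL/d
    return "".join(matching + remaining)


def process_file(src, dst):
    """Read src, reorder its lines, and write the result to dst."""
    with open(src, "r") as fi:
        text = fi.read()
    with open(dst, "w") as fo:
        fo.write(unl_lines_first(text))
```

Calling `process_file("data.txt", "processed_data.txt")` would correspond to the command above. As with sed, a line that merely contains the characters UNL somewhere other than the residue-name field would also be moved.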