Reading xyz coordinates with Python, formatting them, and creating a dictionary



I have the following input.

-1.93716260932213      4.07761284665160      0.00026114755225      o
6.18617624849570      1.21557897823238      0.00149060336893      o
2.08819081881417      2.59383844400838      0.00029402878682      n
2.97257640904282     -1.65736881444699     -0.00056145022980      n
-1.36088269076778     -0.37920984224593     -0.00050286871993      c
-0.53339788798729      2.26822332595375     -0.00000341410519      c
0.43736009141134     -2.19626465902310     -0.00100572484170      c
-4.13480467711929     -0.88129495575000      0.00005233281548      c
3.94803054683376      0.76762677032173      0.00037150755793      c
-0.03940495969409     -4.20532755533682     -0.00126348348509      h
2.71228553263687      4.40896687397411      0.00089118224220      h
4.27812393785853     -3.05506184574341     -0.00070847092229      h
-5.03899119562699     -0.01950727743747     -1.66429295994022      h
-5.03815196825505     -0.01998122074952      1.66509909190865      h
-4.53994759632051     -2.91783106840876     -0.00012152198798      h

I would like to get the following output.

['o', 'n', 'n', 'c', 'c', 'c', 'c', 'c', 'h', 'h', 'h', 'h']
[[6.1861762484957, 1.21557897823238, 0.00149060336893], [2.08819081881417, 2.59383844400838, 0.00029402878682], [2.97257640904282, -1.65736881444699, -0.0005614502298], [-1.36088269076778, -0.37920984224593, -0.00050286871993], [-0.53339788798729, 2.26822332595375, -3.41410519e-06], [0.43736009141134, -2.1962646590231, -0.0010057248417], [-4.13480467711929, -0.88129495575, 5.233281548e-05], [3.94803054683376, 0.76762677032173, 0.00037150755793], [-0.03940495969409, -4.20532755533682, -0.00126348348509], [2.71228553263687, 4.40896687397411, 0.0008911822422], [4.27812393785853, -3.05506184574341, -0.00070847092229], [-5.03899119562699, -0.01950727743747, -1.66429295994022]]

I can get it with the following code:

import re

with open("coord", "r") as input_file:
    lines = input_file.readlines()

atom_order = []
coords = []
# Skip the first line and the last two lines of the file.
for line in lines[1:-2]:
    # Split on runs of whitespace: three coordinates, then the element symbol.
    line_split = re.split(r"\s+", line.strip())
    atom_order.append(line_split[-1])
    coords.append([float(val) for val in line_split[0:3]])

print(atom_order)
print(coords)

# Negative values get one space less so that the minus sign fits in the column.
first_space = 4 * " "
first_space_neg = 3 * " "
space = 6 * " "
space_neg = 5 * " "

with open("test.out", "w") as output_file:
    for coord in coords:
        if coord[0] < 0:
            s1 = first_space_neg
        else:
            s1 = first_space
        if coord[1] < 0:
            s2 = space_neg
        else:
            s2 = space
        if coord[2] < 0:
            s3 = space_neg
        else:
            s3 = space
        output_file.write(s1 + f"{coord[0]:1.14f}" + s2 + f"{coord[1]:1.14f}" + s3 + f"{coord[2]:1.14f}" + "\n")
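
As a side note on the formatting step above, the sign-dependent padding can probably be replaced by a fixed field width in the format specification. The sketch below assumes column widths of 20, 22 and 22 characters, which is what the manual padding adds up to for values with a single digit before the decimal point; `format_coord` is a hypothetical helper name, not part of the original code.

```python
def format_coord(coord):
    """Format one [x, y, z] triple as right-aligned fixed-width columns.

    Assumed widths: 20 for x, 22 for y and z, each with 14 decimals.
    """
    x, y, z = coord
    return f"{x:20.14f}{y:22.14f}{z:22.14f}"


# Example with the first atom from the input above.
formatted = format_coord([-1.93716260932213, 4.07761284665160, 0.00026114755225])
print(repr(formatted))
```

Unlike the manual padding, a fixed width keeps the columns aligned even when a value has two or more digits before the decimal point.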

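However, this code breaks if the file starts with a line containing characters such as a hash, an exclamation mark, or a dollar sign, for example: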

## - a commented line with hash Here is when it will break.
-1.93716260932213      4.07761284665160      0.00026114755225      o
6.18617624849570      1.21557897823238      0.00149060336893      o

So I was wondering whether anyone could help me solve this?

Check whether the first character of the line is one of !$#, and continue to the next line if it is:

for line in lines[1:-2]:
    # Skip comment lines that start with !, $ or #.
    if line[0] in "!$#":
        continue
    else:
        line_split = re.split(r"\s+", line.strip())
        atom_order.append(line_split[-1])
        coords.append([float(val) for val in line_split[0:3]])
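
For a self-contained version of the same idea, here is a minimal sketch that works on a list of lines, so it can be tried without a file. Two things in it are assumptions rather than part of the question: it filters every line instead of slicing with `lines[1:-2]`, and it also skips blank lines. The title mentions a dictionary but the question does not say what it should look like, so the element-to-coordinates layout of `group_by_element` is a guess; both function names are hypothetical.

```python
from collections import defaultdict


def parse_coords(lines, comment_chars="!$#"):
    """Return (atom_order, coords), skipping blank and comment lines."""
    atom_order = []
    coords = []
    for line in lines:
        stripped = line.strip()
        # Assumption: a line is a comment if its first non-blank
        # character is one of comment_chars.
        if not stripped or stripped[0] in comment_chars:
            continue
        fields = stripped.split()  # split on any run of whitespace
        atom_order.append(fields[-1])
        coords.append([float(val) for val in fields[:3]])
    return atom_order, coords


def group_by_element(atom_order, coords):
    """Assumed dictionary layout: element symbol -> list of [x, y, z]."""
    grouped = defaultdict(list)
    for atom, xyz in zip(atom_order, coords):
        grouped[atom].append(xyz)
    return dict(grouped)


sample = [
    "## - a commented line with hash Here is when it will break.\n",
    "-1.93716260932213      4.07761284665160      0.00026114755225      o\n",
    "\n",
    "6.18617624849570      1.21557897823238      0.00149060336893      o\n",
]
atoms, xyz = parse_coords(sample)
by_element = group_by_element(atoms, xyz)
print(atoms)
print(by_element)
```

If the first and last lines of the real file are headers that do not start with one of the comment characters, the slice from the original code would still be needed before calling `parse_coords`.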
