sed: using a regular expression to remove spaces from a log



I am working with a log that consists of many lines in the following format:

06I: 31 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
#1.1/? THR 26 N       #1.1/A UNL 1 O      #1.1/? THR 26 H       3.515  2.716
#1.1/? ASN 142 ND2    #1.1/A UNL 1 O      #1.1/? ASN 142 2HD2   3.227  2.305
#1.1/A UNL 1 N        #1.1/? THR 26 O     #1.1/A UNL 1 H        3.463  2.652
#1.2/A UNL 1 N        #1.2/? PHE 140 O    #1.2/A UNL 1 H        2.987  2.200
#1.4/? THR 26 N       #1.4/A UNL 1 S      #1.4/? THR 26 H       4.354  3.371
#1.4/? HIS 163 NE2    #1.4/A UNL 1 N     no hydrogen                                          3.137  N/A
#1.4/A UNL 1 N        #1.4/? ARG 188 O    #1.4/A UNL 1 H        3.000  2.081
#1.5/? HIS 163 NE2    #1.5/A UNL 1 N     no hydrogen                                          3.330  N/A
#1.5/? GLN 189 NE2    #1.5/A UNL 1 O      #1.5/? GLN 189 2HE2   3.029  2.132
#1.6/A UNL 1 N        #1.6/? ARG 188 O    #1.6/A UNL 1 H        2.984  2.064
#1.8/? ASN 142 ND2    #1.8/A UNL 1 N      #1.8/? ASN 142 2HD2   3.164  2.395
#1.8/? ASN 142 ND2    #1.8/A UNL 1 O      #1.8/? ASN 142 2HD2   3.031  2.180
#1.8/? GLN 189 NE2    #1.8/A UNL 1 O      #1.8/? GLN 189 1HE2   3.276  2.553
#1.8/A UNL 1 N        #1.8/? THR 190 O    #1.8/A UNL 1 H        3.257  2.407
#1.9/A UNL 1 N        #1.9/? THR 190 O    #1.9/A UNL 1 H        2.913  2.037
#1.10/? SER 144 OG    #1.10/A UNL 1 S     #1.10/? SER 144 HG    4.246  3.845
#1.10/? HIS 163 NE2   #1.10/A UNL 1 S    no hydrogen                                          3.700  N/A
#1.10/A UNL 1 N       #1.10/? THR 190 O   #1.10/A UNL 1 H       3.008  2.091
#1.12/? GLN 189 NE2   #1.12/A UNL 1 O     #1.12/? GLN 189 1HE2  2.929  2.152
#1.12/A UNL 1 N       #1.12/? PHE 140 O   #1.12/A UNL 1 H       2.912  2.012
#1.13/? ASN 142 ND2   #1.13/A UNL 1 O     #1.13/? ASN 142 2HD2  3.063  2.291
#1.14/? HIS 41 NE2    #1.14/A UNL 1 S    no hydrogen                                          3.919  N/A
#1.14/? ASN 142 ND2   #1.14/A UNL 1 O     #1.14/? ASN 142 2HD2  2.802  1.872
#1.14/A UNL 1 N       #1.14/? THR 190 O   #1.14/A UNL 1 H       2.927  1.987
#1.16/? GLN 189 NE2   #1.16/A UNL 1 N     #1.16/? GLN 189 1HE2  3.456  2.669
#1.16/? GLN 189 NE2   #1.16/A UNL 1 O     #1.16/? GLN 189 1HE2  3.079  2.177
#1.16/A UNL 1 N       #1.16/? THR 190 O   #1.16/A UNL 1 H       2.967  1.987
#1.17/? ASN 142 ND2   #1.17/A UNL 1 N     #1.17/? ASN 142 2HD2  3.218  2.294
#1.17/A UNL 1 N       #1.17/? THR 190 O   #1.17/A UNL 1 H       3.364  2.469
#1.18/? ASN 142 ND2   #1.18/A UNL 1 O     #1.18/? ASN 142 2HD2  3.117  2.142
#1.20/? ASN 142 ND2   #1.20/A UNL 1 N     #1.20/? ASN 142 2HD2  3.245  2.560
-----------------------------------------------------------------------------
structure30R: 21 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
#1.4/? GLN 189 NE2    #1.4/A UNL 1 O       #1.4/? GLN 189 1HE2   3.139  2.374
#1.5/? GLN 189 NE2    #1.5/A UNL 1 N       #1.5/? GLN 189 2HE2   3.296  2.365
#1.7/? CYS 145 SG     #1.7/A UNL 1 O       #1.7/? CYS 145 HG     3.466  2.762
#1.7/A UNL 1 O        #1.7/? LEU 141 O     #1.7/A UNL 1 H        2.951  2.048
#1.8/? ASN 142 ND2    #1.8/A UNL 1 O       #1.8/? ASN 142 2HD2   3.660  3.073
#1.8/? ASN 142 ND2    #1.8/A UNL 1 O       #1.8/? ASN 142 1HD2   2.965  2.162
#1.8/? CYS 145 SG     #1.8/A UNL 1 O       #1.8/? CYS 145 HG     3.480  2.556
#1.9/? HIS 163 NE2    #1.9/A UNL 1 O      no hydrogen                                                   3.272  N/A
#1.9/A UNL 1 O        #1.9/? GLN 189 OE1   #1.9/A UNL 1 H        2.915  2.341
#1.10/? ASN 142 ND2   #1.10/A UNL 1 O      #1.10/? ASN 142 2HD2  3.100  2.185
#1.10/? GLN 189 NE2   #1.10/A UNL 1 O      #1.10/? GLN 189 1HE2  3.180  2.408
#1.10/A UNL 1 O       #1.10/? GLU 166 O    #1.10/A UNL 1 H       3.246  2.639
#1.11/? ASN 142 ND2   #1.11/A UNL 1 O      #1.11/? ASN 142 2HD2  3.122  2.204
#1.11/? HIS 163 NE2   #1.11/A UNL 1 O     no hydrogen                                                   3.313  N/A

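As you can see, some lines (those containing the pattern "no hydrogen" followed by a run of spaces) do not match the format: their last two values are shifted far to the right, e.g. no hydrogen      3.137  N/A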

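Since the number of spaces between these elements can vary, I could not find a simple sed expression that removes all of these unneeded spaces. For example,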

sed -e "s/no hydrogen                     //g" 

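would match only some of the lines. Could you suggest a regular expression that can be used with sed to match every line containing "no hydrogen" and remove the unused spaces?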

Here is the expected output:

06I: 31 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
#1.1/? THR 26 N       #1.1/A UNL 1 O      #1.1/? THR 26 H       3.515  2.716
#1.1/? ASN 142 ND2    #1.1/A UNL 1 O      #1.1/? ASN 142 2HD2   3.227  2.305
#1.1/A UNL 1 N        #1.1/? THR 26 O     #1.1/A UNL 1 H        3.463  2.652
#1.2/A UNL 1 N        #1.2/? PHE 140 O    #1.2/A UNL 1 H        2.987  2.200
#1.4/? THR 26 N       #1.4/A UNL 1 S      #1.4/? THR 26 H       4.354  3.371
#1.4/? HIS 163 NE2    #1.4/A UNL 1 N     no hydrogen            3.137  N/A
#1.4/A UNL 1 N        #1.4/? ARG 188 O    #1.4/A UNL 1 H        3.000  2.081
#1.5/? HIS 163 NE2    #1.5/A UNL 1 N     no hydrogen            3.330  N/A
#1.5/? GLN 189 NE2    #1.5/A UNL 1 O      #1.5/? GLN 189 2HE2   3.029  2.132
#1.6/A UNL 1 N        #1.6/? ARG 188 O    #1.6/A UNL 1 H        2.984  2.064
#1.8/? ASN 142 ND2    #1.8/A UNL 1 N      #1.8/? ASN 142 2HD2   3.164  2.395
#1.8/? ASN 142 ND2    #1.8/A UNL 1 O      #1.8/? ASN 142 2HD2   3.031  2.180
#1.8/? GLN 189 NE2    #1.8/A UNL 1 O      #1.8/? GLN 189 1HE2   3.276  2.553
#1.8/A UNL 1 N        #1.8/? THR 190 O    #1.8/A UNL 1 H        3.257  2.407
#1.9/A UNL 1 N        #1.9/? THR 190 O    #1.9/A UNL 1 H        2.913  2.037
#1.10/? SER 144 OG    #1.10/A UNL 1 S     #1.10/? SER 144 HG    4.246  3.845
#1.10/? HIS 163 NE2   #1.10/A UNL 1 S    no hydrogen            3.700  N/A
#1.10/A UNL 1 N       #1.10/? THR 190 O   #1.10/A UNL 1 H       3.008  2.091
#1.12/? GLN 189 NE2   #1.12/A UNL 1 O     #1.12/? GLN 189 1HE2  2.929  2.152
#1.12/A UNL 1 N       #1.12/? PHE 140 O   #1.12/A UNL 1 H       2.912  2.012
#1.13/? ASN 142 ND2   #1.13/A UNL 1 O     #1.13/? ASN 142 2HD2  3.063  2.291
#1.14/? HIS 41 NE2    #1.14/A UNL 1 S    no hydrogen            3.919  N/A
#1.14/? ASN 142 ND2   #1.14/A UNL 1 O     #1.14/? ASN 142 2HD2  2.802  1.872
#1.14/A UNL 1 N       #1.14/? THR 190 O   #1.14/A UNL 1 H       2.927  1.987
#1.16/? GLN 189 NE2   #1.16/A UNL 1 N     #1.16/? GLN 189 1HE2  3.456  2.669
#1.16/? GLN 189 NE2   #1.16/A UNL 1 O     #1.16/? GLN 189 1HE2  3.079  2.177
#1.16/A UNL 1 N       #1.16/? THR 190 O   #1.16/A UNL 1 H       2.967  1.987
#1.17/? ASN 142 ND2   #1.17/A UNL 1 N     #1.17/? ASN 142 2HD2  3.218  2.294
#1.17/A UNL 1 N       #1.17/? THR 190 O   #1.17/A UNL 1 H       3.364  2.469
#1.18/? ASN 142 ND2   #1.18/A UNL 1 O     #1.18/? ASN 142 2HD2  3.117  2.142
#1.20/? ASN 142 ND2   #1.20/A UNL 1 N     #1.20/? ASN 142 2HD2  3.245  2.560
-----------------------------------------------------------------------------
structure30R: 21 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
#1.4/? GLN 189 NE2    #1.4/A UNL 1 O       #1.4/? GLN 189 1HE2   3.139  2.374
#1.5/? GLN 189 NE2    #1.5/A UNL 1 N       #1.5/? GLN 189 2HE2   3.296  2.365
#1.7/? CYS 145 SG     #1.7/A UNL 1 O       #1.7/? CYS 145 HG     3.466  2.762
#1.7/A UNL 1 O        #1.7/? LEU 141 O     #1.7/A UNL 1 H        2.951  2.048
#1.8/? ASN 142 ND2    #1.8/A UNL 1 O       #1.8/? ASN 142 2HD2   3.660  3.073
#1.8/? ASN 142 ND2    #1.8/A UNL 1 O       #1.8/? ASN 142 1HD2   2.965  2.162
#1.8/? CYS 145 SG     #1.8/A UNL 1 O       #1.8/? CYS 145 HG     3.480  2.556
#1.9/? HIS 163 NE2    #1.9/A UNL 1 O      no hydrogen            3.272  N/A
#1.9/A UNL 1 O        #1.9/? GLN 189 OE1   #1.9/A UNL 1 H        2.915  2.341
#1.10/? ASN 142 ND2   #1.10/A UNL 1 O      #1.10/? ASN 142 2HD2  3.100  2.185
#1.10/? GLN 189 NE2   #1.10/A UNL 1 O      #1.10/? GLN 189 1HE2  3.180  2.408
#1.10/A UNL 1 O       #1.10/? GLU 166 O    #1.10/A UNL 1 H       3.246  2.639
#1.11/? ASN 142 ND2   #1.11/A UNL 1 O      #1.11/? ASN 142 2HD2  3.122  2.204
#1.11/? HIS 163 NE2   #1.11/A UNL 1 O     no hydrogen            3.313  N/A

Using sed

$ sed -E 's/(no hydrogen {12})[[:space:]]+/\1/' input_file
06I: 31 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
#1.1/? THR 26 N       #1.1/A UNL 1 O      #1.1/? THR 26 H       3.515  2.716
#1.1/? ASN 142 ND2    #1.1/A UNL 1 O      #1.1/? ASN 142 2HD2   3.227  2.305
#1.1/A UNL 1 N        #1.1/? THR 26 O     #1.1/A UNL 1 H        3.463  2.652
#1.2/A UNL 1 N        #1.2/? PHE 140 O    #1.2/A UNL 1 H        2.987  2.200
#1.4/? THR 26 N       #1.4/A UNL 1 S      #1.4/? THR 26 H       4.354  3.371
#1.4/? HIS 163 NE2    #1.4/A UNL 1 N     no hydrogen            3.137  N/A
#1.4/A UNL 1 N        #1.4/? ARG 188 O    #1.4/A UNL 1 H        3.000  2.081
#1.5/? HIS 163 NE2    #1.5/A UNL 1 N     no hydrogen            3.330  N/A
#1.5/? GLN 189 NE2    #1.5/A UNL 1 O      #1.5/? GLN 189 2HE2   3.029  2.132
#1.6/A UNL 1 N        #1.6/? ARG 188 O    #1.6/A UNL 1 H        2.984  2.064
#1.8/? ASN 142 ND2    #1.8/A UNL 1 N      #1.8/? ASN 142 2HD2   3.164  2.395
#1.8/? ASN 142 ND2    #1.8/A UNL 1 O      #1.8/? ASN 142 2HD2   3.031  2.180
#1.8/? GLN 189 NE2    #1.8/A UNL 1 O      #1.8/? GLN 189 1HE2   3.276  2.553
#1.8/A UNL 1 N        #1.8/? THR 190 O    #1.8/A UNL 1 H        3.257  2.407
#1.9/A UNL 1 N        #1.9/? THR 190 O    #1.9/A UNL 1 H        2.913  2.037
#1.10/? SER 144 OG    #1.10/A UNL 1 S     #1.10/? SER 144 HG    4.246  3.845
#1.10/? HIS 163 NE2   #1.10/A UNL 1 S    no hydrogen            3.700  N/A
#1.10/A UNL 1 N       #1.10/? THR 190 O   #1.10/A UNL 1 H       3.008  2.091
#1.12/? GLN 189 NE2   #1.12/A UNL 1 O     #1.12/? GLN 189 1HE2  2.929  2.152
#1.12/A UNL 1 N       #1.12/? PHE 140 O   #1.12/A UNL 1 H       2.912  2.012
#1.13/? ASN 142 ND2   #1.13/A UNL 1 O     #1.13/? ASN 142 2HD2  3.063  2.291
#1.14/? HIS 41 NE2    #1.14/A UNL 1 S    no hydrogen            3.919  N/A
#1.14/? ASN 142 ND2   #1.14/A UNL 1 O     #1.14/? ASN 142 2HD2  2.802  1.872
#1.14/A UNL 1 N       #1.14/? THR 190 O   #1.14/A UNL 1 H       2.927  1.987
#1.16/? GLN 189 NE2   #1.16/A UNL 1 N     #1.16/? GLN 189 1HE2  3.456  2.669
#1.16/? GLN 189 NE2   #1.16/A UNL 1 O     #1.16/? GLN 189 1HE2  3.079  2.177
#1.16/A UNL 1 N       #1.16/? THR 190 O   #1.16/A UNL 1 H       2.967  1.987
#1.17/? ASN 142 ND2   #1.17/A UNL 1 N     #1.17/? ASN 142 2HD2  3.218  2.294
#1.17/A UNL 1 N       #1.17/? THR 190 O   #1.17/A UNL 1 H       3.364  2.469
#1.18/? ASN 142 ND2   #1.18/A UNL 1 O     #1.18/? ASN 142 2HD2  3.117  2.142
#1.20/? ASN 142 ND2   #1.20/A UNL 1 N     #1.20/? ASN 142 2HD2  3.245  2.560
-----------------------------------------------------------------------------
structure30R: 21 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
#1.4/? GLN 189 NE2    #1.4/A UNL 1 O       #1.4/? GLN 189 1HE2   3.139  2.374
#1.5/? GLN 189 NE2    #1.5/A UNL 1 N       #1.5/? GLN 189 2HE2   3.296  2.365
#1.7/? CYS 145 SG     #1.7/A UNL 1 O       #1.7/? CYS 145 HG     3.466  2.762
#1.7/A UNL 1 O        #1.7/? LEU 141 O     #1.7/A UNL 1 H        2.951  2.048
#1.8/? ASN 142 ND2    #1.8/A UNL 1 O       #1.8/? ASN 142 2HD2   3.660  3.073
#1.8/? ASN 142 ND2    #1.8/A UNL 1 O       #1.8/? ASN 142 1HD2   2.965  2.162
#1.8/? CYS 145 SG     #1.8/A UNL 1 O       #1.8/? CYS 145 HG     3.480  2.556
#1.9/? HIS 163 NE2    #1.9/A UNL 1 O      no hydrogen            3.272  N/A
#1.9/A UNL 1 O        #1.9/? GLN 189 OE1   #1.9/A UNL 1 H        2.915  2.341
#1.10/? ASN 142 ND2   #1.10/A UNL 1 O      #1.10/? ASN 142 2HD2  3.100  2.185
#1.10/? GLN 189 NE2   #1.10/A UNL 1 O      #1.10/? GLN 189 1HE2  3.180  2.408
#1.10/A UNL 1 O       #1.10/? GLU 166 O    #1.10/A UNL 1 H       3.246  2.639
#1.11/? ASN 142 ND2   #1.11/A UNL 1 O      #1.11/? ASN 142 2HD2  3.122  2.204
#1.11/? HIS 163 NE2   #1.11/A UNL 1 O     no hydrogen            3.313  N/A

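(no hydrogen {12}): The parentheses (...) create a capture group, which sed's back-reference feature can later return with \1. The group can also be written as (no hydrogen[[:space:]]{12}) to make the spaces more visible. It captures the words no hydrogen together with the 12 spaces that follow them, so that both are put back by the back-reference.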
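To see the capture group in isolation, here is a minimal sketch that runs the substitution on a single line copied from the log. It assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do); the variable names line and fixed are only for illustration.

```shell
# One log line with an over-wide gap after "no hydrogen".
line='#1.4/? HIS 163 NE2    #1.4/A UNL 1 N     no hydrogen                                          3.137  N/A'

# -E turns on extended regular expressions, so ( ), {12} and + need no backslashes.
# \1 puts back the captured "no hydrogen" together with its 12 spaces;
# the whitespace matched outside the group is dropped.
fixed=$(printf '%s\n' "$line" | sed -E 's/(no hydrogen {12})[[:space:]]+/\1/')

printf '%s\n' "$fixed"
```

After the substitution, exactly 12 spaces remain between no hydrogen and 3.137.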

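[[:space:]]+: Since this is not part of the capture group, it is left out of the replacement. It matches all of the whitespace that remains after the captured text, that is, everything beyond the 12 spaces we want to keep in the group.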
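Note that the pattern above only matches when at least 13 whitespace characters follow no hydrogen (12 inside the group plus one or more outside it). If some lines might have a narrower gap, one possible variant is to capture only the words and re-insert a fixed pad. This is a sketch rather than part of the original answer: the pad variable, the normalize function, and the narrow-gap sample line are hypothetical.

```shell
# Build a 12-space pad once, so the column width is explicit and easy to change.
pad=$(printf '%12s' '')

# Replace any run of whitespace after "no hydrogen" with exactly the pad.
# Double quotes let the shell expand ${pad}; \1 is still passed through to sed.
normalize() {
    sed -E "s/(no hydrogen)[[:space:]]+/\1${pad}/"
}

# A wide gap taken from the log, and a hypothetical line with only two spaces.
wide=$(printf '%s\n' '#1.9/? HIS 163 NE2    #1.9/A UNL 1 O      no hydrogen                                   3.272  N/A' | normalize)
narrow=$(printf '%s\n' '#1.5/? HIS 163 NE2    #1.5/A UNL 1 N     no hydrogen  3.330  N/A' | normalize)

printf '%s\n%s\n' "$wide" "$narrow"
```

Both lines end up with the same 12-space gap, whatever width they started with. To process a whole file, pipe it through the function, e.g. normalize < input_file.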
