I have a non-trivial task: extracting some relevant data from a large CSV log that looks like this:
Frame #,Residue,Internal,van der Waals,Electrostatic,Polar Solvation,Non-Polar Solv.,TOTAL
1,1,119.745,0.356,-132.009,-95.618,1.7886312,-105.7373688
1,2,106.093,-3.835,-182.473,40.582,0.7132608,-38.9197392
1,3,21.228,-1.744,-38.026,-7.707,1.1189664,-25.1300336
1,4,-5.717,-4.721,-30.38,-4.839,0.406512,-45.250488
1,5,70.846,-4.127,-53.317,-2.534,0.7808472,11.6488472
...
2,1,119.745,0.356,-132.009,-95.618,1.7886312,-105.7373688
2,2,106.093,-3.835,-182.473,40.582,0.7132608,-38.9197392
2,3,21.228,-1.744,-38.026,-7.707,1.1189664,-25.1300336
2,4,-5.717,-4.721,-30.38,-4.839,0.406512,-45.250488
2,5,70.846,-4.127,-53.317,-2.534,0.7808472,11.6488472
...
n,1,119.745,0.356,-132.009,-95.618,1.7886312,-105.7373688
n,2,106.093,-3.835,-182.473,40.582,0.7132608,-38.9197392
n,3,21.228,-1.744,-38.026,-7.707,1.1189664,-25.1300336
n,4,-5.717,-4.721,-30.38,-4.839,0.406512,-45.250488
n,5,70.846,-4.127,-53.317,-2.534,0.7808472,11.6488472
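Here I would like to pick one value from the second column (#Residue) and, for that residue, extract the value of the last column (#TOTAL energy) as a function of the first column (#Frame number). In other words, I need to 1) first filter all the data by the second column, i.e. select every line where the number in the second column equals a given value (e.g. n = 27):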
#Frame, #Residue
1,27, ... , # last column value, which is the one I am interested in!
2,27, ... , # last column value, which is the one I am interested in!
3,27, ... , # last column value, which is the one I am interested in!
3,27, ... , # last column value, which is the one I am interested in!
and then 2) extract the corresponding value of the last column, so that the resulting log has 3 columns:
#Frame, #Residue, # Total energy
1,27, # last column value, which is the one I am interested in!
2,27, # last column value, which is the one I am interested in!
3,27, # last column value, which is the one I am interested in!
3,27, # last column value, which is the one I am interested in!
I would be grateful for any ideas on how to implement this with AWK or SED!
Thanks!
gleb
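To extract the lines with 27 in the second column, you can use grep. The -E option is needed so that + means "one or more" rather than a literal plus sign:
grep -E '^[^,]+,27,' input.csv
         | |  |
         | |  repeated one or more times
         | any character except a comma
         beginning of line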
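To output only the first, second and 8th columns, pipe the result through cut. Keep the trailing comma after 27 in the pattern, otherwise residues such as 270 would match too:
grep -E '^[^,]+,27,' input.csv | cut -d, -f1,2,8
                                      |   |
                                      |   fields to keep
                                      delimiter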
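A quick way to check the pattern is to run it on a few made-up lines. The sketch below assumes a throwaway file created with mktemp; its values are invented for illustration and are not taken from the real log. It contains residues 27 and 270 to show why the trailing comma in the pattern matters:

```shell
# Build a small throwaway CSV; the values are invented for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Frame #,Residue,Internal,van der Waals,Electrostatic,Polar Solvation,Non-Polar Solv.,TOTAL
1,27,1.0,2.0,3.0,4.0,5.0,-10.5
1,270,1.0,2.0,3.0,4.0,5.0,-99.9
2,27,1.0,2.0,3.0,4.0,5.0,-11.5
EOF

# -E enables extended regular expressions, so + means "one or more".
# The trailing comma after 27 keeps residue 270 from matching.
filtered=$(grep -E '^[^,]+,27,' "$sample" | cut -d, -f1,2,8)
printf '%s\n' "$filtered"

rm -f "$sample"
```

Only the two lines for residue 27 should come out, each reduced to frame, residue and total energy.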
To sort the file by the second column, you can use sort:
sort -t, -nk2,2 input.csv
      |   | |
      |   | sort by only the second field
      |   numeric
      delimiter
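If the frames of each residue should also stay in order after sorting, a second key can be added. This is only a sketch under assumptions: the sample file and its values are made up, and the header line is dropped with tail before sorting so that it does not take part in the comparison:

```shell
# Throwaway sample with invented values, deliberately out of order.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Frame #,Residue,Internal,van der Waals,Electrostatic,Polar Solvation,Non-Polar Solv.,TOTAL
2,27,1.0,2.0,3.0,4.0,5.0,-11.5
1,26,1.0,2.0,3.0,4.0,5.0,-1.5
1,27,1.0,2.0,3.0,4.0,5.0,-10.5
2,26,1.0,2.0,3.0,4.0,5.0,-2.5
EOF

# tail -n +2 skips the header line.
# -k2,2n sorts numerically by residue; -k1,1n breaks ties by frame number.
sorted=$(tail -n +2 "$sample" | sort -t, -k2,2n -k1,1n | cut -d, -f1,2,8)
printf '%s\n' "$sorted"

rm -f "$sample"
```

The output groups the lines by residue (26 first, then 27), with the frames in ascending order inside each group.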
Here is an awk solution:
awk -v n=27 'BEGIN { OFS = FS = "," } $2 == n { print $1, $2, $NF }' input.csv
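- -v n=27
  assigns the value 27 to the awk variable n before the program starts.
- BEGIN { OFS = FS = "," }
  BEGIN runs before awk starts parsing any data. Here we set FS (field separator) and OFS (output field separator) to "," so that input lines are split on commas and output fields are joined with commas.
- $2 == n { print $1, $2, $NF }
  for any record (line) whose second field ($2) equals n, print the first, second and last fields.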
To stop after m matches:
awk -v n=27 -v m=3 'BEGIN { OFS = FS = "," } $2 == n { print $1, $2, $NF; if (++count == m) exit}' input.csv
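Both awk commands can be tried on a few made-up lines. The following is a minimal sketch, assuming a throwaway sample file with invented values (not the real log) and m=2 instead of 3 so that the early exit is visible on such a short file:

```shell
# Throwaway sample with invented values; residue 27 appears in three frames.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Frame #,Residue,Internal,van der Waals,Electrostatic,Polar Solvation,Non-Polar Solv.,TOTAL
1,26,1.0,2.0,3.0,4.0,5.0,-1.5
1,27,1.0,2.0,3.0,4.0,5.0,-10.5
2,26,1.0,2.0,3.0,4.0,5.0,-2.5
2,27,1.0,2.0,3.0,4.0,5.0,-11.5
3,27,1.0,2.0,3.0,4.0,5.0,-12.5
EOF

# All frames for residue 27: first, second and last field of each match.
all=$(awk -v n=27 'BEGIN { OFS = FS = "," } $2 == n { print $1, $2, $NF }' "$sample")

# Same filter, but awk exits as soon as m=2 matches have been printed.
first_two=$(awk -v n=27 -v m=2 'BEGIN { OFS = FS = "," } $2 == n { print $1, $2, $NF; if (++count == m) exit }' "$sample")

printf 'all matches:\n%s\n' "$all"
printf 'first two:\n%s\n' "$first_two"

rm -f "$sample"
```

The header line is skipped automatically because its second field, Residue, never equals n. To keep the result, redirect the awk output to a file of your choice.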