Print a column dynamically by column name using awk or sed



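I have a file from which I am trying to print the column named 'grant (actual)' dynamically, using the column name. I can get the column with the command below by hard-coding the column number; its current position is column 6.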

$ awk '/--/,/Datacenter/ ' cas.txt  | awk '{print $6}'
(actual)
49.9%
55.4%
53.5%
48.7%
(actual)
53.1%
50.0%
47.6%
48.3%
(actual)
50.0%
51.1%
48.9%
51.3%

But I want to determine the column number dynamically, so that my script keeps working if the column's position changes.

$ cat cas.txt
Datacenter: DC01
====================
Status=TRUE/FALSE
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       USER       grant (actual)      Host ID    Vol
DN  10.0.0.138  221.03 MiB  256          49.9%             dd09f7aa  STG1
DN  10.0.0.139  173.47 MiB  256          55.4%             53179492  STG1
DN  10.0.0.136  200.08 MiB  256          53.5%             89a28140  STG1
DN  10.0.0.137  318.69 MiB  256          48.7%             8cc9dfac  STG1
Datacenter: DC02
====================
Status=TRUE/FALSE
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       USER       grant (actual)       Host ID    Vol
DN  10.0.0.142  270.01 MiB  256          53.1%             04210b53  STG1
DN  10.0.0.143  166.65 MiB  256          50.0%             d5469c9b STG1
DN  10.0.0.140  199.51 MiB  256          47.6%             fcc38a17  STG1
DN  10.0.0.141  170.52 MiB  256          48.3%             3d7b4e59  STG1
Datacenter: DC03
====================
Status=TRUE/FALSE
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       USER       grant (actual)       Host ID    Vol
DN  10.0.0.150  229.2 MiB  256           50.0%             0fa51a1a  STG1
DN  10.0.0.151  195.88 MiB  256          51.1%             e329ac17  STG1
DN  10.0.0.148  147.01 MiB  256          48.9%             c14bd7ae  STG1
DN  10.0.0.149  298.34 MiB  256          51.3%             6c73d2b5  STG1

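Using GNU awk for FIELDWIDTHS and the 4th arg to split(), you can create an array (f[] below) that maps the column names to their numbers. You can then print, compare, reorder, or do anything else you like with the columns just by indexing that array by column name: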

$ cat tst.awk
/^--/ {
    if ( FIELDWIDTHS == "" ) {
        wids = ""
        numFlds = split($0,flds,/  +/,seps)
        for ( fldNr=1; fldNr<=numFlds; fldNr++ ) {
            f[flds[fldNr]] = fldNr
            wids = (fldNr>1 ? wids " " : "") length(flds[fldNr] seps[fldNr])
        }
        FIELDWIDTHS = wids
        $0 = $0
    }
    inBlock = 1
}
inBlock {
    if ( /^Datacenter:/ ) {
        print ""
        inBlock = 0
        next
    }
    for ( i=1; i<=NF; i++ ) {
        gsub(/^\s+|\s+$/,"",$i)
    }
    print $(f["grant (actual)"])
}

$ awk -f tst.awk cas.txt
grant (actual)
49.9%
55.4%
53.5%
48.7%
grant (actual)
53.1%
50.0%
47.6%
48.3%
grant (actual)
50.0%
51.1%
48.9%
51.3%
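
The name-to-number map described above can also be built without FIELDWIDTHS. The following is only a sketch under stated assumptions, not the answer's method: it assumes columns are always separated by two or more spaces, it rebuilds the map at every header, and the function name print_cols and the trimmed-down sample file are made up for illustration.

```shell
# Small sample in the same layout as cas.txt (columns separated by 2+ spaces).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Datacenter: DC01
====================
--  Address     Load       USER       grant (actual)      Host ID    Vol
DN  10.0.0.138  221.03 MiB  256          49.9%             dd09f7aa  STG1
DN  10.0.0.139  173.47 MiB  256          55.4%             53179492  STG1
EOF

# print_cols FILE NAME1,NAME2,... (hypothetical helper)
# Prints the named columns, tab-separated, for every data row.
print_cols() {
    awk -F'  +' -v want="$2" '
        BEGIN { nWant = split(want, names, ",") }
        $1 == "--" {                       # header row: rebuild name -> number map
            for (k in f) delete f[k]
            for (i = 1; i <= NF; i++) f[$i] = i
            inBlock = 1
            next
        }
        /^Datacenter:/ { inBlock = 0 }     # a new block starts; wait for its header
        inBlock {
            out = ""
            for (j = 1; j <= nWant; j++)
                out = out (j > 1 ? "\t" : "") ((names[j] in f) ? $(f[names[j]]) : "")
            print out
        }
    ' "$1"
}

print_cols "$sample" 'Address,grant (actual)'
```

Because the columns are looked up by name on every row, the same call keeps working if the header order changes between blocks. A name that is missing from a header yields an empty cell rather than an error.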

Combining the ideas of @Dan and @Daweo:

awk -F' {2,}' -v col='grant (actual)' '
/^Datacenter/ {i=0}
$1 == "--" {for (i=1; i<=NF; i++) if ($i == col) break; next}
i {print $i}
' cas.txt
49.9%
55.4%
53.5%
48.7%
53.1%
50.0%
47.6%
48.3%
50.0%
51.1%
48.9%
51.3%

If you want to see the column header in the output, just remove the next.
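
As a sketch of that header-printing variant: the wrapper function col_with_header and the shortened sample file are illustrative assumptions, and the field separator is written as '  +' (two spaces followed by +) instead of ' {2,}' on the assumption that some awk implementations lack interval expressions.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
Datacenter: DC01
====================
--  Address     Load       USER       grant (actual)      Host ID    Vol
DN  10.0.0.138  221.03 MiB  256          49.9%             dd09f7aa  STG1
Datacenter: DC02
====================
--  Address     Load       USER       grant (actual)       Host ID    Vol
DN  10.0.0.142  270.01 MiB  256          53.1%             04210b53  STG1
EOF

# col_with_header FILE NAME (hypothetical helper)
# Prints the named column for each block, header line included.
col_with_header() {
    awk -F'  +' -v col="$2" '
        /^Datacenter/ { i = 0 }
        $1 == "--"    { for (i = 1; i <= NF; i++) if ($i == col) break }  # no "next" here
        i             { print $i }
    ' "$1"
}

col_with_header "$sample" 'grant (actual)'
```

With the next removed, the header record falls through to the final rule, so the column name is printed once at the top of each block.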

Consider the following example. Let the content of file.txt be

-- Able Baker Charlie
DN 1    2     3
DN 4    5     6
DN 7    8     9
-- Charlie
DN 10
DN 11
DN 12

then

awk 'BEGIN{colname="Charlie"}/--/{delete names;for(i=1;i<=NF;i+=1){names[$i]=i};next}{print $(names[colname])}' file.txt

gives the output

3
6
9
10
11
12

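Explanation: I use the colname variable to store the desired column name. When a line containing -- is encountered, it is treated as a header holding the column names. The names array is cleared, so that nothing is left over from the previous block, and then filled so that each column's name (key) maps to its position (value). After that I instruct GNU AWK to move on to the next line, i.e. not to print anything for the header. For every other line, I tell GNU AWK to look up the number that corresponds to the selected name and to print that column.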

(tested in gawk 4.2.1)
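
One edge case in the lookup described above: if the chosen name is absent from a header, names[colname] evaluates to an empty value, which awk treats as field number 0, so the whole record gets printed. The sketch below adds a guard for that case. It is an assumption-laden variant rather than the answer's code: the function name named_col and the shortened sample are made up, and the array is cleared with a loop instead of the delete names shorthand.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
-- Able Baker Charlie
DN 1    2     3
DN 4    5     6
-- Charlie
DN 10
EOF

# named_col FILE NAME (hypothetical helper)
# Prints column NAME, skipping blocks whose header does not contain it.
named_col() {
    awk -v colname="$2" '
        /^--/ {                                  # header: rebuild the name -> position map
            for (k in names) delete names[k]     # element-wise form of "delete names"
            for (i = 1; i <= NF; i++) names[$i] = i
            next
        }
        (colname in names) { print $(names[colname]) }
    ' "$1"
}

named_col "$sample" Charlie
named_col "$sample" Baker
```

The second call prints only the values from the first block, because the second header has no Baker column and the guard suppresses those rows.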

Looking at your data, we'll use split() to split the records on 2 or more spaces (/  +/):

$ awk '$1~/^--$/ {                      # -- starts the header record
    n=split($0,h,/  +/)                 # get field count n of header record
    for(i=1;i<=n;i++)                   # iterate fields
        if(h[i]=="grant (actual)")      # looking for desired header
            break                       # break once found, i is the field number
}
split($0,a,/  +/)==n {                  # process records with equal amount of fields
    print a[i]                          # and output ith field
}' file

Output:

grant (actual)
49.9%
55.4%
53.5%
48.7%
grant (actual)
53.1%
47.6%
48.3%
grant (actual)
50.0%
51.1%
48.9%
51.3%

The above fails for records in which the last field is separated by only 1 space:

DN  10.0.0.143  166.65 MiB  256          50.0%             d5469c9b STG1
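
One possible way around that failure, sketched here as an assumption rather than as part of the answer, is to relax the equality test: accept any record that has at least as many fields as the wanted column's number. The function name col_tolerant and the shortened sample are illustrative only.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
Datacenter: DC02
====================
--  Address     Load       USER       grant (actual)       Host ID    Vol
DN  10.0.0.142  270.01 MiB  256          53.1%             04210b53  STG1
DN  10.0.0.143  166.65 MiB  256          50.0%             d5469c9b STG1
EOF

# col_tolerant FILE NAME (hypothetical helper)
# Prints NAME from every record that has enough fields to contain it.
col_tolerant() {
    awk -v col="$2" '
        $1 == "--" {                          # header record
            n = split($0, h, /  +/)
            for (i = 1; i <= n; i++)
                if (h[i] == col) break        # i is the field number (n+1 if absent)
        }
        i && split($0, a, /  +/) >= i {       # record reaches the wanted column
            print a[i]
        }
    ' "$1"
}

col_tolerant "$sample" 'grant (actual)'
```

This only helps when the single-space separator sits to the right of the wanted column, as it does in the record above. A single space to the left of it would still shift the field numbers.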

Intro

An awk-based solution that:

- doesn't require GNU awk (gawk) for FIELDWIDTHS/fixed-width fields
- doesn't require fudging with FS/OFS/RS/FPAT
- doesn't require a specialized regex engine, e.g. one with back-reference support
- doesn't require array splitting or dealing with the painfully slow match() function
- doesn't *even* require a single call to any function

Input:
Datacenter: DC01
====================
Status=TRUE/FALSE
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       USER       grant (actual)      Host ID    Vol
DN  10.0.0.138  221.03 MiB  256          49.9%             dd09f7aa  STG1
DN  10.0.0.139  173.47 MiB  256          55.4%             53179492  STG1
DN  10.0.0.136  200.08 MiB  256          53.5%             89a28140  STG1
DN  10.0.0.137  318.69 MiB  256          48.7%             8cc9dfac  STG1
Datacenter: DC02
====================
Status=TRUE/FALSE
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       USER       grant (actual)       Host ID    Vol
DN  10.0.0.142  270.01 MiB  256          53.1%             04210b53  STG1
DN  10.0.0.143  166.65 MiB  256          50.0%             d5469c9b STG1
DN  10.0.0.140  199.51 MiB  256          47.6%             fcc38a17  STG1
DN  10.0.0.141  170.52 MiB  256          48.3%             3d7b4e59  STG1
Datacenter: DC03
====================
Status=TRUE/FALSE
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       USER       grant (actual)       Host ID    Vol
DN  10.0.0.150  229.2 MiB  256           50.0%             0fa51a1a  STG1
DN  10.0.0.151  195.88 MiB  256          51.1%             e329ac17  STG1
DN  10.0.0.148  147.01 MiB  256          48.9%             c14bd7ae  STG1
DN  10.0.0.149  298.34 MiB  256          51.3%             6c73d2b5  STG1

< cas.txt |
{m,g}awk '   !NF   ? !_ : /^[=]+/ ? ($!_=!__ ? "" : " ") 
: --NF<+_ ? !_ : __+=($!_=(/%/?"":$(_-_^!_)" ")($_))^!_' _=6

grant (actual)
49.9%
55.4%
53.5%
48.7%

grant (actual)
53.1%
50.0%
47.6%
48.3%

grant (actual)
50.0%
51.1%
48.9%
51.3%
