Print a column dynamically by column name using awk or sed

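I have a file from which I'm trying to print a column named 'grant (actual)' dynamically, by its column name. I can get the column by hard-coding the column number with the command below; its current position is column 6.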

$ awk '/--/,/Datacenter/ ' cas.txt  | awk '{print $6}'
(actual)
49.9%
55.4%
53.5%
48.7%
(actual)
53.1%
50.0%
47.6%
48.3%
(actual)
50.0%
51.1%
48.9%
51.3%

But I want to determine the column number dynamically, so that my script keeps working if the position of the column changes.

$ cat cas.txt
Datacenter: DC01
====================
Status=TRUE/FALSE
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       USER       grant (actual)      Host ID    Vol
DN  10.0.0.138  221.03 MiB  256          49.9%             dd09f7aa  STG1
DN  10.0.0.139  173.47 MiB  256          55.4%             53179492  STG1
DN  10.0.0.136  200.08 MiB  256          53.5%             89a28140  STG1
DN  10.0.0.137  318.69 MiB  256          48.7%             8cc9dfac  STG1
Datacenter: DC02
====================
Status=TRUE/FALSE
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       USER       grant (actual)       Host ID    Vol
DN  10.0.0.142  270.01 MiB  256          53.1%             04210b53  STG1
DN  10.0.0.143  166.65 MiB  256          50.0%             d5469c9b STG1
DN  10.0.0.140  199.51 MiB  256          47.6%             fcc38a17  STG1
DN  10.0.0.141  170.52 MiB  256          48.3%             3d7b4e59  STG1
Datacenter: DC03
====================
Status=TRUE/FALSE
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       USER       grant (actual)       Host ID    Vol
DN  10.0.0.150  229.2 MiB  256           50.0%             0fa51a1a  STG1
DN  10.0.0.151  195.88 MiB  256          51.1%             e329ac17  STG1
DN  10.0.0.148  147.01 MiB  256          48.9%             c14bd7ae  STG1
DN  10.0.0.149  298.34 MiB  256          51.3%             6c73d2b5  STG1

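Using GNU awk for FIELDWIDTHS and the 4th argument to split(), you can create an array (f[] below) that maps column names to their field numbers. You can then print, compare, reorder, or do whatever else you like with the columns, just by indexing that array by column name: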

$ cat tst.awk
/^--/ {
    if ( FIELDWIDTHS == "" ) {
        # First header seen: derive the field widths from its spacing
        wids = ""
        numFlds = split($0,flds,/  +/,seps)
        for ( fldNr=1; fldNr<=numFlds; fldNr++ ) {
            f[flds[fldNr]] = fldNr          # map column name -> field number
            wids = (fldNr>1 ? wids " " : "") length(flds[fldNr] seps[fldNr])
        }
        FIELDWIDTHS = wids
        $0 = $0                             # re-split this record using FIELDWIDTHS
    }
    inBlock = 1
}
inBlock {
    if ( /^Datacenter:/ ) {
        print ""
        inBlock = 0
        next
    }
    for ( i=1; i<=NF; i++ ) {
        gsub(/^\s+|\s+$/,"",$i)             # trim the padding around each field
    }
    print $(f["grant (actual)"])
}

$ awk -f tst.awk cas.txt
grant (actual)
49.9%
55.4%
53.5%
48.7%
grant (actual)
53.1%
50.0%
47.6%
48.3%
grant (actual)
50.0%
51.1%
48.9%
51.3%
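
FIELDWIDTHS and the four-argument split() are GNU awk extensions. Where only a POSIX awk is available, a rough approximation of the same fixed-width idea is to take the character offset of the column name in the header and cut the same window out of each data row. The sketch below is not the method of the answer above. It assumes that every value sits entirely inside the window under its header and that the wanted name is not a substring of an earlier header. The function name print_col_by_offset and the sample data are made up for illustration.

```shell
#!/bin/sh
# print_col_by_offset COLUMN FILE   (hypothetical helper name)
print_col_by_offset() {
    awk -v col="$1" '
        /^Datacenter:/ { start = 0; next }       # new block: wait for its header
        /^--/ {                                  # header row
            start = index($0, col)               # 1-based offset of the name, 0 if absent
            if (start) {
                rest = substr($0, start + length(col))
                pad = rest
                sub(/^ +/, "", pad)              # strip the padding that follows the name
                width = length(col) + length(rest) - length(pad)
                last = (pad == "")               # last column: read to end of line
            }
            next
        }
        start {
            val = last ? substr($0, start) : substr($0, start, width)
            gsub(/^ +| +$/, "", val)             # trim the padding around the value
            if (val != "") print val
        }
    ' "$2"
}

# Small aligned sample (illustrative data, not the full cas.txt)
offset_sample=$(mktemp)
cat > "$offset_sample" <<'EOF'
Datacenter: DC01
====================
--  Address     Load        grant (actual)  Vol
DN  10.0.0.138  221.03 MiB    49.9%         STG1
DN  10.0.0.139  173.47 MiB    55.4%         STG1
EOF

print_col_by_offset 'grant (actual)' "$offset_sample"
```

Because the window is recomputed at every header row, a block whose header is spaced differently from the previous one is handled on its own terms.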

Combining the ideas from @Dan and @Daweo:

awk -F' {2,}' -v col='grant (actual)' '
/^Datacenter/ {i=0}
$1 == "--" {for (i=1; i<=NF; i++) if ($i == col) break; next}
i {print $i}
' cas.txt

49.9%
55.4%
53.5%
48.7%
53.1%
50.0%
47.6%
48.3%
50.0%
51.1%
48.9%
51.3%

If you want to see the column header in the output as well, just remove the next.
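
One edge case in the command above: if a header row does not contain the wanted name, the for loop ends with i equal to NF+1, so the data rows of that block would typically print as empty lines. A minimal sketch of a guarded variant follows, wrapped in a shell function so the column name becomes a parameter. The function name print_col, the variable idx, and the sample data are assumptions made for illustration. The separator is written as two spaces followed by + rather than ' {2,}' because some awk implementations do not support interval expressions.

```shell
#!/bin/sh
# print_col COLUMN FILE   (hypothetical helper name)
print_col() {
    awk -F'  +' -v col="$1" '
        /^Datacenter/ { idx = 0 }            # new block: forget the previous header
        $1 == "--" {                         # header row: look for the wanted name
            idx = 0
            for (k = 1; k <= NF; k++)
                if ($k == col) { idx = k; break }
            next
        }
        idx { print $idx }                   # data row of a block that has the column
    ' "$2"
}

# Illustrative sample: the second block has no "grant (actual)" column
sample=$(mktemp)
cat > "$sample" <<'EOF'
Datacenter: DC01
--  Address     Load        grant (actual)  Vol
DN  10.0.0.138  221.03 MiB  49.9%           STG1
DN  10.0.0.139  173.47 MiB  55.4%           STG1
Datacenter: DC02
--  Address     Load        Vol
DN  10.0.0.142  270.01 MiB  STG1
EOF

print_col 'grant (actual)' "$sample"
```

With this sample, only the two values from the first block are printed; the second block is skipped silently instead of producing blank lines.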

Consider the following example. Let the content of file.txt be

-- Able Baker Charlie
DN 1    2     3
DN 4    5     6
DN 7    8     9
-- Charlie
DN 10
DN 11
DN 12

then

awk 'BEGIN{colname="Charlie"}/--/{delete names;for(i=1;i<=NF;i+=1){names[$i]=i};next}{print $(names[colname])}' file.txt

gives the output

3
6
9
10
11
12

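Explanation: I use the colname variable to store the desired column name. When a line containing -- is encountered, it is treated as a header holding the column names. The names array is cleared, to prevent leftovers from a previous block, and then filled so that each column name (key) maps to its position (value). After doing so, I instruct GNU AWK to go to the next line, i.e. not to print anything for the header. For every other line, I tell GNU AWK to look up the number that corresponds to the selected name and print that column.

(tested in gawk 4.2.1)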
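
The same names array can drive more than one column at a time. The sketch below takes a comma-separated list of names and prints those columns, in the requested order, for every data row, skipping names that the current block does not have. The function name print_named_cols and the comma-separated convention are assumptions for illustration. split("", names) is used in place of delete names only because it also works in awks that cannot delete a whole array.

```shell
#!/bin/sh
# print_named_cols "Name1,Name2" FILE   (hypothetical helper name)
print_named_cols() {
    awk -v want="$1" '
        BEGIN { nwant = split(want, wanted, ",") }
        /^--/ {                                  # header row: rebuild name -> position
            split("", names)                     # portable way to empty an array
            for (i = 1; i <= NF; i++) names[$i] = i
            next
        }
        {
            line = ""
            for (j = 1; j <= nwant; j++)
                if (wanted[j] in names) {        # skip names this block lacks
                    sep = (line == "" ? "" : " ")
                    line = line sep $(names[wanted[j]])
                }
            if (line != "") print line
        }
    ' "$2"
}

# Same content as file.txt in the example above
names_sample=$(mktemp)
cat > "$names_sample" <<'EOF'
-- Able Baker Charlie
DN 1    2     3
DN 4    5     6
DN 7    8     9
-- Charlie
DN 10
DN 11
DN 12
EOF

print_named_cols "Charlie,Able" "$names_sample"
```

For the first block this prints Charlie followed by Able (3 1, 6 4, 9 7); for the second block, which only has Charlie, it prints 10, 11 and 12.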

Looking at your data, we'll use split() to split the records on 2 or more spaces (/  +/):

$ awk '$1~/^--$/ {                      # -- starts the header record
    n=split($0,h,/  +/)                 # get field count n of header record
    for(i=1;i<=n;i++)                   # iterate fields
        if(h[i]=="grant (actual)")      # looking for desired header
            break                       # break once found, i is the field number
}
split($0,a,/  +/)==n {                  # process records with equal amount of fields
    print a[i]                          # and output ith field
}' file

Output:

grant (actual)
49.9%
55.4%
53.5%
48.7%
grant (actual)
53.1%
47.6%
48.3%
grant (actual)
50.0%
51.1%
48.9%
51.3%

The above fails for records whose last field is separated by only one space:

DN  10.0.0.143  166.65 MiB  256          50.0%             d5469c9b STG1
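
One possible way around that failure, sketched under the assumption that the wanted column lies to the left of the fields that ran together: rather than requiring a row to have exactly as many fields as the header, only require that it has at least as many as the wanted column's position, and use the Datacenter lines to decide when a header is in effect. The function name print_col_tolerant, the variable idx, and the sample data are made up for illustration.

```shell
#!/bin/sh
# print_col_tolerant COLUMN FILE   (hypothetical helper name)
print_col_tolerant() {
    awk -v col="$1" '
        /^Datacenter:/ { idx = 0; next }         # new block: no header in effect yet
        $1 == "--" {                             # header record
            idx = 0
            n = split($0, h, /  +/)
            for (k = 1; k <= n; k++)
                if (h[k] == col) { idx = k; break }
            if (idx) print h[idx]                # keep the header in the output
            next
        }
        # data record: it only needs to reach the wanted field
        idx && split($0, a, /  +/) >= idx { print a[idx] }
    ' "$2"
}

# Illustrative sample; the second data row has a single space before STG1
tolerant_sample=$(mktemp)
cat > "$tolerant_sample" <<'EOF'
Datacenter: DC02
====================
--  Address     Load        grant (actual)  Host ID   Vol
DN  10.0.0.142  270.01 MiB  53.1%           04210b53  STG1
DN  10.0.0.143  166.65 MiB  50.0%           d5469c9b STG1
EOF

print_col_tolerant 'grant (actual)' "$tolerant_sample"
```

The second data row splits into five fields instead of six, but the fourth field is still the wanted one, so 50.0% is printed. Asking for a column at or after the merged fields would still give wrong results for such a row.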

Summary

An awk-based solution that:

- doesn't require gnu-gawk for FIELDWIDTHS/fixed width fields
- doesn't require fudging with FS/OFS/RS/FPAT
- doesn't require a specialized regex engine, 
e.g. with back-references support
- doesn't require array-splitting or dealing with the 
painfully slow match() function
- doesn't *even* require a single call to any function

Input: cas.txt, as shown in the question.

< cas.txt | {m,g}awk ' !NF ? !_ : /^[=]+/ ? ($!_=!__ ? "" : " ") : --NF<+_ ? !_ : __+=($!_=(/%/?"":$(_-_^!_)" ")($_))^!_' _=6

Output:

1 grant (actual)
2 49.9%
3 55.4%
4 53.5%
5 48.7%
6
7 grant (actual)
8 53.1%
9 50.0%
10 47.6%
11 48.3%
12
13 grant (actual)
14 50.0%
15 51.1%
16 48.9%
17 51.3%
