I have a big file like this:
RESOURCETAGMAPPINGLIST arn:aws:ec2:us-east-1:XXXXXX:instance/i-XXXXXXXXXXXXXXXXX
TAGS app-name appname1
RESOURCETAGMAPPINGLIST arn:aws:ec2:us-east-1:XXXXXX:instance/i-XXXXXXXXXXXXXXXXX
TAGS app-name appname2
RESOURCETAGMAPPINGLIST arn:aws:ec2:us-east-1:XXXXXX:instance/i-XXXXXXXXXXXXXXXXX
TAGS app-name appname1
..
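I want to modify only the lines containing RESOURCETAGMAPPINGLIST and print the other lines without modification. For the matching lines, I want to print only specific fields, like this: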
arn ec2 us-east-1 XXXXXX
TAGS app-name appname1
arn ec2 us-east-1 XXXXXX
TAGS app-name appname2
arn ec2 us-east-1 XXXXXX
TAGS app-name appname1
..
I tried to use awk's gsub, but I really can't get the -F: part to work. Whether it's awk, sed, or perl, any help would be much appreciated.
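With awk. I use at least one space ( +) or (|) a colon (:) as the input field separator.
If a line contains the string RESOURCETAGMAPPINGLIST, print columns 2, 4, 5 and 6, stop processing that line, and continue with the next line. If a line does not contain RESOURCETAGMAPPINGLIST, print the complete line.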
awk -F ' +|:' '/RESOURCETAGMAPPINGLIST/{print $2,$4,$5,$6; next} {print}' file
Output:
arn ec2 us-east-1 XXXXXX
TAGS app-name appname1
arn ec2 us-east-1 XXXXXX
TAGS app-name appname2
arn ec2 us-east-1 XXXXXX
TAGS app-name appname1
See: The Stack Overflow Regular Expressions FAQ
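To see why that command picks $2, $4, $5 and $6, it can help to print every field with its number for one sample line. This is only a small sketch: the helper name show_fields is made up for illustration, and it assumes the same ' +|:' separator as above.

```shell
# show_fields is a hypothetical helper: it numbers the fields awk sees
# when splitting on one or more spaces or on a colon.
show_fields() {
  awk -F ' +|:' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

# One RESOURCETAGMAPPINGLIST line in the format shown in the question.
printf '%s\n' 'RESOURCETAGMAPPINGLIST arn:aws:ec2:us-east-1:XXXXXX:instance/i-XXXXXXXXXXXXXXXXX' |
  show_fields
```

Field 1 is the keyword itself and field 3 is "aws", which is why the wanted values sit in fields 2, 4, 5 and 6.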
An awk idea using the default field separator and split():
awk '
/RESOURCETAGMAPPINGLIST/ { split($2,a,":")            # split 2nd field on ":" delimiter, storing results in array a[]
                           print a[1],a[3],a[4],a[5]
                           next                       # skip to next line of input
                         }
1                                                     # print current line
' sample.dat
# or as a one-liner sans comments:
awk '/RESOURCETAGMAPPINGLIST/ {split($2,a,":"); print a[1],a[3],a[4],a[5]; next} 1' sample.dat
This generates:
arn ec2 us-east-1 XXXXXX
TAGS app-name appname1
arn ec2 us-east-1 XXXXXX
TAGS app-name appname2
arn ec2 us-east-1 XXXXXX
TAGS app-name appname1
Here is one way:
#!/usr/bin/perl
use v5.30; # Perl v5.10 or above is required to use 'say' instead of 'print'
use warnings;

my $file = "datafile.txt"; # declare file name
open( my $fh, "<", $file ) or die( "Can't open `$file`: $!\n" );
# open the file with file handle $fh
while (my $line = <$fh>) { # read the file line by line
    chomp $line; # remove line breaks
    if ($line =~ m/^RESOURCETAGMAPPINGLIST/) { # if the line starts with RESOURCETAGMAPPINGLIST
        my @fields = split(/\s+|:/, $line); # split on whitespace or ':'; each field becomes an element of @fields
        say join(' ', @fields[1,3,4,5]); # join the fields shown in the requested output (skipping "aws") with a single space and print
    } else {
        say $line; # if the line does not start with RESOURCETAGMAPPINGLIST, simply print it
    }
}
# when the script finishes, Perl will automatically close the input file.
I assume that the fields to keep are parts 1, 3, 4 and 5 of the colon-separated portion.
perl -wple'
    s{^RESOURCETAGMAPPINGLIST\s+(.+)}
     { join " ", ( $1 =~ /([^:]+)/g )[0,2..4] }e' file
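With -p, the variable $_ (which holds the line being processed) is printed after processing.
If the keyword doesn't match, the regex does nothing and $_ stays unchanged. If it does match, the whole line is matched, so $_ is replaced with the value returned by the replacement side. (With the /e modifier the replacement side is evaluated as code; it extracts the selected words, separated by :, from the rest of the line captured in $1, and joins them with a space.)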
Or: test for the word, then either split the line and join the needed parts, or print it as is
perl -wnlE'say
    /^RESOURCETAGMAPPINGLIST/ ? join " ", (split /\s+|:/)[1,3..5] : $_' file
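These are broken over several lines for readability. They can be copy-pasted as they are (in bash), or put on one line. This would arguably be nicer written in a file, but the question seems to ask for a command-line program.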
This might work for you (GNU sed):
sed -E 's/^RESOURCETAGMAPPINGLIST *([^:]+):[^:]+:([^:]+):([^:]+):([^:]+).*/\1 \2 \3 \4/' file
Pattern-match the line and reformat it as required.