Delete matched words and the string that follows each match



I have a file whose lines follow this pattern:

date=2020-02-22 time=13:32:41 type=text subtype=text ip=1.2.3.4 country="China" service="foo"  id=47291 msg="foo: bar.baz," value=50
date=2020-03-17 time=11:49:54 type=text subtype=anothertext ip=1.2.3.5 country="Russian Federation" service="bar"  id=47324 msg="foo: bar.baz," value=30
date=2020-03-30 time=16:29:24 type=text subtype=someothertext ip=1.2.3.6 country="Korea, Republic of" service="grault, garply"  id=47448 msg="foo: bar.baz," value=60

I want to remove type, subtype and service together with their values (the value after =).

Desired output:

date=2020-02-22 time=13:32:41 ip=1.2.3.4 country="China" id=47291 msg="foo: bar.baz," value=50
date=2020-03-17 time=11:49:54 ip=1.2.3.5 country="Russian Federation" id=47324 msg="foo: bar.baz," value=30
date=2020-03-30 time=16:29:24 ip=1.2.3.6 country="Korea, Republic of" id=47448 msg="foo: bar.baz," value=60

I have been trying with cut, awk and sed, but haven't got anywhere near a solution. I searched online for hours, to no avail. Can anyone help?

Something you might want to reuse or build on in the future:

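$ cat tst.awk
BEGIN {
    # s holds the space-separated names of the fields to drop (set with -v)
    split(s, tmp)
    for (i in tmp) {
        skip[tmp[i]]
    }
    # a field is a run of non-spaces, optionally extended by ="quoted value",
    # so values that contain spaces stay in one field
    FPAT = "[^ ]+(=\"[^\"]+\")?"
}
{
    c = 0
    for (i = 1; i <= NF; i++) {
        tag = gensub(/=.*/, "", 1, $i)    # field name = text before the first =
        if ( !(tag in skip) ) {
            printf "%s%s", (c++ ? OFS : ""), $i
        }
    }
    print ""
}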
$ awk -v s='type subtype service' -f tst.awk file
date=2020-02-22 time=13:32:41 ip=1.2.3.4 country="China" id=47291 msg="foo: bar.baz," value=50
date=2020-03-17 time=11:49:54 ip=1.2.3.5 country="Russian Federation" id=47324 msg="foo: bar.baz," value=30
date=2020-03-30 time=16:29:24 ip=1.2.3.6 country="Korea, Republic of" id=47448 msg="foo: bar.baz," value=60

The above uses GNU awk for FPAT and gensub().
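FPAT and gensub() are GNU awk extensions. If only a POSIX awk is available, a sketch along the same lines can split each line on blanks and glue the pieces back together while a double quote is still open. This is not from the answer itself: drop_fields is a hypothetical helper name, and it assumes that quoted values never contain an escaped quote and that runs of spaces (including inside quotes) may be collapsed to a single space.

```shell
# drop_fields: hypothetical helper. Reads lines on stdin and drops every
# name=value field whose name is listed in $1 (space-separated).
# Assumes quoted values contain no embedded " characters.
drop_fields() {
    awk -v s="$1" '
        # count the double quotes in a string
        function quotes(str,   t) { t = str; return gsub(/"/, "", t) }
        BEGIN {
            n = split(s, names, " ")
            for (i = 1; i <= n; i++) skip[names[i]] = 1
        }
        {
            nf = split($0, part, " ")   # default whitespace splitting
            out = ""
            for (i = 1; i <= nf; i++) {
                tok = part[i]
                # an odd quote count means the quoted value continues
                while (quotes(tok) % 2 == 1 && i < nf)
                    tok = tok " " part[++i]
                name = tok
                sub(/=.*/, "", name)     # field name = text before the first =
                if (!(name in skip))
                    out = (out == "" ? tok : out " " tok)
            }
            print out
        }
    '
}
```

Usage: drop_fields 'type subtype service' < file. Because the kept fields are rejoined with single spaces, the double space before id in the input is collapsed as well.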

You can use this sed:

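sed -E 's/(^|[[:blank:]]+)(subtype|type|service)=("[^"]*"|[^[:blank:]]+)//g' file

date=2020-02-22 time=13:32:41 ip=1.2.3.4 country="China"  id=47291 msg="foo: bar.baz," value=50
date=2020-03-17 time=11:49:54 ip=1.2.3.5 country="Russian Federation"  id=47324 msg="foo: bar.baz," value=30
date=2020-03-30 time=16:29:24 ip=1.2.3.6 country="Korea, Republic of"  id=47448 msg="foo: bar.baz," value=60

The "[^"]*" alternative is needed for quoted values that contain a space, such as service="grault, garply"; with only [^[:blank:]]+ the trailing garply" would be left behind. The two spaces before id come from the input and are kept as-is.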

You can try the following:

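awk -F " " '{ $3=""; $4=""; print }' file

This basically sets columns 3 and 4 (type and subtype) to empty strings. Two caveats: awk rebuilds the line with OFS, so each blanked column leaves an extra space behind; and service cannot be removed by column number, because the quoted country value contains a varying number of spaces (with the default field splitting, $5 is ip, not service).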
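Blanking a column with $N="" still leaves its separator behind when awk rebuilds the line. If you stay with column numbers, one way to avoid that is to skip the unwanted columns while rebuilding the line yourself. A minimal sketch; drop_columns is a hypothetical helper name, and, like any positional approach, it cannot reliably reach service, whose column number varies from line to line.

```shell
# drop_columns: hypothetical helper. Reads lines on stdin and prints them
# without the column numbers listed in $1 (space-separated, e.g. '3 4').
# Positional only: columns are counted with awk's default whitespace splitting.
drop_columns() {
    awk -v cols="$1" '
        BEGIN {
            n = split(cols, c, " ")
            for (i = 1; i <= n; i++) drop[c[i]] = 1
        }
        {
            out = ""
            # rebuild the line from the columns we keep, single-space separated
            for (i = 1; i <= NF; i++)
                if (!(i in drop))
                    out = (out == "" ? $i : out " " $i)
            print out
        }
    '
}
```

Usage: drop_columns '3 4' < file removes type and subtype without leaving extra spaces; service stays in place.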
