Find lines containing a string and echo values to new lines using a shell script



I'm looking for a solution in a bash script. I have a raw output log in which every line starts with a date, e.g. Apr 10 11:17:35.

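I want to loop through each log entry and find the lines that contain the string coderbyte heroku/router. For each of them, echo the request_id value on a new line, and if the value of the fwd key is MASKED, append [M] to the end of that line, preceded by a space.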

Output log:

Apr 10 11:17:35 coderbyte app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=69dff0hba0nv HTTP/1.1" 200 148 "https://coderbyte.com" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?key=s2fwad2Es2" host=coderbyte.com request_id=b19a87a1-1bbb-4e67-b207-bd9f23d46afa fwd="108.31.000.000" dyno=web.3 connect=0ms service=92ms status=200 bytes=3194 protocol=https
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=910b07d1-3f71-4347-a1a7-bfa20384ef65 fwd="108.31.000.000" dyno=web.2 connect=1ms service=17ms status=200 bytes=4435 protocol=https
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=097bf65e-e189-4f9f-9dfb-4758cff411b2 fwd="108.31.000.000" dyno=web.3 connect=1ms service=10ms status=200 bytes=4435 protocol=https
Apr 10 11:17:35 coderbyte app/web.2: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?key=s2fwad2Es2 HTTP/1.1" 200 4263 "https://coderbyte.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=4eiramcmayu0" host=coderbyte.com request_id=d48278c2-5731-464e-be38-ab9ad84ac4a8 fwd="108.31.000.000" dyno=web.4 connect=1ms service=7ms status=200 bytes=3194 protocol=https
Apr 10 11:17:35 coderbyte app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q HTTP/1.1" 200 4263 "https://coderbyte.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36
Apr 10 11:17:35 coderbyte app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q HTTP/1.1" 200 4263 "https://coderbyte.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36
Apr 10 11:17:36 coderbyte app/web.4: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=4eiramcmayu0 HTTP/1.1" 200 3023 "https://coderbyte.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36
Apr 10 11:17:36 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=8bb2413c-3c67-4180-8091-000313b8d9ca fwd="MASKED" dyno=web.3 connect=1ms service=32ms status=200 bytes=4435 protocol=https
Apr 10 11:17:36 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=10f93da3-2753-48a3-9485-857a93d8a88a fwd="MASKED" dyno=web.3 connect=1ms service=37ms status=200 bytes=4435 protocol=https

Here is my code:

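#!/bin/bash
# Download the raw log; -O saves it under the remote file name, web-logs-raw
curl -s -O https://coderbyte.com/api/challenges/logs/web-logs-raw
#cat web-logs-raw
# Keep only the router lines; ">" overwrites test on each run instead of appending duplicates
grep -F "coderbyte heroku/router" web-logs-raw > test
cat test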

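So far this filters the log and finds the lines containing the string coderbyte heroku/router. But how do I echo the request_id value on a new line and, if the value of the fwd key is MASKED, append [M], preceded by a space, to the end of the line?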

The output should look like this:

b19a87a1-1bbb-000-00000
b19a87a1-1bbb-000-11111
8bb2413c-3c67-4180-22222 [M]
10f93da3-2753-48a3-33333 [M]
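Since the question asks for a loop over each log entry in bash, here is a minimal sketch of that loop using only shell built-ins: a case pattern to pick out the router lines and ${var#pattern}/${var%%pattern} parameter expansion to cut out the ID. The function name print_request_ids is hypothetical, and the sketch assumes the request_id value is always followed by a space, as it is in the sample log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads log lines on stdin and prints one request_id per
# router line, followed by " [M]" when the line contains fwd="MASKED".
# Uses only POSIX shell features, so it also runs under bash.
print_request_ids() {
  # "|| [ -n ... ]" keeps a final line that has no trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    # Keep only router lines that carry a request_id (quoted text is literal)
    case $line in
      *"coderbyte heroku/router"*request_id=*) ;;
      *) continue ;;
    esac
    id=${line#*request_id=}   # drop everything up to and including "request_id="
    id=${id%% *}              # drop everything from the first space after the id
    case $line in
      *'fwd="MASKED"'*) printf '%s [M]\n' "$id" ;;
      *) printf '%s\n' "$id" ;;
    esac
  done
}

# Usage, assuming the log was saved as web-logs-raw:
#   print_request_ids < web-logs-raw
```

A shell read loop handles one line per iteration in the interpreter, so on large logs the awk-based answers are typically much faster.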

One-liner:

awk '/coderbyte heroku\/router/ { split($10,map,"="); id=map[2]; split($11,map1,"\""); print (map1[2]=="MASKED" ? id" [M]" : id) }' web-logs-raw

Explanation:

awk '/coderbyte heroku\/router/ {              # Search for lines containing the required text
  split($10, map, "=")                          # Split the 10th space-delimited field into the array map, using "=" as the separator
  id = map[2]                                   # Set id to the second element of map (the request_id value)
  split($11, map1, "\"")                        # Split the 11th field (fwd="...") into map1, using " as the separator
  print (map1[2] == "MASKED" ? id " [M]" : id)  # If the fwd value is MASKED, print the id followed by " [M]"; otherwise print just the id
}' web-logs-raw

Output:

b19a87a1-1bbb-4e67-b207-bd9f23d46afa
910b07d1-3f71-4347-a1a7-bfa20384ef65
097bf65e-e189-4f9f-9dfb-4758cff411b2
d48278c2-5731-464e-be38-ab9ad84ac4a8
8bb2413c-3c67-4180-8091-000313b8d9ca [M]
10f93da3-2753-48a3-9485-857a93d8a88a [M]
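The one-liner above depends on request_id and fwd being the 10th and 11th space-separated fields. If you are not sure those positions are stable, a sketch that looks the keys up by name could be written like this. The function name request_ids_by_key is hypothetical, and it assumes neither key=value pair contains a space, which holds for request_id and fwd in the sample log.

```shell
# Hypothetical helper: same output as the one-liner above, but it finds the
# request_id= and fwd= fields by key name instead of assuming $10 and $11.
# Assumes neither key=value pair contains a space (true in the sample log).
request_ids_by_key() {
  awk '/coderbyte heroku\/router/ {
    id = ""; fwd = ""
    for (i = 1; i <= NF; i++) {
      if (index($i, "request_id=") == 1) {
        id = substr($i, 12)          # text after the 11-char "request_id=" prefix
      } else if (index($i, "fwd=") == 1) {
        fwd = substr($i, 5)          # text after "fwd="
        gsub(/"/, "", fwd)           # strip the surrounding double quotes
      }
    }
    if (id != "") print (fwd == "MASKED" ? id " [M]" : id)
  }' "$@"
}

# Usage: request_ids_by_key web-logs-raw
```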

Another option is to use a pattern that matches coderbyte heroku/router and captures the request_id value and fwd="MASKED" in capture groups.

Then test whether group 2 is empty. If it is not, print the request_id followed by [M]; otherwise print only the request_id.

This uses gnu-awk and the third argument of match() to get the capture groups (as mentioned by @anubhava):

awk '
match($0, /^.*\<coderbyte heroku\/router\>.* request_id=(\S+) (fwd="MASKED)?/, m) {
  print (m[2] ? m[1] " [M]" : m[1])
}
' web-logs-raw

Output:

b19a87a1-1bbb-4e67-b207-bd9f23d46afa
910b07d1-3f71-4347-a1a7-bfa20384ef65
097bf65e-e189-4f9f-9dfb-4758cff411b2
d48278c2-5731-464e-be38-ab9ad84ac4a8
8bb2413c-3c67-4180-8091-000313b8d9ca [M]
10f93da3-2753-48a3-9485-857a93d8a88a [M]
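The three-argument form of match() is a gawk extension, so the command above will not run in mawk or BSD/macOS awk. A portable sketch of the same idea relies on the POSIX RSTART and RLENGTH variables that match() sets, plus substr(); the function name request_ids_posix_match is hypothetical.

```shell
# Hypothetical helper for awks without the gawk-only three-argument match():
# match() still sets RSTART and RLENGTH, and substr() cuts out the id.
request_ids_posix_match() {
  awk '/coderbyte heroku\/router/ && match($0, /request_id=[^ ]+/) {
    id = substr($0, RSTART + 11, RLENGTH - 11)      # skip the 11-char "request_id=" prefix
    print (index($0, "fwd=\"MASKED\"") ? id " [M]" : id)
  }' "$@"
}

# Usage: request_ids_posix_match web-logs-raw
```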

Another option, which may be easier to read:

cat web-logs-raw | grep "coderbyte heroku/router" | cut -d' ' -f10,11 | awk -F'=' '{print $2,$3}' |
awk -F'"' '{print $1,$2}' | cut -d' ' -f1,4 | awk '{ if ($2 == "MASKED") print $1" [M]"; else print $1 }'

Explanation:

cat web-logs-raw | grep "coderbyte heroku/router" | cut -d' ' -f10,11
Up to this point, we find the lines containing coderbyte heroku/router and print only the request_id and fwd fields.
Output:
request_id=b19a87a1-1bbb-4e67-b207-bd9f23d46afa fwd="108.31.000.000"
request_id=910b07d1-3f71-4347-a1a7-bfa20384ef65 fwd="108.31.000.000"
request_id=097bf65e-e189-4f9f-9dfb-4758cff411b2 fwd="108.31.000.000"
request_id=d48278c2-5731-464e-be38-ab9ad84ac4a8 fwd="108.31.000.000"
request_id=8bb2413c-3c67-4180-8091-000313b8d9ca fwd="MASKED"
request_id=10f93da3-2753-48a3-9485-857a93d8a88a fwd="MASKED"
Next, we strip the request_id= prefix and the double quotes around the fwd value:
cat web-logs-raw | grep "coderbyte heroku/router" | cut -d' ' -f10,11 | awk -F'=' '{print $2,$3}' |
awk -F'"' '{print $1,$2}'
Output:
b19a87a1-1bbb-4e67-b207-bd9f23d46afa fwd  108.31.000.000
910b07d1-3f71-4347-a1a7-bfa20384ef65 fwd  108.31.000.000
097bf65e-e189-4f9f-9dfb-4758cff411b2 fwd  108.31.000.000
d48278c2-5731-464e-be38-ab9ad84ac4a8 fwd  108.31.000.000
8bb2413c-3c67-4180-8091-000313b8d9ca fwd  MASKED
10f93da3-2753-48a3-9485-857a93d8a88a fwd  MASKED
Now we only need to drop the leftover fwd label:
cat web-logs-raw | grep "coderbyte heroku/router" | cut -d' ' -f10,11 | awk -F'=' '{print $2,$3}' |
awk -F'"' '{print $1,$2}' | cut -d' ' -f1,4
Output:
b19a87a1-1bbb-4e67-b207-bd9f23d46afa 108.31.000.000
910b07d1-3f71-4347-a1a7-bfa20384ef65 108.31.000.000
097bf65e-e189-4f9f-9dfb-4758cff411b2 108.31.000.000
d48278c2-5731-464e-be38-ab9ad84ac4a8 108.31.000.000
8bb2413c-3c67-4180-8091-000313b8d9ca MASKED
10f93da3-2753-48a3-9485-857a93d8a88a MASKED
Finally, we drop the fwd value and print the ID, followed by [M] when that value is MASKED:
cat web-logs-raw | grep "coderbyte heroku/router" | cut -d' ' -f10,11 | awk -F'=' '{print $2,$3}' |
awk -F'"' '{print $1,$2}' | cut -d' ' -f1,4 | awk '{ if ($2 == "MASKED") print $1" [M]"; else print $1 }'
Output:
b19a87a1-1bbb-4e67-b207-bd9f23d46afa
910b07d1-3f71-4347-a1a7-bfa20384ef65
097bf65e-e189-4f9f-9dfb-4758cff411b2
d48278c2-5731-464e-be38-ab9ad84ac4a8
8bb2413c-3c67-4180-8091-000313b8d9ca [M]
10f93da3-2753-48a3-9485-857a93d8a88a [M]
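The same steps can also be collapsed into a single sed call: one substitution handles the MASKED case, and the t command then skips to the end of the script if it succeeded; otherwise a second substitution prints just the ID. This is a sketch that assumes fwd= immediately follows request_id= on every router line, as it does in the sample log; request_ids_sed is a hypothetical name.

```shell
# Hypothetical helper: the grep | cut | awk chain above as a single sed call.
# Assumes fwd= directly follows request_id= on every router line.
request_ids_sed() {
  sed -n '/coderbyte heroku\/router/{
    s/.* request_id=\([^ ]*\) fwd="MASKED".*/\1 [M]/p
    t
    s/.* request_id=\([^ ]*\) .*/\1/p
  }' "$@"
}

# Usage: request_ids_sed web-logs-raw
```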

Try Perl:

$ perl -ne '$m = /fwd="MASKED"/ ? "[M]" : ""; /coderbyte heroku\/router/ and /request_id=(\S+)/ and print "$1 $m\n"' output.log
b19a87a1-1bbb-4e67-b207-bd9f23d46afa
910b07d1-3f71-4347-a1a7-bfa20384ef65
097bf65e-e189-4f9f-9dfb-4758cff411b2
d48278c2-5731-464e-be38-ab9ad84ac4a8
8bb2413c-3c67-4180-8091-000313b8d9ca [M]
10f93da3-2753-48a3-9485-857a93d8a88a [M]
$ awk '/coderbyte heroku\/router/{gsub(/request_id=/,"",$10); print $10, ($11=="fwd=\"MASKED\"" ? "[M]" : "")}' web-logs-raw 
b19a87a1-1bbb-4e67-b207-bd9f23d46afa 
910b07d1-3f71-4347-a1a7-bfa20384ef65 
097bf65e-e189-4f9f-9dfb-4758cff411b2 
d48278c2-5731-464e-be38-ab9ad84ac4a8 
8bb2413c-3c67-4180-8091-000313b8d9ca [M]
10f93da3-2753-48a3-9485-857a93d8a88a [M]
$ awk -F'[ ="]' '/coderbyte heroku\/router/{print $18, ($21=="MASKED" ? "[M]" : "")}' web-logs-raw
b19a87a1-1bbb-4e67-b207-bd9f23d46afa 
910b07d1-3f71-4347-a1a7-bfa20384ef65 
097bf65e-e189-4f9f-9dfb-4758cff411b2 
d48278c2-5731-464e-be38-ab9ad84ac4a8 
8bb2413c-3c67-4180-8091-000313b8d9ca [M]
10f93da3-2753-48a3-9485-857a93d8a88a [M]
cat web-logs-raw | grep -F "coderbyte heroku/router" | awk '{ print $10 " " $11 }' | xargs -d "=" | awk '{ print $2 " " $4}' | sed 's/"/ /g' | while read -r a b c; do if [[ "$b" == "MASKED" ]]; then echo "$a [M]"; else echo "$a"; fi; done
b19a87a1-1bbb-4e67-b207-bd9f23d46afa
910b07d1-3f71-4347-a1a7-bfa20384ef65
097bf65e-e189-4f9f-9dfb-4758cff411b2
d48278c2-5731-464e-be38-ab9ad84ac4a8
8bb2413c-3c67-4180-8091-000313b8d9ca [M]
10f93da3-2753-48a3-9485-857a93d8a88a [M]
mawk '(NF *= / coderbyte heroku\/router:/) &&
($--NF = /MASKED/ ? " [M]" : _)^_' FS='^.+request_id=| [^"]+"|" .+$' OFS= web-logs-raw

b19a87a1-1bbb-4e67-b207-bd9f23d46afa
910b07d1-3f71-4347-a1a7-bfa20384ef65
097bf65e-e189-4f9f-9dfb-4758cff411b2
d48278c2-5731-464e-be38-ab9ad84ac4a8
8bb2413c-3c67-4180-8091-000313b8d9ca [M]
10f93da3-2753-48a3-9485-857a93d8a88a [M]
