Sort a list in bash by the value in key=value pairs



I have a request log that looks like this:

[11/Jun/2020:15:35:20 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=72161.647 memory=2 cpu=0.01%
[11/Jun/2020:15:22:13 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=70564.992 memory=2 cpu=0.00%
[11/Jun/2020:15:35:26 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=70252.369 memory=2 cpu=0.00%
[11/Jun/2020:15:01:02 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=60159.409 memory=2 cpu=0.03%
[11/Jun/2020:14:59:03 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=106956.770 memory=2 cpu=0.01%
[11/Jun/2020:15:37:56 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=60014.014 memory=2 cpu=0.00%
[11/Jun/2020:16:45:38 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=61264.044 memory=2 cpu=0.02%
[11/Jun/2020:15:01:48 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=58733.325 memory=2 cpu=0.02%
[11/Jun/2020:15:31:35 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=68882.501 memory=2 cpu=0.03%
[11/Jun/2020:14:59:46 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=57021.375 memory=2 cpu=0.00%
[11/Jun/2020:14:59:46 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=137172.179 memory=2 cpu=0.01%
[11/Jun/2020:15:35:39 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=107954.112 memory=2 cpu=0.00%
[11/Jun/2020:16:12:22 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=55877.479 memory=2 cpu=0.02%
[11/Jun/2020:15:26:19 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=55912.678 memory=2 cpu=0.00%
[11/Jun/2020:15:36:33 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=54738.373 memory=2 cpu=0.02%

I have a script that sorts by time, memory, and cpu, but I can only get it to work by removing the static string time= before sorting.

cat /var/log/requests.log | sed -e "s/time=//" | sort -k 7 -n -r | head -50

I get:

[11/Jun/2020:14:59:46 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 137172.179 memory=2 cpu=0.01%
[11/Jun/2020:15:35:39 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 107954.112 memory=2 cpu=0.00%
[11/Jun/2020:14:59:03 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 106956.770 memory=2 cpu=0.01%
[11/Jun/2020:15:35:20 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 72161.647 memory=2 cpu=0.01%
[11/Jun/2020:15:22:13 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 70564.992 memory=2 cpu=0.00%
[11/Jun/2020:15:35:26 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 70252.369 memory=2 cpu=0.00%
[11/Jun/2020:15:31:35 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 68882.501 memory=2 cpu=0.03%
[11/Jun/2020:16:45:38 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 61264.044 memory=2 cpu=0.02%
[11/Jun/2020:15:01:02 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 60159.409 memory=2 cpu=0.03%
[11/Jun/2020:15:37:56 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 60014.014 memory=2 cpu=0.00%
[11/Jun/2020:15:01:48 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 58733.325 memory=2 cpu=0.02%
[11/Jun/2020:14:59:46 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 57021.375 memory=2 cpu=0.00%
[11/Jun/2020:15:26:19 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 55912.678 memory=2 cpu=0.00%
[11/Jun/2020:16:12:22 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 55877.479 memory=2 cpu=0.02%
[11/Jun/2020:15:47:01 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 55443.752 memory=2 cpu=0.02%

I want to sort the list without removing the sort key:

[11/Jun/2020:14:59:46 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=137172.179 memory=2 cpu=0.01%
[11/Jun/2020:15:35:39 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=107954.112 memory=2 cpu=0.00%
[11/Jun/2020:14:59:03 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=106956.770 memory=2 cpu=0.01%
[11/Jun/2020:15:35:20 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=72161.647 memory=2 cpu=0.01%
[11/Jun/2020:15:22:13 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=70564.992 memory=2 cpu=0.00%
[11/Jun/2020:15:35:26 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=70252.369 memory=2 cpu=0.00%
[11/Jun/2020:15:31:35 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=68882.501 memory=2 cpu=0.03%
[11/Jun/2020:16:45:38 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=61264.044 memory=2 cpu=0.02%
[11/Jun/2020:15:01:02 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=60159.409 memory=2 cpu=0.03%
[11/Jun/2020:15:37:56 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=60014.014 memory=2 cpu=0.00%
[11/Jun/2020:15:01:48 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=58733.325 memory=2 cpu=0.02%
[11/Jun/2020:14:59:46 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=57021.375 memory=2 cpu=0.00%
[11/Jun/2020:15:26:19 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=55912.678 memory=2 cpu=0.00%
[11/Jun/2020:16:12:22 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=55877.479 memory=2 cpu=0.02%
[11/Jun/2020:15:47:01 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=55443.752 memory=2 cpu=0.02%

I tried this, without success:

cat /var/log/requests.log | sort -k 7.6 -n -r | head -50

UPDATE: /endpoint stands in for real endpoints, so they may include query strings.
UPDATE 2: I need to sort by any one of the key=value columns (as a number).

If your input is suitably representative, you can simply use = as the column separator.

sort -t = -k3 -k4 -k5 -n -r /var/log/requests.log

Notice also how we avoid the useless cat: sort can read the file named on its command line.
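To see why = works as a separator here, it can help to print the fields the way sort -t = would split them. This is only a small sketch on one sample line from the question; the variable names line and fields are made up for illustration.

```shell
# One sample line in the format shown in the question.
line='[11/Jun/2020:15:35:20 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=72161.647 memory=2 cpu=0.01%'

# Print every "="-separated field with its number, mimicking how
# sort -t = counts fields on this line.
fields=$(printf '%s\n' "$line" |
    awk -F= '{ for (i = 1; i <= NF; ++i) print i ": " $i }')

printf '%s\n' "$fields"
```

Field 3 starts with the time value, field 4 with the memory value, and field 5 with the cpu value, and the numeric comparison only reads the leading number of each key. If an endpoint contained a query string with its own =, every later field would shift by one, which is why this approach only suits input like the sample.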

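More generally, you can use a simple Awk script to extract the sort fields and put them first, then sort on them, then discard them (this is known as a Schwartzian transform).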

awk '{ for(i=1; i<=NF; ++i) if ($i ~ /^(time|memory|cpu)=/) {
        split($i, f, "="); a[f[1]] = substr($i, length(f[1])+2) }
    print a["time"] "\t" a["memory"] "\t" a["cpu"] "\t" $0 }' /var/log/requests.log |
sort -r -n |
cut -f4-

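The if statement picks out any field that starts with one of the prefixes we are interested in (you can add more keys here if you like, or switch to a more general regex if you want to extract everything that consists of alphabetic characters followed by an equals sign) and populates the associative array a with their respective values. After looping over all the fields, we pull the values out of the array in the order we want to use for sorting.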

Demo: https://ideone.com/dU9v95
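For the "sort by any one key=value column" case from the second update, the same decorate, sort, undecorate idea can be wrapped in a function that takes the key as an argument. The following is a sketch rather than part of the original answer: sort_by_key is a hypothetical helper name, the sample file is a temporary copy of three lines from the question, and the ?time=1 query string on the last line is made up to show that only a whitespace-separated field starting with the key is used.

```shell
# sort_by_key KEY FILE (hypothetical helper, for illustration only)
# Decorate: prefix each line with the value of KEY=... and a tab.
# Sort:     numerically, descending, on that prefix only.
# Undecorate: strip the prefix again with cut.
sort_by_key() {
    awk -v key="$1" '{
        val = ""                                   # reset for every line
        for (i = 1; i <= NF; ++i)
            if (index($i, key "=") == 1)           # field starts with "key="
                val = substr($i, length(key) + 2)  # text after the "="
        print val "\t" $0
    }' "$2" |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1nr |    # C locale: "." is the decimal point
    cut -f2-
}

# Small sample file; the query string on the last line is invented.
log=$(mktemp)
cat > "$log" <<'EOF'
[11/Jun/2020:15:35:20 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=72161.647 memory=2 cpu=0.01%
[11/Jun/2020:14:59:46 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=137172.179 memory=2 cpu=0.01%
[11/Jun/2020:15:01:02 +0000] 200 GET /endpoint?time=1 ip=XXX.XXX.XXX.XXX time=60159.409 memory=2 cpu=0.03%
EOF

sort_by_key time "$log"
```

Calling sort_by_key cpu "$log" or sort_by_key memory "$log" sorts on those columns instead. Lines with equal key values may come out in any relative order, and a trailing % as in cpu=0.01% is ignored by the numeric comparison.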
