Count occurrences of a string using bash

I need to use bash to count how many times a string occurs in a log file, and run a command when the string is repeated more than 5 times.

Here is some sample data from the log file:

[10:35:56] world_log_event: kick (starrr)(NormieBL)@Arca from srv 192.168.1.6(21)  
[10:39:17] world_log_data: user (chrisxJ02)(Delaon)@Arca is already connected on srv 7
[10:39:23] world_log_event: kick (chrisxJ02)(Delaon)@Arca from srv 192.168.1.39(7)
[10:39:17] world_log_data: user (test01)(testDW)@Arca is already connected on srv 39

Some examples of how the script should behave:

if string "is already connected on srv 21" count is >= 5 times then "exec command telnet 192.168.1.6"
if string "is already connected on srv 7" count is >= 5 times then "exec command telnet 192.168.1.39"
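A minimal sketch of this behaviour against the log format shown above. It assumes (the question does not say so explicitly) that the IP for srv N comes from the "from srv IP(N)" part of the kick lines, and it only echoes the telnet command instead of running it. The function name check_log and the temporary sample file are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "N IP" for every srv number N whose
# "is already connected on srv N" count reaches the threshold.
# Assumption: the IP for srv N is taken from kick lines "... from srv IP(N)".
check_log() {
    local logfile=$1 threshold=$2
    awk -v threshold="$threshold" '
        /is already connected on srv/ { count[$NF]++ }   # last field is N
        / from srv / {
            split($NF, parts, "[()]")   # "192.168.1.39(7)" -> "192.168.1.39", "7"
            ip[parts[2]] = parts[1]
        }
        END {
            for (srv in count)
                if (count[srv] >= threshold)
                    print srv, ip[srv]
        }
    ' "$logfile"
}

# Sample input: the four log lines from the question.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
[10:35:56] world_log_event: kick (starrr)(NormieBL)@Arca from srv 192.168.1.6(21)
[10:39:17] world_log_data: user (chrisxJ02)(Delaon)@Arca is already connected on srv 7
[10:39:23] world_log_event: kick (chrisxJ02)(Delaon)@Arca from srv 192.168.1.39(7)
[10:39:17] world_log_data: user (test01)(testDW)@Arca is already connected on srv 39
EOF

threshold=5
check_log "$sample_log" "$threshold" | while read -r srv ip; do
    if [ -z "$ip" ]; then
        echo "srv $srv reached $threshold, but no IP is known for it" >&2
        continue
    fi
    # Placeholder: replace echo with the real command (e.g. telnet "$ip").
    echo "would run: telnet $ip (srv $srv)"
done
```

With only the four sample lines nothing reaches 5, so the loop prints nothing; each srv number appears at most once there.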

Assumptions and givens from the OP's comments:

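the user provides the string to search for (e.g., Lorem ipsum dolor sit amet)
this string is followed by a field (e.g., 21), aka ##; the sample data shows this will always be a single field
keep a count of each ## in the input
the user provides a threshold for the number of matches to report (e.g., the OP mentioned 5 in the question), aka threshold
the input also contains IPaddr_?(##) entries
keep track of each IPaddr_? field and its associated ## (e.g., IPaddr_?(##))
per the OP's (updated) sample data, the IPaddr_?(##) entry is always the last (white)space-delimited field of an input line; for completeness, we assume the IPaddr_?(##) entry could appear in any field, so the code below checks each field rather than only the last one
if a ## has more than one matching IPaddr_? record [NOTE: the OP has stated this will not happen], the proposed awk solution (below) will report the last IPaddr_? from the input
at the end of processing, if a ## shows up at least threshold times, print the ## and its associated IPaddr_?; the OP has not provided the desired output format, so for now we assume "## IPaddr_?" is sufficient for the calling process to parse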
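For comparison, the same bookkeeping (a count per ##, plus the IPaddr_? seen for each ##) can also be sketched in pure bash with associative arrays, which require bash 4 or later. This is an alternative to the awk program in this answer, not part of it; the function name count_by_field is hypothetical, and it matches the search string literally rather than as a regex.

```bash
#!/usr/bin/env bash
# Hypothetical pure-bash version of the bookkeeping above (needs bash >= 4).
# Prints "## IPaddr_?" for every ## that follows search_string at least
# threshold times; the IPaddr_? part is empty if no IPaddr_?(##) was seen.
count_by_field() {
    local search_string=$1 threshold=$2 file=$3
    local line field key
    local re='^(IPaddr_[^()]*)[(]([^()]+)[)]$'   # "IPaddr_B(21)" -> IPaddr_B, 21
    local -a fields
    local -A count=() addr=()

    while IFS= read -r line; do
        read -ra fields <<< "$line"              # split the line on whitespace
        [ "${#fields[@]}" -gt 0 ] || continue    # skip empty lines
        if [[ $line == *"$search_string"* ]]; then
            key=${fields[${#fields[@]}-1]}       # the last field is ##
            count[$key]=$(( ${count[$key]:-0} + 1 ))
        fi
        for field in "${fields[@]}"; do
            if [[ $field =~ $re ]]; then
                addr[${BASH_REMATCH[2]}]=${BASH_REMATCH[1]}   # addr[##]=IPaddr_?
            fi
        done
    done < "$file"

    for key in "${!count[@]}"; do
        if [ "${count[$key]}" -ge "$threshold" ]; then
            echo "$key ${addr[$key]}"
        fi
    done
}
```

It would be called like the awk version, e.g. count_by_field 'Lorem ipsum dolor sit amet' 5 ipsum.log. The output order is unspecified, just as with awk's for (i in counter).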

Input parameters set by the user:

search_string='Lorem ipsum dolor sit amet'
threshold=5
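If the script is meant to be reused, these two parameters could also be taken from the command line instead of being hard-coded. A minimal sketch, assuming a hypothetical script called as script.sh 'search string' threshold, with the values above as defaults. Checking that threshold is a non-negative integer matters because awk only compares numerically when the value looks like a number.

```bash
#!/usr/bin/env bash
# Hypothetical argument handling: $1 = search string, $2 = threshold.
# Falls back to the values used in this answer when arguments are omitted.
search_string=${1:-'Lorem ipsum dolor sit amet'}
threshold=${2:-5}

# Reject anything that is not a non-negative integer.
case $threshold in
    ''|*[!0-9]*)
        echo "threshold must be a non-negative integer, got: '$threshold'" >&2
        exit 2
        ;;
esac

printf 'search_string=%s\nthreshold=%s\n' "$search_string" "$threshold"
```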

One awk idea:

awk -v ss="${search_string}" -v threshold="${threshold}" '
$0 ~ ss     { counter[$NF]++ }                        # counter[##]++
/ IPaddr_/  { for (i=2; i<=NF; i++)                   # loop through fields ...
                  if ($(i) ~ "IPaddr_") {             # looking for string "IPaddr_"
                      split($(i),arr,"[()]")          # split "IPaddr_?(##)" on parens
                      ip[arr[2]]=arr[1]               # ip[##]=IPaddr_?
                      next                            # skip to next input line
                  }
            }
END         { for (i in counter)                      # for every "##" encountered ...
                  if (counter[i] >= threshold)        # if the count is at least threshold then ...
                      print i,ip[i]                   # print "## IPaddr_?"
            }
' ipsum.log
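Two details of the awk program can be checked in isolation with made-up input. split() with the regex [()] as separator breaks IPaddr_B(21) into IPaddr_B, 21 and a trailing empty string, which is why only arr[1] and arr[2] are used. Also, $0 ~ ss treats the search string as a regular expression, so characters such as . or + are special; index($0, ss) is the usual awk way to match a string literally if that matters (an IP address, for example, contains dots).

```bash
#!/usr/bin/env bash
# split() on "(" or ")": "IPaddr_B(21)" -> "IPaddr_B", "21", "" (3 pieces)
echo 'IPaddr_B(21)' | awk '{ n = split($0, arr, "[()]"); print n, arr[1], arr[2] }'
# -> 3 IPaddr_B 21

# Regex match (as in the answer) vs. literal match with index():
# the regex a+b matches "aab", but not the literal text "a+b".
printf 'a+b\naab\n' | awk -v ss='a+b' '$0 ~ ss { print "regex:", $0 }'
# -> regex: aab
printf 'a+b\naab\n' | awk -v ss='a+b' 'index($0, ss) { print "literal:", $0 }'
# -> literal: a+b
```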

Using the OP's sample input, this generates:

21 IPaddr_B

With threshold=3, this generates:

21 IPaddr_B

With threshold=2, this generates:

17 IPaddr_A
21 IPaddr_B
22 IPaddr_C

A simple way to count occurrences is grep -c 'string' file. So in your case, you can use command substitution inside a test and do:

[ "$(grep -c 'Lorem ipsum dolor sit amet 21' f)" -gt 5 ] && 
echo "execute cmd" || 
echo "no cmd"

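The above checks whether "Lorem ipsum dolor sit amet 21" occurs more than (-gt) 5 times; if it does, it runs echo "execute cmd", otherwise echo "no cmd". Use -ge instead if "5 or more times" is what you want. If you prefer, you can turn it into an if ... then ... else ... fi form.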
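The if ... then ... else ... fi form of the same check could look like this. The echo commands are the same placeholders as in the one-liner, and wrapping the check in a function (the name check_count is hypothetical) just makes it reusable with other strings, files, and limits.

```bash
#!/usr/bin/env bash
# Hypothetical helper: $1 = string to look for, $2 = file, $3 = limit.
# grep -c prints the number of matching lines (0 if none); like the
# one-liner, it treats $1 as a regex (add -F to match it literally).
check_count() {
    local count
    count=$(grep -c -- "$1" "$2")
    if [ "${count:-0}" -gt "$3" ]; then
        echo "execute cmd"
    else
        echo "no cmd"
    fi
}
```

For example: check_count 'Lorem ipsum dolor sit amet 21' f 5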

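(Note: the form [ test ] && do this || do that is not a true substitute for if ... then ... else ... fi, because if the test is true and do this fails, do that is executed as well. However, when do this is echo "...", this is not a real problem.)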
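The difference is easy to see in a small sketch where false stands in for a "do this" command that fails even though the test succeeded:

```bash
#!/usr/bin/env bash
# The test succeeds, but "do this" (false) fails, so "do that" runs too.
[ 6 -gt 5 ] && false || echo "do that ran even though the test was true"

# With if/then/else, which branch runs depends only on the test.
if [ 6 -gt 5 ]; then
    echo "do this"
else
    echo "do that"
fi
```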

Example use/output

With your input in the file f, you would get:

$ [ "$(grep -c 'Lorem ipsum dolor sit amet 21' f)" -gt 5 ] &&
> echo "execute cmd" ||
> echo "no cmd"
execute cmd
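grep -c counts matching lines, not occurrences, so a line that contains the string twice is counted once. If every occurrence matters, grep -o prints each match on its own line and wc -l counts them. A quick check with made-up input:

```bash
#!/usr/bin/env bash
# Three lines; "srv 7" occurs three times on two of them.
input='srv 7
srv 7 srv 7
srv 39'

printf '%s\n' "$input" | grep -c 'srv 7'           # prints 2 (matching lines)
printf '%s\n' "$input" | grep -o 'srv 7' | wc -l   # prints 3 (occurrences)
```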

How about this:

cut -c 12- try.txt | grep "Lorem ipsum" | sort | uniq -c | awk '{if ($1>=5) print $NF}'
cut -c 12-                   : read every line starting at the 12th character,
                               i.e. skip the "[hh:mm:ss] " timestamp
grep "Lorem ipsum"           : filter by content
sort | uniq -c               : a trick: first sort, then take the unique values and
                               count them; the result looks like "2 Lorem ... 17",
                               i.e. "Lorem ... 17" appears two times
awk '{if ($1>=5) print $NF}' : only when the first value (the uniq -c count) is at
                               least 5 is the last field shown ($NF; NF means
                               "number of fields")
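The same sort | uniq -c idea can also be applied to the log format from the question. In this sketch, the sample file is made up by repeating one of the question's lines and the echo is a placeholder for the real command. grep -o keeps only the "is already connected on srv N" part of each line, so the cut -c 12- step is not needed here.

```bash
#!/usr/bin/env bash
# Made-up sample: srv 7 reported 5 times, srv 39 once.
log=$(mktemp)
for i in 1 2 3 4 5; do
    echo "[10:39:1$i] world_log_data: user (chrisxJ02)(Delaon)@Arca is already connected on srv 7"
done > "$log"
echo "[10:39:17] world_log_data: user (test01)(testDW)@Arca is already connected on srv 39" >> "$log"

threshold=5
grep -o 'is already connected on srv [0-9]*' "$log" |
    sort | uniq -c |                              # e.g. "5 is already connected on srv 7"
    awk -v t="$threshold" '$1 >= t { print $NF }' |
    while read -r srv; do
        echo "srv $srv seen at least $threshold times"   # placeholder action
    done

rm -f "$log"
```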
