Linux: count the number of emails a domain sends per day

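I have some logs from a Linux SMTP server running the Postfix agent. I want to run a single operation on the logs, without writing a script, that tells me how many emails a given domain sends each day.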

For example, my mail.log file contains the following:

Jan  1 14:05:31 mail postfix/smtp[31349]: E6EC84105D: to=<john@example.com>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 78B06EC0073)
Jan  1 15:05:00 mail postfix/smtp[31349]: E6EC84105D: to=<alice@example.com>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 874BE4587C4)
Jan  1 15:05:00 mail postfix/smtp[31349]: E6EC84105D: to=<fred@example.com>, relay=http://mail.example2.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 98C484E1571)
Jan  2 10:08:15 mail postfix/smtp[31349]: E6EC84105D: to=<luke@example.com>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 4456D154E12)
Jan  2 15:07:00 mail postfix/smtp[31349]: E6EC84105D: to=<tyson@example.com>, relay=http://mail.example2.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 4F54515C154)
Jan  2 14:59:11 mail postfix/smtp[31349]: E6EC84105D: to=<bob@example.com>, relay=http://mail.example2.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 9856C984E16)
Feb  1 13:14:35 mail postfix/smtp[31349]: E6EC84105D: to=<nick@example.com>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as EC1874415E8)

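The output I want is:

- First, the domain/address the mail was sent from

- The number of emails sent per day by that domain (e.g. the emails sent on January 12)

So the output here should be: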

http://mail.example.org[127.0.0.1]:25
Jan 1 2
Jan 2 1
Feb 1 1
http://mail.example2.org[127.0.0.1]:25
Jan 1 1
Jan 2 2

I already have two commands that each do part of this, but I really don't know how to combine them:

1. Count how many emails a domain sent in total:

[user@linux ~]$ grep -h "status=sent" mail.log | cut -d' ' -f9 | awk '{c[$0] += 1} END {for(i in c){printf "%6s %4d\n", i, c[i]}}' | sort -M
relay=http://mail.example2.org[127.0.0.1]:25,    3
relay=http://mail.example.org[127.0.0.1]:25,    4

2. Count the number of emails sent per day:

[user@linux ~]$ grep -h "status=sent" mail.log | cut -c-6 | awk '{c[$0] += 1} END {for(i in c){printf "%6s %4d\n", i, c[i]}}' | sort -k2
Feb  1    1
Jan  1    3
Jan  2    3

Does anyone know a good command that would help me do this? Any help would be much appreciated, thanks!

With your shown samples, please try the following awk code. It was written and tested in GNU awk and should work with any awk version.

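awk '
{
  gsub(/^relay=|,$/,"",$8)      # strip the leading "relay=" and the trailing comma from field 8
  arr1[$1 OFS $2 OFS $8]++      # count lines per month, day and relay
}
END{
  for(i in arr1){               # note: for-in visits the keys in an unspecified order
    split(i,arr2)               # arr2[1]=month, arr2[2]=day, arr2[3]=relay
    arr3[arr2[3]]=(arr3[arr2[3]]?arr3[arr2[3]] ORS:"") (arr2[1] OFS arr2[2] OFS arr1[i])
  }
  for(i in arr3){
    print i ORS arr3[i]         # the relay, then one "month day count" line per date
  }
}
'  Input_file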

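Explanation: In the main block of the awk program, first globally substitute the leading relay= and the trailing , in the 8th field with an empty string. Then create an array named arr1 indexed by $1 OFS $2 OFS $8 and increment its count every time the same index occurs; this is done for every line of Input_file. In the END block, traverse all elements of arr1 and split each index i into arr2. Then build a new array arr3 indexed by the 3rd element of arr2, which is the relay (http...) value from Input_file, and append to its value the month, day and count for that index. Once arr3 has been built over all iterations, a for loop traverses its items and prints each index, then ORS (a newline), then the value of arr3, which produces the required output.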
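To see why the relay ends up in the 8th whitespace-separated field, a quick check on a single line can help. This is only a sketch run against one sample line from the question; the variable names line, relay_field and relay_value are made up for the illustration.

```shell
# One sample line in the question's log format.
line='Jan  1 14:05:31 mail postfix/smtp[31349]: E6EC84105D: to=<john@example.com>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 78B06EC0073)'

# Number of the first whitespace-separated field that starts with "relay=".
relay_field=$(printf '%s\n' "$line" |
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^relay=/) { print i; exit } }')

# The same field after removing the "relay=" prefix and the trailing comma.
relay_value=$(printf '%s\n' "$line" |
    awk '{ gsub(/^relay=|,$/, "", $8); print $8 }')

echo "$relay_field"
echo "$relay_value"
```

If your real logs carry extra fields before relay=, the first command would report a different number and the $8 in the answer would need adjusting.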

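Assumptions:

a line contains at most one instance of the string relay=
relay= may not always appear in the same delimited field
the output for a given domain/address should be in calendar order (in this case, that is also the order in which dates are read from mail.log)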
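As a rough illustration of the second assumption, the relay can also be pulled out of the whole line with the POSIX awk match() function, so its field position does not matter. The three log lines, the temporary file and the names log and relay_counts below are invented for this sketch (the second line carries an extra orig_to= field that shifts relay= by one position); this is not the approach used in the answer that follows.

```shell
# Hypothetical sample: relay= sits in field 8 on the first line and field 9 on the second.
log=$(mktemp)
cat > "$log" <<'EOF'
Jan  1 14:05:31 mail postfix/smtp[1]: AAA: to=<a@example.com>, relay=mx1.example.org[192.0.2.1]:25, delay=1, status=sent (250 ok)
Jan  1 14:06:00 mail postfix/smtp[1]: BBB: to=<b@example.com>, orig_to=<c@example.com>, relay=mx1.example.org[192.0.2.1]:25, delay=1, status=sent (250 ok)
Jan  1 14:07:00 mail postfix/smtp[1]: CCC: to=<d@example.com>, relay=mx2.example.org[192.0.2.2]:25, delay=1, status=bounced (550 no)
EOF

relay_counts=$(awk '
/status=sent/ && match($0, /relay=[^, ]+/) {
    # RSTART and RLENGTH delimit the match; skip the 6 characters of "relay="
    addr = substr($0, RSTART + 6, RLENGTH - 6)
    count[addr]++
}
END { for (addr in count) print count[addr], addr }
' "$log")

rm -f "$log"
echo "$relay_counts"
```

Only the two status=sent lines are counted, and both land on the same relay even though relay= is in a different field on each.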

Adding a few lines that should not be counted (status=rejected rather than status=sent):

$ cat mail.log
Jan  1 14:05:31 mail postfix/smtp[31349]: E6EC84105D: to=<john@example.com>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 78B06EC0073)
Jan  1 14:17:27 mail postfix/smtp[31349]: E6EC84105D: to=<john@example.com>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=rejected (250 2.0.0 Ok: queued as 78B06EC0073)
Jan  1 15:05:00 mail postfix/smtp[31349]: E6EC84105D: to=<alice@example.com>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 874BE4587C4)
Jan  1 15:05:00 mail postfix/smtp[31349]: E6EC84105D: to=<fred@example.com>, relay=http://mail.example2.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 98C484E1571)
Jan  2 10:08:15 mail postfix/smtp[31349]: E6EC84105D: to=<luke@example.com>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 4456D154E12)
Jan  2 12:13:31 mail postfix/smtp[31349]: E6EC84105D: to=<john@example.com>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=rejected (250 2.0.0 Ok: queued as 78B06EC0073)
Jan  2 15:07:00 mail postfix/smtp[31349]: E6EC84105D: to=<tyson@example.com>, relay=http://mail.example2.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 4F54515C154)
Jan  2 14:59:11 mail postfix/smtp[31349]: E6EC84105D: to=<bob@example.com>, relay=http://mail.example2.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 9856C984E16)
Feb  1 13:14:35 mail postfix/smtp[31349]: E6EC84105D: to=<nick@example.com>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as EC1874415E8)

One idea using GNU awk (for arrays of arrays):

awk '
BEGIN         { regex = "\\<relay=[^, ]" }                # backslash doubled because the regex is built from a string
/status=sent/ { date=$1 FS $2
                addr=""
                for (i=3;i<=NF;i++) {                     # loop through fields looking for "relay="
                    if ($i ~ regex) {                     # and if found then parse out the domain/address
                        split($i,arr,"=")
                        addr=arr[2]
                        gsub(",","",addr)
                        break                             # at most one "relay=" per line, so stop looking
                    }
                }
                if (addr != "") {                         # if we found an address then increment our counter
                    counts[addr][date]++
                    if (date != prevdate) {               # and keep track of the order in which dates have been processed
                        dates[++dtorder]=date
                        prevdate=date
                    }
                }
              }
END           { for (addr in counts) {
                    print addr
                    for (i=1;i<=dtorder;i++)              # loop through dates[] in the same order in which they were processed
                        if (dates[i] in counts[addr])
                            print dates[i],counts[addr][dates[i]]
                }
              }
' mail.log

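Notes:

for (addr in counts) is not guaranteed to process array entries in any particular order
dates[] is used to track the order in which dates are processed; this is then used in the END{...} processing to ensure we output the dates in the same order; this assumes the dates appear in calendar order in mail.log, which removes the need to figure out how to sort Jan, Feb, Mar, etc. in calendar order

This generates: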

http://mail.example2.org[127.0.0.1]:25
Jan 1 1
Jan 2 2
http://mail.example.org[127.0.0.1]:25
Jan 1 2
Jan 2 1
Feb 1 1
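If the dates in mail.log were not already in calendar order, one possible workaround is to turn the month abbreviation into a number and sort on that. The following is only a sketch on four made-up dates; it assumes English three-letter month names and a log that covers a single year, and the variable name ordered is hypothetical.

```shell
ordered=$(printf '%s\n' 'Feb 1' 'Jan 2' 'Jan 1' 'Jan 10' |
    awk '{
        # index() gives 1 for Jan, 4 for Feb, 7 for Mar, ...; map that to 1..12
        m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", $1) + 2) / 3
        # prefix each date with a zero-padded, sortable "MM DD" key
        printf "%02d %02d %s %s\n", m, $2, $1, $2
    }' | LC_ALL=C sort | cut -d' ' -f3-)

echo "$ordered"
```

The zero-padded key makes a plain sort put Jan 10 after Jan 2 and Feb after Jan; cut then drops the key again.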

Not exactly the output you want, but very simple (tested with GNU and BSD awk, sort and uniq):

$ awk -F'=|,?[[:space:]]+' '{print $10,$1,$2}' mail.log | sort | uniq -c
1 http://mail.example.org[127.0.0.1]:25 Feb 1
2 http://mail.example.org[127.0.0.1]:25 Jan 1
1 http://mail.example.org[127.0.0.1]:25 Jan 2
1 http://mail.example2.org[127.0.0.1]:25 Jan 1
2 http://mail.example2.org[127.0.0.1]:25 Jan 2

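The awk field separator is set by the -F'=|,?[[:space:]]+' option to either an = sign, or an optional comma followed by at least one space (or tab, form feed...). With that separator, the fields you are interested in are number 10 (the origin), 1 (the month) and 2 (the day). sort | uniq -c sorts the result and prints one line per unique input line, prefixed with its count.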
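To check the field numbering for yourself, you can print the first ten fields of one line with that separator. This is just a sketch on one sample line from the question; line and fields are names invented for the example.

```shell
# One sample line in the question's log format.
line='Jan  1 14:05:31 mail postfix/smtp[31349]: E6EC84105D: to=<john@example.com>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 78B06EC0073)'

# Print "number: value" for the first ten fields under the custom separator.
fields=$(printf '%s\n' "$line" |
    awk -F'=|,?[[:space:]]+' '{ for (i = 1; i <= 10; i++) print i ": " $i }')

echo "$fields"
```

Because = is a separator too, to and relay become fields of their own (7 and 9), which is why the relay value lands in field 10 rather than 8.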

But the months are sorted alphabetically. If you want the output sorted first by origin and then by increasing date, we can add sort options:

$ awk -F'=|,?[[:space:]]+' '{print $10,$1,$2}' mail.log | sort -k1,1 -k2,2M -k3,3n |
uniq -c
2 http://mail.example.org[127.0.0.1]:25 Jan 1
1 http://mail.example.org[127.0.0.1]:25 Jan 2
1 http://mail.example.org[127.0.0.1]:25 Feb 1
1 http://mail.example2.org[127.0.0.1]:25 Jan 1
2 http://mail.example2.org[127.0.0.1]:25 Jan 2
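The effect of the sort keys is easier to see on a tiny input. The sketch below uses four made-up lines with a dummy origin x; the third key is compared numerically (n) so that day 10 sorts after day 2, LC_ALL=C is set so the English month names are recognised, and the variable names are hypothetical.

```shell
# Plain sort: months come out alphabetically (Feb before Jan), days as text (10 before 2).
sorted_plain=$(printf '%s\n' 'x Jan 2' 'x Feb 1' 'x Jan 10' 'x Jan 1' | LC_ALL=C sort)

# Keyed sort: origin, then month in calendar order (M), then day as a number (n).
sorted_month=$(printf '%s\n' 'x Jan 2' 'x Feb 1' 'x Jan 10' 'x Jan 1' |
    LC_ALL=C sort -k1,1 -k2,2M -k3,3n)

echo "$sorted_plain"
echo "$sorted_month"
```

The M key modifier is supported by GNU and BSD sort; it is not part of POSIX sort, so check your platform if you are on something else.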

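-k2,2M sorts the second key as month names, in calendar order rather than alphabetically. Finally, if you want exactly the output you showed, we can add a last awk script for the final formatting: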

$ awk -F'=|,?[[:space:]]+' '{print $10,$1,$2}' mail.log | sort -k1,1 -k2,2M -k3,3n |
uniq -c | awk '$2!=p {p=$2; print (NR!=1) ? "\n" p : p} {print $3,$4,$1}'
http://mail.example.org[127.0.0.1]:25
Jan 1 2
Jan 2 1
Feb 1 1

http://mail.example2.org[127.0.0.1]:25
Jan 1 1
Jan 2 2

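Each time the origin changes ($2!=p), the last awk script stores the new origin in the variable p for later comparison, prints a newline (except before the first line, hence (NR!=1) ? "\n" p : p), and prints the new origin. For every line it also prints the month ($3), the day ($4) and the count ($1).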
