Python: sort a file and count unique names

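I'm trying to read the Linux log file /var/log/messages and pick out the lines that match the special string pattern shown below. In each such line I look at the user's address, e.g. rajeshm@noi-rajeshm.fox.com. I split the line at the '@' with str.partition(), take the first part, and split that further into a list so I can easily grab its last element, which is the user ID. That part works fine.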

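So I can get the list of users and the total count, but I need to count the occurrences of each user and print them as user_name: Count, i.e. as key/value pairs.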

Nov 28 09:00:08 FoxOpt210 rshd[6157]: pam_rhosts(rsh:auth): allowed access to rajeshm@noi-rajeshm.fox.com as rajeshm
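To make the parsing steps concrete, here is a small sketch that applies str.partition() and str.split() to an English rendering of the sample line (the exact pam_rhosts wording is assumed). It only shows the intermediate values; the variable names are illustrative.

```python
# Sample log line (wording assumed from the question's example).
line = ("Nov 28 09:00:08 FoxOpt210 rshd[6157]: pam_rhosts(rsh:auth): "
        "allowed access to rajeshm@noi-rajeshm.fox.com as rajeshm")

# partition() splits at the FIRST '@' and returns a 3-tuple: (before, sep, after).
before, sep, after = line.partition('@')

# The user ID is the last whitespace-separated word before the '@'.
user_id = before.split()[-1]

print(user_id)  # rajeshm
print(after)    # noi-rajeshm.fox.com as rajeshm
```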

#!/usr/bin/python3
f= open("/var/log/messages")
count = 0
for line in f:
  if "allowed access"  in line:
    count+=1
    user_id = line.partition('@')[0]
    user_id = user_id.split()[-1]
    print(user_id)
f.close()
print("--------------------")
print("Total Count :" ,count)

The current output looks like this:

bash-4.1$ ./log.py | tail
navit
akaul
akaul
pankaja
vishalm
vishalm
rajeshm
rajeshm
--------------------
Total Count : 790

While googling I came across the idea of using a dictionary for this, and it works correctly:

#!/usr/bin/python3
from collections import Counter
f= open("/var/log/messages")
count = 0
dictionary = {}
for line in f:
    if "allowed access" in line:
        user_id = line.partition('@')[0]
        user_count = user_id.split()[-1]
        if user_count in dictionary:
            dictionary[user_count] += 1
        else:
            dictionary[user_count] = 1
for user_count, occurences in dictionary.items():
    print(user_count, ':', occurences)

This gives the following output:

bash-4.1$ ./log2.py
rajeshm : 5
navit : 780
akaul : 2
pankaja : 1
vishalm : 2
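As an aside, the if/else bookkeeping in the dictionary loop above can be avoided with collections.defaultdict, which creates a missing key with a default value on first access. A minimal sketch, run on a hypothetical list of user IDs instead of the real log file:

```python
from collections import defaultdict

# Hypothetical user IDs, as the loop above would extract them from the log.
user_ids = ["navit", "akaul", "akaul", "rajeshm", "navit", "navit"]

# int() returns 0, so a user seen for the first time starts at 0 before += 1.
dictionary = defaultdict(int)
for user_id in user_ids:
    dictionary[user_id] += 1

for user_id, occurrences in dictionary.items():
    print(user_id, ':', occurrences)
```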

I'm just wondering whether there is a better way to do this.

When counting things, the collections.Counter() class makes this easier. I'd wrap the parsing in a generator here:

def users_accessed(fileobj):
    for line in fileobj:
        if 'allowed access' in line:
            yield line.partition('@')[0].rsplit(None, 1)[-1]

and pass that to a Counter() object:

from collections import Counter
with open("/var/log/messages") as f:
    access_counts = Counter(users_accessed(f))
for userid, count in access_counts.most_common():
    print(userid, count, sep=':')

This uses the Counter.most_common() method to produce sorted output (from most to least common).
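Because users_accessed() only needs an iterable of lines, the whole pipeline can be tried without touching /var/log/messages. This sketch feeds it a hypothetical log excerpt through io.StringIO (the extra users and the unrelated kernel line are made up), then shows most_common(n) for a top-N view and an alphabetical listing in the user_name: Count format from the question.

```python
import io
from collections import Counter

def users_accessed(fileobj):
    # Same generator as in the answer: yield the user ID of each
    # "allowed access" line, i.e. the last word before the '@'.
    for line in fileobj:
        if 'allowed access' in line:
            yield line.partition('@')[0].rsplit(None, 1)[-1]

# Hypothetical log excerpt; io.StringIO behaves like an open text file.
sample = io.StringIO(
    "Nov 28 09:00:08 FoxOpt210 rshd[6157]: pam_rhosts(rsh:auth): "
    "allowed access to rajeshm@noi-rajeshm.fox.com as rajeshm\n"
    "Nov 28 09:00:09 FoxOpt210 kernel: unrelated message\n"
    "Nov 28 09:01:12 FoxOpt210 rshd[6160]: pam_rhosts(rsh:auth): "
    "allowed access to akaul@noi-akaul.fox.com as akaul\n"
    "Nov 28 09:02:40 FoxOpt210 rshd[6161]: pam_rhosts(rsh:auth): "
    "allowed access to akaul@noi-akaul.fox.com as akaul\n"
)

access_counts = Counter(users_accessed(sample))

# most_common(n) keeps only the n highest counts.
print(access_counts.most_common(1))  # [('akaul', 2)]

# Alphabetical listing in the "user_name: Count" format from the question.
for userid, count in sorted(access_counts.items()):
    print(f"{userid}: {count}")
```

Note that the unrelated line is skipped by the 'allowed access' filter, so it never reaches the Counter.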

You could also try a regular expression, which can do this:

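import re
# Lookbehind: match the token that follows " as " (the local user name).
pattern = re.compile(r'(?<=\sas\s)\S+')
occurrence = {}
with open("/var/log/messages") as f:
    for line in f:
        # Skip lines that are not pam_rhosts "allowed access" messages.
        if "allowed access" not in line:
            continue
        match = pattern.search(line)
        if match is None:  # search() returns None when nothing matches
            continue
        search = match.group()
        if search not in occurrence:
            occurrence[search] = 1
        else:
            occurrence[search] += 1
print(occurrence)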

Just for fun, the same logic as a one-liner:

import re
pattern = r'(?<=\sas\s)\S+'
new = {}
# Uses a list comprehension only for its side effects; None results (no match) are filtered out.
[new.__setitem__(m.group(), new.get(m.group(), 0) + 1) for m in (re.search(pattern, line) for line in open('/var/log/messages') if 'allowed access' in line) if m]
print(new)
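A regex can also pull out all three fields of the pam_rhosts line at once with named groups, and collections.Counter gives a genuine one-line count without building a throwaway list for side effects. The pattern below is an assumption based on the sample line in the question (allowed access to USER@HOST as LOCALUSER), and the log lines are hypothetical stand-ins for open('/var/log/messages'); adjust the pattern if your syslog wording differs.

```python
import re
from collections import Counter

# Assumed line layout (from the sample in the question):
#   "... allowed access to <remote user>@<remote host> as <local user>"
ACCESS_RE = re.compile(
    r"allowed access to (?P<ruser>[^@\s]+)@(?P<rhost>\S+) as (?P<luser>\S+)"
)

# Hypothetical log lines standing in for the real file.
lines = [
    "Nov 28 09:00:08 FoxOpt210 rshd[6157]: pam_rhosts(rsh:auth): "
    "allowed access to rajeshm@noi-rajeshm.fox.com as rajeshm",
    "Nov 28 09:00:09 FoxOpt210 kernel: unrelated message",
    "Nov 28 09:01:12 FoxOpt210 rshd[6160]: pam_rhosts(rsh:auth): "
    "allowed access to akaul@noi-akaul.fox.com as akaul",
    "Nov 28 09:02:40 FoxOpt210 rshd[6161]: pam_rhosts(rsh:auth): "
    "allowed access to akaul@noi-akaul.fox.com as akaul",
]

# search() runs once per line; non-matching lines give None and are dropped.
matches = [m for m in map(ACCESS_RE.search, lines) if m]
print(matches[0].groupdict())
# {'ruser': 'rajeshm', 'rhost': 'noi-rajeshm.fox.com', 'luser': 'rajeshm'}

# One-line count per remote user.
counts = Counter(m.group("ruser") for m in matches)
print(counts)  # Counter({'akaul': 2, 'rajeshm': 1})
```

Compared with the lookbehind version, the named groups make it explicit which part of the line is being counted, and calling search() once per line avoids repeating the same match four times.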
