Sorting and organizing a dictionary



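I have a dictionary with many key/value pairs.

The keys are dates and the values are top-level domains (TLDs).

I want to write the dictionary to a text file so that identical values are counted and sorted alphabetically, but only within the same key.

For example: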
*key:     value1:count value2:count*
date1:   au:4   be:12   com:44
date2:   az:4   com:14  net:5

Code:

import re

my_dict = {}
with open('access_logshort.txt','rU') as f:
    for line in f:
        list1 = re.search(r'(?P<Date>[0-9]{2}/[a-zA-Z]{3}/[0-9]{4})(.+)(GET|POST)\s(http://|https://)([a-zA-Z.]+)(\.)(?P<tld>[a-zA-Z]+)(/).+?"\s200',line)
        if list1 is not None:
            print list1.groupdict()
            # group 1 is the date, group 7 is the TLD
            one_tuple = list1.group(1,7)
            # each assignment replaces any TLD already stored for this date
            my_dict[one_tuple[0]]=one_tuple[1]
Output of print my_dict:
{'09/Mar/2004': 'hu'}
{'09/Mar/2004': 'hu'}
{'09/Mar/2004': 'com'}
{'09/Mar/2004': 'ru'}
{'09/Mar/2004': 'ru'}
{'09/Mar/2004': 'com'}


This should work for your case.

from collections import defaultdict
from dateutil.parser import parse
import csv
import re

# data[date][tld] holds the number of matching requests
data = defaultdict(lambda: defaultdict(int))
with open('access_logshort.txt','rU') as f:
    for line in f:
        list1 = re.search(r'(?P<Date>[0-9]{2}/[a-zA-Z]{3}/[0-9]{4})(.+)(GET|POST)\s(http://|https://)([a-zA-Z.]+)(\.)(?P<tld>[a-zA-Z]+)(/).+?"\s200',line)
        if list1 is not None:
            # group 1 is the date, group 7 is the TLD
            date, domain = list1.group(1,7)
            data[date.lower()][domain.lower()] += 1
with open('my_data.csv', 'wb') as ofile:
    # add delimiter='\t' to the argument list of csv.writer if you want
    # tsv rather than csv
    writer = csv.writer(ofile)
    # dates in chronological order, domains alphabetically within each date
    for key, value in sorted(data.iteritems(), key=lambda x: parse(x[0])):
        domains = sorted(value.iteritems())
        writer.writerow([key] + ['{}:{}'.format(*d) for d in domains])

Output:

10/Mar/2004,com:2,hu:2,ru:2
09/Mar/2004,com:2,hu:2,ru:2
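
The code above is written for Python 2 (iteritems, 'wb', 'rU'). If you are on Python 3, the same nested-counter idea could look roughly like the sketch below. It is only an illustration under a few assumptions: the names LOG_RE, count_tlds and to_rows are made up here, datetime.strptime is swapped in for dateutil so that only the standard library is needed (it assumes English month abbreviations), the date is left in its original case, and the sample log lines are invented for the demo rather than taken from the asker's file.

```python
import csv
import io
import re
from collections import defaultdict
from datetime import datetime

# Same pattern as in the thread, with unused groups made non-capturing.
LOG_RE = re.compile(
    r'(?P<Date>[0-9]{2}/[a-zA-Z]{3}/[0-9]{4}).+(?:GET|POST)\s'
    r'https?://[a-zA-Z.]+\.(?P<tld>[a-zA-Z]+)/.+?"\s200'
)


def count_tlds(lines):
    """Return {date: {tld: count}} for every line that matches LOG_RE."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        match = LOG_RE.search(line)
        if match is not None:
            counts[match.group('Date')][match.group('tld').lower()] += 1
    return counts


def to_rows(counts):
    """One row per date in chronological order, TLDs sorted alphabetically."""
    by_date = sorted(
        counts.items(),
        key=lambda kv: datetime.strptime(kv[0], '%d/%b/%Y'),
    )
    return [
        [date] + ['{}:{}'.format(tld, n) for tld, n in sorted(tlds.items())]
        for date, tlds in by_date
    ]


# Hypothetical sample lines, invented for illustration only.
sample = [
    '1.2.3.4 - - [10/Mar/2004:10:00:00 -0800] "GET http://www.example.com/a HTTP/1.1" 200 12',
    '1.2.3.4 - - [10/Mar/2004:10:00:01 -0800] "GET http://www.example.hu/a HTTP/1.1" 200 12',
    '1.2.3.4 - - [09/Mar/2004:10:00:02 -0800] "GET http://example.ru/a HTTP/1.1" 200 12',
    '1.2.3.4 - - [10/Mar/2004:10:00:03 -0800] "POST http://www.example.com/b HTTP/1.1" 200 12',
    '1.2.3.4 - - [09/Mar/2004:10:00:04 -0800] "GET http://example.ru/b HTTP/1.1" 200 12',
    '1.2.3.4 - - [09/Mar/2004:10:00:05 -0800] "GET http://www.example.com/c HTTP/1.1" 200 12',
    # Status 404, so this line is skipped by the pattern.
    '1.2.3.4 - - [09/Mar/2004:10:00:06 -0800] "GET http://www.example.com/d HTTP/1.1" 404 0',
]

rows = to_rows(count_tlds(sample))

# Write to an in-memory buffer here; for a real file, open it with
# open('my_data.csv', 'w', newline='') and pass that to csv.writer.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue(), end='')
```

With these sample lines it prints 09/Mar/2004,com:1,ru:2 followed by 10/Mar/2004,com:2,hu:1. The inner defaultdict(int) is what does the counting: a TLD seen for the first time on a date starts at 0 and is incremented, instead of overwriting the previous value for that date.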
