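I am using Python 3.
I am reading a CSV file into a DictReader and trying to find out which country occurs most often.
Note that I am using DictReader rather than reader. I believe this is needed because I am using a Counter.
I am running into trouble because some rows in my CSV file contain null bytes (particularly in the password field), and this kills my script because the csv reader does not accept null bytes. An example is the last sample row in the comment below. I have seen people strip the null bytes in their code with a line like this: readerobject(x.replace('\0', '') for x in csvfile)
but I can't seem to use it, because I have already read csvfile into readerobject on the preceding line.
Here is my code: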
'''
sample csv lines
Brazil,200.145.23.13,pi,raspberry,failed,None,None,None
Brazil,200.145.23.13,pi,raspberryraspberry993311,failed,None,None,None
China,121.201.83.134,root,123456,succeeded,None,None,None
United Kingdom,185.38.148.238,root,123456,succeeded,None,None,None
Croatia,5.188.10.141,root,admin,succeeded,None,None,None
France,195.154.44.31,squid,123456,failed,None,None,None
France,195.154.44.31,squid,123456,failed,None,None,None
Croatia,5.188.10.141,root,123456,succeeded,None,None,None
Croatia,5.188.10.141,root,admin,succeeded,None,None,None
Croatia,5.188.10.141,root,123456,succeeded,None,None,None
Netherlands,109.236.91.85,root,admin,succeeded,None,None,None
France,51.255.160.205,root,admin,succeeded,None,None,None
United States,207.138.132.44,root,seiko2005,failed,None,None,None
France,212.83.150.189,support," ",failed,None,None,None <-- these are null bytes inside the ""
'''
import codecs
from pprint import pprint
from collections import Counter
import csv
linecount = 0
import time
country_counter = Counter()
print("parsing CSV log file")
with open('C:/Users/Home/Documents/kippo stuff/final lab/kippo/oldkippo4final.csv', newline='') as csvfile:
    readerobject = csv.DictReader(csvfile, delimiter=',', fieldnames=['Country', 'IP Address', 'Username', 'Password', 'Status', 'name', 'intention', 'OS'])
    # attempted fix; this call fails because a DictReader object is not callable
    readerobject(x.replace('\0', '') for x in csvfile)
    for row in readerobject:
        print(row, "\n\n")
        linecount += 1
        country_counter[row['Country']] += 1
print(linecount)
print(country_counter.most_common(3))
print("the total linecount was: ", linecount)
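You could read the whole file, replace the 0x00 characters, and then pass an io.StringIO object built from the new string to csv.DictReader. This seems a bit hacky, though. Maybe the csv module has another function that can read directly from a string instead of a file object?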
import csv
import io

with open("oldkippo4final.csv", newline='') as csvfile:
    data = csvfile.read()  # read the whole file into one string
    data = data.replace("\x00", "")  # get rid of the null bytes
    csvfile_repl = io.StringIO(data)  # new pseudo file object over the cleaned text
    readerobject = csv.DictReader(csvfile_repl, delimiter=',', fieldnames=['Country', 'IP Address', 'Username', 'Password', 'Status', 'name', 'intention', 'OS'])
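The csv readers accept any iterable of strings, not only file objects, so another option is to hand csv.DictReader a generator that strips the null bytes line by line instead of reading the whole file first. The following is only a minimal sketch of that idea: count_countries, FIELDNAMES, and the in-memory sample rows are names and data made up for illustration, and the sample stands in for the real log file.

```python
import csv
import io
from collections import Counter

# Column names taken from the question; the constant name is hypothetical.
FIELDNAMES = ['Country', 'IP Address', 'Username', 'Password',
              'Status', 'name', 'intention', 'OS']


def count_countries(lines):
    """Count rows per country, dropping null bytes before csv parses them."""
    # The generator cleans each line lazily, so the file is never held
    # in memory as one big string.
    cleaned = (line.replace('\0', '') for line in lines)
    reader = csv.DictReader(cleaned, delimiter=',', fieldnames=FIELDNAMES)
    return Counter(row['Country'] for row in reader)


# Made-up stand-in for the log file; the last row has null bytes
# inside the quoted password field.
sample = io.StringIO(
    'Croatia,5.188.10.141,root,admin,succeeded,None,None,None\n'
    'France,195.154.44.31,squid,123456,failed,None,None,None\n'
    'France,212.83.150.189,support,"\0\0",failed,None,None,None\n'
)
counts = count_countries(sample)
print(counts.most_common(3))
```

With the real data, the open file object would presumably be passed in place of sample, for example inside a with open(..., newline='') as csvfile: block. The generator has to be given to DictReader when the reader is created, which is why calling an existing reader object with it does not work.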