Null byte error: how to remove null bytes in Python without reading, writing, and re-reading the file



I'm using Python 3.

I'm reading a CSV file with a DictReader and trying to find out which country occurs most often.

Note that I'm using DictReader, not reader. I think this is needed because I'm using a Counter.
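For what it's worth, Counter does not depend on DictReader: it counts any hashable values, so a plain csv.reader works just as well if you index the country column by position. A minimal sketch, assuming Country is column 0 as in the sample lines; the io.StringIO object is only an in-memory stand-in for the real file:

```python
import csv
import io
from collections import Counter

# In-memory stand-in for the log file; Country is the first column.
sample = io.StringIO(
    "Brazil,200.145.23.13,pi,raspberry,failed,None,None,None\n"
    "France,195.154.44.31,squid,123456,failed,None,None,None\n"
    "France,51.255.160.205,root,admin,succeeded,None,None,None\n"
)

# csv.reader yields each row as a list, so index the column directly.
country_counter = Counter(row[0] for row in csv.reader(sample))

print(country_counter.most_common(1))  # [('France', 2)]
```

DictReader is still handy for readability (row['Country'] instead of row[0]), but either reader works with Counter.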

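I'm running into trouble because some rows in my CSV file contain null bytes (especially in the password field), and that kills my script, because the CSV reader doesn't like null bytes. The last sample line in the comment below is an example. I've seen people strip the null bytes with this line: readerobject(x.replace('\0', '') for x in csvfile), but I can't seem to use it, because I've already passed csvfile to readerobject on the previous line.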
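The idiom quoted above is meant to replace the file argument, not to be called on an existing reader: csv.DictReader accepts any iterable that yields strings, so a generator that strips '\0' from each line can be passed in place of csvfile. A minimal sketch of that idea; count_countries and FIELDNAMES are hypothetical names, and the io.StringIO sample stands in for the opened file:

```python
import csv
import io
from collections import Counter

FIELDNAMES = ['Country', 'IP Address', 'Username', 'Password',
              'Status', 'name', 'intention', 'OS']

def count_countries(csvfile):
    """Count the Country column after removing NUL characters from each line."""
    # Lazily strip NULs line by line; the file is never read into memory at once.
    cleaned = (line.replace('\0', '') for line in csvfile)
    reader = csv.DictReader(cleaned, fieldnames=FIELDNAMES)
    return Counter(row['Country'] for row in reader)

# Stand-in for the real log; the second line has NULs in the password field.
sample = io.StringIO(
    "France,195.154.44.31,squid,123456,failed,None,None,None\n"
    'France,212.83.150.189,support,"\0\0\0",failed,None,None,None\n'
    "Croatia,5.188.10.141,root,admin,succeeded,None,None,None\n"
)
counts = count_countries(sample)
print(counts.most_common(3))
```

With the real file you would call it inside the with block, e.g. `with open(path, newline='') as f: counts = count_countries(f)`.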

Here is my code:

'''
sample csv lines
Brazil,200.145.23.13,pi,raspberry,failed,None,None,None
Brazil,200.145.23.13,pi,raspberryraspberry993311,failed,None,None,None
China,121.201.83.134,root,123456,succeeded,None,None,None
United Kingdom,185.38.148.238,root,123456,succeeded,None,None,None
Croatia,5.188.10.141,root,admin,succeeded,None,None,None
France,195.154.44.31,squid,123456,failed,None,None,None
France,195.154.44.31,squid,123456,failed,None,None,None
Croatia,5.188.10.141,root,123456,succeeded,None,None,None
Croatia,5.188.10.141,root,admin,succeeded,None,None,None
Croatia,5.188.10.141,root,123456,succeeded,None,None,None
Netherlands,109.236.91.85,root,admin,succeeded,None,None,None
France,51.255.160.205,root,admin,succeeded,None,None,None
United States,207.138.132.44,root,seiko2005,failed,None,None,None
France,212.83.150.189,support,"       ",failed,None,None,None   <-- these are null bytes inside the ""  
'''

import codecs
import csv
import time
from collections import Counter
from pprint import pprint

linecount = 0
country_counter = Counter()
print("parsing CSV log file")
with open('C:/Users/Home/Documents/kippo stuff/final lab/kippo/oldkippo4final.csv', newline='') as csvfile:
    readerobject = csv.DictReader(csvfile, delimiter=',', fieldnames=['Country', 'IP Address', 'Username', 'Password', 'Status', 'name', 'intention', 'OS'])
    # Attempted fix: this raises TypeError, because a DictReader object is not callable.
    readerobject(x.replace('\0', '') for x in csvfile)
    for row in readerobject:
        print(row, "\n\n")
        linecount += 1
        country_counter[row['Country']] += 1
        print(linecount)
print(country_counter.most_common(3))
print("the total linecount was: ", linecount)
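Before deciding how to strip them, it can help to confirm which lines actually contain null bytes. A small diagnostic sketch; find_nul_lines is a hypothetical helper, and the io.StringIO sample stands in for the opened CSV file:

```python
import io

def find_nul_lines(csvfile):
    """Return the 1-based numbers of lines that contain a NUL character."""
    return [lineno for lineno, line in enumerate(csvfile, start=1) if '\0' in line]

# Stand-in for the real file; only the second line contains NULs.
sample = io.StringIO(
    "United States,207.138.132.44,root,seiko2005,failed,None,None,None\n"
    'France,212.83.150.189,support,"\0\0",failed,None,None,None\n'
)
nul_lines = find_nul_lines(sample)
print(nul_lines)  # [2]
```

This reads the file once on its own, so reopen (or seek(0)) the file before handing it to the CSV reader.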

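You could read the whole file, replace the 0x00 characters, and then pass an io.StringIO object built from the new string to csv.DictReader. That does seem a bit hacky, though. Maybe there is another function in the csv module that can read directly from a string instead of a file object?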

import csv
import io

with open("oldkippo4final.csv", newline='') as csvfile:
    data = csvfile.read()                 # read the whole file into one string
data = data.replace("\x00", "")           # get rid of null bytes
csvfile_repl = io.StringIO(data)          # new in-memory file-like object
readerobject = csv.DictReader(csvfile_repl, delimiter=',', fieldnames=['Country', 'IP Address', 'Username', 'Password', 'Status', 'name', 'intention', 'OS'])
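An end-to-end sketch of this whole-file approach wired up to Counter, assuming the same field names as the question; top_countries is a hypothetical helper and the text literal stands in for what csvfile.read() would return:

```python
import csv
import io
from collections import Counter

FIELDNAMES = ['Country', 'IP Address', 'Username', 'Password',
              'Status', 'name', 'intention', 'OS']

def top_countries(text, n=3):
    """Strip NULs from the whole text, then count countries with DictReader."""
    cleaned = text.replace('\x00', '')  # remove every NUL in one pass
    reader = csv.DictReader(io.StringIO(cleaned), fieldnames=FIELDNAMES)
    return Counter(row['Country'] for row in reader).most_common(n)

# Stand-in for csvfile.read(); the last line has NULs in the password field.
text = (
    "Croatia,5.188.10.141,root,admin,succeeded,None,None,None\n"
    "Croatia,5.188.10.141,root,123456,succeeded,None,None,None\n"
    'France,212.83.150.189,support,"\x00\x00",failed,None,None,None\n'
)
result = top_countries(text)
print(result)
```

The trade-off compared with filtering line by line is memory: this holds the entire file, plus a cleaned copy, in memory at once, which is fine for a log of modest size.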
