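I am following the examples in Wes McKinney's "Python for Data Analysis".
In Chapter 2, we are asked to count the number of times each time zone appears in the 'tz' field, where some entries do not have a 'tz'.
McKinney's count for 'America/New_York' comes out to 1251 (it appears twice in the first 10 of 3440 lines, shown below), whereas mine comes out to 1. Why does it show 1?
I am using Python 2.7, installed from Enthought (epd-7.3-1-win-x86_64.msi) as instructed by McKinney in the text. The data comes from https://github.com/Canuckish/pydata-book/tree/master/ch02. In case you can't tell from the title of the book, I am new to Python, so please include instructions on how to get any information I have not provided.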
import json
path = 'usagov_bitly_data2012-03-16-1331923249.txt'
open(path).readline()
records = [json.loads(line) for line in open(path)]
records[0]
records[1]
print records[0]['tz']
The last line here shows 'America/New_York'; the same call for records[1] shows 'America/Denver'.
#count unique time zones rating movies
#NOTE: NOT every JSON entry has a tz, so first line won't work
time_zones = [rec['tz'] for rec in records]
time_zones = [rec['tz'] for rec in records if 'tz' in rec]
time_zones[:10]
This shows the first 10 time zone entries, of which entries 8-10 are blank.
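To illustrate why the `if 'tz' in rec` filter is needed, and why blank entries can still appear afterwards, here is a minimal sketch. The `sample_records` list is made up for illustration and is not taken from the data file. A record with no 'tz' key is skipped, while a record whose 'tz' is an empty string is kept.

```python
# Hypothetical sample records, standing in for the parsed JSON lines.
sample_records = [
    {'tz': 'America/New_York'},
    {'tz': ''},               # key present, value blank
    {'a': 'no tz key here'},  # key missing entirely
    {'tz': 'America/Denver'},
]

# Keep only records that have a 'tz' key; blank values survive this filter.
sample_zones = [rec['tz'] for rec in sample_records if 'tz' in rec]
print(sample_zones)

# Optionally drop the blank values as well.
non_blank = [tz for tz in sample_zones if tz]
print(non_blank)
```

Whether the blank values should be dropped depends on what you want to count; the filter in the question keeps them.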
#counting using a dict to store counts
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts
counts = get_counts(time_zones)
counts['America/New_York']
this = 1, but it should be 1251
len(time_zones)
this = 3440
The 'America/New_York' timezone occurs 1251 times:
import json
from collections import Counter
with open(path) as file:
    c = Counter(json.loads(line).get('tz') for line in file)
print(c['America/New_York']) # -> 1251
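As a quick way to try the `Counter` approach without the data file, here is a small sketch on made-up JSON lines (`sample_lines` is hypothetical). Note that `.get('tz')` returns `None` for records without a 'tz' key, so those records are counted under the key `None` rather than skipped.

```python
import json
from collections import Counter

# Hypothetical JSON lines standing in for the lines of the data file.
sample_lines = [
    '{"tz": "America/New_York"}',
    '{"tz": ""}',
    '{"a": "no tz key"}',
    '{"tz": "America/New_York"}',
]

# Same expression as above, applied to the in-memory sample.
c_sample = Counter(json.loads(line).get('tz') for line in sample_lines)

print(c_sample['America/New_York'])  # 2
print(c_sample[None])                # 1: the record with no 'tz' key
print(c_sample.most_common(1))       # the most frequent entry and its count
```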
It is unclear why your code gives a count of 1. Perhaps the indentation in your code is incorrect:
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
    else: #XXX wrong indentation
        counts[x] = 1 # it is run after the loop if there is no `break`
    return counts
See "Why does python use 'else' after for and while loops?"
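To make the effect of the misplaced `else` concrete, here is a small sketch on a made-up list (`sample` and the function name `get_counts_misindented` are hypothetical). Because the `else` is attached to the `for`, the dict stays empty during the loop and only the last element is recorded, once, after the loop finishes.

```python
def get_counts_misindented(sequence):
    counts = {}
    for x in sequence:
        if x in counts:       # never true: nothing is added inside the loop
            counts[x] += 1
    else:                     # belongs to the `for`, not the `if`
        counts[x] = 1         # runs once, with x bound to the last element
    return counts

sample = ['America/New_York', 'America/Denver', 'America/New_York']
result = get_counts_misindented(sample)
print(result)  # {'America/New_York': 1}
```

With this version, looking up any key other than the last element of the sequence raises a `KeyError`.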
The correct indentation is:
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1 # it is run every iteration if x not in counts
    return counts
Check that you are not mixing spaces and tabs for indentation; run your script with python -tt to find out.
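If you would rather inspect the file yourself, a rough sketch like the following lists the lines whose indentation contains a tab. It assumes the rest of the file is indented with spaces, so any tab in the leading whitespace is suspect. The helper name `find_tab_indented_lines` and the sample source are made up for illustration.

```python
def find_tab_indented_lines(source):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    flagged = []
    for lineno, line in enumerate(source.splitlines(), 1):
        # Leading whitespace is whatever lstrip removes from the front.
        indent = line[:len(line) - len(line.lstrip(' \t'))]
        if '\t' in indent:
            flagged.append(lineno)
    return flagged

# Hypothetical source text: line 3 is indented with a tab, the rest with spaces.
sample_source = (
    "def f(xs):\n"
    "    total = 0\n"
    "\tfor x in xs:\n"
    "        total += x\n"
    "    return total\n"
)
flagged = find_tab_indented_lines(sample_source)
print(flagged)  # [3]
```

To scan a real script, read its contents into a string and pass that to the helper. The `-tt` option applies to Python 2; Python 3 raises a `TabError` for inconsistent use of tabs and spaces.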