Example from Chapter 2 of "Python for Data Analysis"

I am following along with the examples in Wes McKinney's "Python for Data Analysis".

In Chapter 2 we are asked to count the number of times each time zone appears in the 'tz' field, where some entries have no 'tz'.

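McKinney's count for 'America/New_York' comes out to 1251 (2 of them fall within the first 10 of the 3440 lines, as shown below), whereas mine comes out to 1. Why is my code giving 1?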

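I am using Python 2.7, installed from Enthought (epd-7.3-1-win-x86_64.msi) as instructed in McKinney's text. The data comes from https://github.com/Canuckish/pydata-book/tree/master/ch02. In case you can't tell from the title of the book, I am new to Python, so please include instructions on how to get any information I have not provided.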

import json
path = 'usagov_bitly_data2012-03-16-1331923249.txt'
open(path).readline()
records = [json.loads(line) for line in open(path)]
records[0]
records[1]
print records[0]['tz']

The last line here prints 'America/New_York'; the equivalent for records[1] prints 'America/Denver'.

#count unique time zones rating movies
#NOTE: NOT every JSON entry has a tz, so first line won't work
time_zones = [rec['tz'] for rec in records]
time_zones = [rec['tz'] for rec in records if 'tz' in rec]
time_zones[:10]

This shows the first 10 time zone entries, of which entries 8-10 are blank...

#counting using a dict to store counts
def get_counts(sequence):
    counts = {}
        for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
        return counts
counts = get_counts(time_zones)
counts['America/New_York']

This gives 1, but it should be 1251.

len(time_zones)

This gives 3440.

The 'America/New_York' time zone occurs 1251 times:

import json
from collections import Counter
with open(path) as file:
    c = Counter(json.loads(line).get('tz') for line in file)
print(c['America/New_York']) # -> 1251
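
As a small illustration of the same Counter approach without needing the data file, here is a sketch that uses a few made-up JSON lines (the `sample_lines` list and its contents are hypothetical, not taken from the real data). It shows that `.get('tz')` returns None for a record with no 'tz', so such records are tallied under the None key instead of raising a KeyError:

```python
import json
from collections import Counter

# Hypothetical sample lines standing in for the real data file.
sample_lines = [
    '{"tz": "America/New_York"}',
    '{"tz": "America/Denver"}',
    '{"tz": "America/New_York"}',
    '{"tz": ""}',                  # 'tz' present but blank
    '{"a": "no tz field here"}',   # 'tz' missing entirely
]

# dict.get returns None when the key is absent, so nothing raises KeyError.
tz_counts = Counter(json.loads(line).get('tz') for line in sample_lines)

print(tz_counts['America/New_York'])  # 2
print(tz_counts[None])                # 1 (the record without 'tz')
print(tz_counts[''])                  # 1 (the blank 'tz')
```

If records without 'tz' should be left out of the tally, filtering with `if 'tz' in rec`, as in the question, is one way to do that.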

It is unclear why the count is 1 for your code. Perhaps the code is not indented correctly:

def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
    else: #XXX wrong indentation
        counts[x] = 1 # it is run after the loop if there is no `break` 
    return counts

See "Why does python use 'else' after for and while loops?"
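
For readers unfamiliar with for/else, here is a minimal sketch of how it behaves (the function name `find_first_negative` is made up for illustration). The `else` block attached to a `for` runs once after the loop finishes, and only if the loop was not ended by `break`:

```python
def find_first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for n in numbers:
        if n < 0:
            result = n
            break  # leaving via break skips the else block
    else:
        # Runs only when the loop finished without hitting `break`.
        result = None
    return result


print(find_first_negative([3, -1, 4]))  # -1
print(find_first_negative([3, 1, 4]))   # None
```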

The correct indentation should be:

def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else: 
            counts[x] = 1 # it is run every iteration if x not in counts
    return counts
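
To see how much the placement of `else` matters, here is a small comparison on a made-up three-item list (the names `get_counts_misindented`, `get_counts_fixed`, and `sample` are hypothetical). When `else` is attached to the `for`, the dict stays empty during the loop and only the last item is stored afterwards, with a count of 1. That would match the symptom in the question if the last time zone in the data happens to be 'America/New_York', which I have not verified:

```python
def get_counts_misindented(sequence):
    counts = {}
    for x in sequence:
        if x in counts:  # never true: nothing is added inside the loop
            counts[x] += 1
    else:  # attached to the `for`, so it runs once after the loop ends
        counts[x] = 1  # x is still bound to the last item of the sequence
    return counts


def get_counts_fixed(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:  # attached to the `if`, so it runs for each unseen item
            counts[x] = 1
    return counts


sample = ['America/New_York', 'America/Denver', 'America/New_York']
print(get_counts_misindented(sample))  # {'America/New_York': 1}
print(get_counts_fixed(sample)['America/New_York'])  # 2
```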

Check that you are not mixing spaces and tabs for indentation; run your script with python -tt to find out.
