How to count occurrences of keywords in code while ignoring keywords in comments/docstrings



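I'm new to Python. I want to find the number of occurrences of Python keywords ['def', 'in', 'if', ...] in the code below. However, keywords found inside any string constant in the code need to be ignored. How can I count keyword occurrences without counting the ones inside strings?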

def grade(result):
    '''
    if if (<--- example to test if the word "if" will be ignored in the counts)
    :param result: none
    :return:none
    '''
    if result >= 80:
        grade = "HD"
    elif 70 <= result:
        grade = "DI"
    elif 60 <= result:
        grade = "CR"
    elif 50 <= result:
        grade = "PA"
    else:
    #else (ignore this word)
        grade = "NN"
    return grade
result = float(raw_input("Enter a final result: "))
while result < 0 or result > 100:
    print "Invalid result. Result must be between 0 and 100."
    result = float(raw_input("Re-enter final result: "))
print "The corresponding grade is", grade(result)

Use the tokenize, keyword, and collections modules.

tokenize.generate_tokens(readline)

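The generate_tokens() generator requires one argument, readline, which must be a callable object that provides the same interface as the readline() method of built-in file objects (see the File Objects section). Each call to the function should return one line of input as a string. Alternatively, readline may be a callable object that signals completion by raising StopIteration.

The generator produces 5-tuples with these members: the token type; the token string; a 2-tuple (srow, scol) of ints specifying the row and column where the token begins in the source; a 2-tuple (erow, ecol) of ints specifying the row and column where the token ends in the source; and the line on which the token was found. The line passed (the last tuple item) is the logical line; continuation lines are included.

New in version 2.2.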
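To make the 5-tuple layout concrete, here is a minimal sketch that unpacks each token from a short snippet. It is written for Python 3 (the rest of this answer uses Python 2 syntax), and the `source` text and the `rows` list are illustrative names, not part of the question. It feeds an in-memory string through io.StringIO instead of a file.

```python
import io
import tokenize

# A tiny snippet where "if" appears as code, inside a string, and in a comment.
source = "if x:\n    y = 'if'  # if\n"
readline = io.StringIO(source).readline

rows = []
for tok_type, tok_string, start, end, line in tokenize.generate_tokens(readline):
    # tok_name maps the numeric token type to a readable name such as NAME or STRING.
    rows.append((tokenize.tok_name[tok_type], tok_string))
    print(tokenize.tok_name[tok_type], repr(tok_string), start, end)
```

Only the first "if" comes back as a NAME token. The quoted one arrives as a single STRING token (quotes included) and the trailing one as part of a COMMENT token, which is why a keyword check on token strings skips them.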

import tokenize
with open('source.py') as f:
    print list(tokenize.generate_tokens(f.readline))
Partial output:

[(1, 'def', (1, 0), (1, 3), 'def grade(result):\n'),
 (1, 'grade', (1, 4), (1, 9), 'def grade(result):\n'),
 (51, '(', (1, 9), (1, 10), 'def grade(result):\n'),
 (1, 'result', (1, 10), (1, 16), 'def grade(result):\n'),
 (51, ')', (1, 16), (1, 17), 'def grade(result):\n'),
 (51, ':', (1, 17), (1, 18), 'def grade(result):\n'),
 (4, '\n', (1, 18), (1, 19), 'def grade(result):\n'),
 (5, '    ', (2, 0), (2, 4), "    '''\n"),
 (3,
  '\'\'\'\n    if if (<--- example to test if the word "if" will be ignored in the counts)\n    :param result: none\n    :return:none\n    \'\'\'',
  (2, 4),
  (6, 7),
  '    \'\'\'\n    if if (<--- example to test if the word "if" will be ignored in the counts)\n    :param result: none\n    :return:none\n    \'\'\'\n'),
 (4, '\n', (6, 7), (6, 8), "    '''\n"),
 (54, '\n', (7, 0), (7, 1), '\n'),
 (1, 'if', (8, 4), (8, 6), '    if result >= 80:\n'),

You can retrieve the list of keywords from the keyword module:

import keyword
print keyword.kwlist
print keyword.iskeyword('def')
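As a quick illustration of the check, the sketch below (Python 3 syntax; the `words` list is made up from names in the question's code) tests a few candidate strings. Note that the keyword list depends on the interpreter: print is a keyword in Python 2 but not in Python 3, so the counts can differ depending on which version runs the script.

```python
import keyword

# Candidate strings taken from the question's code: three keywords, two plain names.
words = ["def", "if", "elif", "grade", "result"]
is_kw = {word: keyword.iskeyword(word) for word in words}

for word in words:
    print(word, is_kw[word])
```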

Complete solution using collections.Counter:

import tokenize
import keyword
import collections 
with open('source.py') as f:
    # tokens is lazy generator
    tokens = (token for _, token, _, _, _ in tokenize.generate_tokens(f.readline))
    c = collections.Counter(token for token in tokens if keyword.iskeyword(token))
print c  # Counter({'elif': 3, 'print': 2, 'return': 1, 'else': 1, 'while': 1, 'or': 1, 'def': 1, 'if': 1})
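For readers on Python 3, the same idea might look like the sketch below. The function name `count_keywords` and the `sample` text are hypothetical, and the explicit check for NAME tokens is an addition on top of the answer's approach: it is not strictly required, since string and comment tokens never equal a bare keyword, but it makes the intent explicit.

```python
import collections
import io
import keyword
import tokenize


def count_keywords(source):
    """Count keyword tokens in source text, skipping strings and comments."""
    readline = io.StringIO(source).readline
    counts = collections.Counter()
    for tok in tokenize.generate_tokens(readline):
        # Keywords are tokenized as NAME; string literals and comments get
        # their own token types (STRING, COMMENT) and are kept whole.
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            counts[tok.string] += 1
    return counts


# Shortened version of the question's code, with "if" in a docstring
# and "else" in a comment to confirm both are ignored.
sample = '''def grade(result):
    """if if (ignored: inside a docstring)"""
    if result >= 80:
        return "HD"
    else:
        # else (ignored: inside a comment)
        return "NN"
'''

counts = count_keywords(sample)
print(counts)
```

To run it on a file instead, read the file's text and pass it to `count_keywords`, or hand the file object's readline method to tokenize.generate_tokens directly as in the code above.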
