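I need to accumulate the count of each of the 26 letters of the alphabet in a body of text into a dictionary. When the user types a letter, I need to display that letter's frequency in the text. How can I do this?
This is my code so far: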
import urllib2
import numpy as py
import matplotlib
response = urllib2.urlopen('http://students.healthinformaticshub.ca/jane-austen-sense-n-sensibility.txt')
alphabet = 'abcdefghijklmnopqrstuvwxyz'
# initialize the dict we will use to store our
# counts for the individual letters:
alphabet_counts = {'a': 0, 'b': 0, 'c': 0, 'd': 0, 'e': 0, 'f': 0, 'g': 0, 'h': 0,
                   'i': 0, 'j': 0, 'k': 0, 'l': 0, 'm': 0, 'n': 0, 'o': 0, 'p': 0, 'q': 0, 'r': 0, 's': 0,
                   't': 0, 'u': 0, 'v': 0, 'w': 0, 'x': 0, 'y': 0, 'z': 0}
total_letter_count = 0
# loop thru line by line:
for line in response:
    line = line.lower()
    for ch in line:
        if ch in alphabet:
            alphabet_counts[ch] += 1
            total_letter_count += 1
print("# of a's: " + str(alphabet_counts['a']))
print("# of b's: " + str(alphabet_counts['b']))
print("# of c's: " + str(alphabet_counts['c']))
print("# of d's: " + str(alphabet_counts['d']))
print("# of e's: " + str(alphabet_counts['e']))
print("# of f's: " + str(alphabet_counts['f']))
print("# of g's: " + str(alphabet_counts['g']))
print("# of h's: " + str(alphabet_counts['h']))
print("# of i's: " + str(alphabet_counts['i']))
print("# of j's: " + str(alphabet_counts['j']))
print("# of k's: " + str(alphabet_counts['k']))
print("# of l's: " + str(alphabet_counts['l']))
print("# of m's: " + str(alphabet_counts['m']))
print("# of n's: " + str(alphabet_counts['n']))
print("# of o's: " + str(alphabet_counts['o']))
print("# of p's: " + str(alphabet_counts['p']))
print("# of q's: " + str(alphabet_counts['q']))
print("# of r's: " + str(alphabet_counts['r']))
print("# of s's: " + str(alphabet_counts['s']))
print("# of t's: " + str(alphabet_counts['t']))
print("# of u's: " + str(alphabet_counts['u']))
print("# of v's: " + str(alphabet_counts['v']))
print("# of w's: " + str(alphabet_counts['w']))
print("# of x's: " + str(alphabet_counts['x']))
print("# of y's: " + str(alphabet_counts['y']))
print("# of z's: " + str(alphabet_counts['z']))
resp = '''
1.) Find probability of a particular letter of the alphabet
2.) Show the barplot representing these probabilities for the entire alphabet
3.) Save that barplot as a png file
4.) Quit'''
I cleaned up some of your code, and I also added an example of how to use raw_input (since you appear to be on Python 2.7, judging by your use of urllib2):
import urllib2
import numpy as py
import matplotlib
response = urllib2.urlopen('http://students.healthinformaticshub.ca/jane-austen-sense-n-sensibility.txt')
alphabet = 'abcdefghijklmnopqrstuvwxyz'
# initialize the dict we will use to store our
# counts for the individual letters:
alphabet_counts = {letter: 0 for letter in alphabet}
total_letter_count = 0
# loop thru line by line:
for line in response:
    line = line.lower()
    for ch in line:
        if ch in alphabet:
            alphabet_counts[ch] += 1
            total_letter_count += 1
# iterate over the alphabet string so the output is in a-z order
for letter in alphabet:
    print('# of ' + letter + "'s: " + str(alphabet_counts[letter]))
letter = raw_input("Enter a character: ")
print('# of ' + letter + "'s: " + str(alphabet_counts[letter]))
resp = '''
1.) Find probability of a particular letter of the alphabet
2.) Show the barplot representing these probabilities for the entire alphabet
3.) Save that barplot as a png file
4.) Quit'''
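One thing to watch with the raw_input lookup: indexing alphabet_counts directly raises a KeyError if the user types an uppercase letter, more than one character, or something that is not a letter. Below is a minimal sketch of one way to guard against that. The helper names count_letters and lookup_count are hypothetical (they are not in the original code), and a short hard-coded string stands in for the downloaded novel so the example runs without network access.

```python
import string

def count_letters(text):
    """Count each of the 26 lowercase ASCII letters in text."""
    counts = {letter: 0 for letter in string.ascii_lowercase}
    for ch in text.lower():
        if ch in counts:
            counts[ch] += 1
    return counts

def lookup_count(counts, user_input):
    """Return the count for a single letter, or None if the input is invalid."""
    letter = user_input.strip().lower()
    # Reject empty input, multi-character input, and non-letters.
    if len(letter) != 1 or letter not in counts:
        return None
    return counts[letter]

# Sample text used only for illustration.
sample_counts = count_letters("Sense and Sensibility")
```

In Python 2.7 you would pass the result of raw_input("Enter a character: ") as user_input (input(...) in Python 3) and print a message of your choice when the function returns None.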
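I'm not sure exactly what you want, and since you're just learning, I'll give you hints rather than the answer. dict objects have methods called items and iteritems, which yield the keys together with their values. To compute the probability of getting a given character, you can use iteritems: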
char_probabilities = dict()
for character, count in alphabet_counts.iteritems():
    # compute the probability from the frequency of the
    # character; here you can use the sum builtin and
    # the values method on the dict
    char_probabilities[character] = [YOU DO SOME WORK]
This looks like a textbook use of the Counter class:
import collections, urllib2, contextlib
url = 'http://students.healthinformaticshub.ca/jane-austen-sense-n-sensibility.txt'
alphabet_counts = collections.Counter()
with contextlib.closing(urllib2.urlopen(url)) as response:
    for line in response:
        alphabet_counts.update(x for x in line.lower() if x.isalpha())
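Counter is a dict subclass that behaves just like your original alphabet_counts, but with far less effort on your part. Keep in mind that you probably want to close the input stream, which is why I used the with block.
To get the frequencies, you need to know the sum of the Counter's values: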
total_letters = sum(alphabet_counts.values())
frequencies = {letter: float(count) / total_letters for letter, count in alphabet_counts.iteritems()}
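To see the Counter and the frequency calculation working together, here is a minimal self-contained sketch. It runs on a short literal string rather than the downloaded text, and the names letter_frequencies and sample_text are illustrative assumptions, not part of the code above. It restricts the count to the ASCII letters a-z and uses items() instead of iteritems() so that it runs on both Python 2.7 and Python 3.

```python
import collections

def letter_frequencies(text):
    """Map each letter a-z that occurs in text to its relative frequency."""
    counts = collections.Counter(
        ch for ch in text.lower() if 'a' <= ch <= 'z'
    )
    total = sum(counts.values())
    if total == 0:
        # No letters at all: avoid dividing by zero.
        return {}
    return {letter: float(count) / total for letter, count in counts.items()}

# Sample text used only for illustration.
sample_text = "Sense and Sensibility"
frequencies = letter_frequencies(sample_text)
```

Letters that never occur are simply absent from the result, so frequencies.get(letter, 0.0) is a safe way to look up whatever letter the user types.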