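I'm trying to use a dictionary to count punctuation: apostrophes (') and hyphens (-). I'd like to see whether I can do this using only lists, dictionaries, for loops and boolean expressions. These punctuation marks should ONLY be counted when they are surrounded by letters on both sides! E.g. jack-in-a-box (that is 3 hyphens) and shouldn't (1 apostrophe). The letters can be anything from a to z. Also, since this is part of an assignment, no modules/libraries can be used. I'm out of ideas and don't know what to do. Any help would be greatly appreciated.
This is what I tried, but I get a KeyError: 0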
    def countpunc2():
        filename = input("Name of file? ")
        text = open(filename, "r").read()
        text = text.lower() #make all the words lowercase (for our convenience)
        for ch in '!"#$%&()*+./:<=>?@[\]^_`{|}~':
            text = text.replace(ch, ' ')
        for ch in '--':
            text = text.replace(ch, ' ')
        words = text.split('n') #splitting the text for words
        wordlist = str(words)
        count = {} #create dictionary; the keys/values are added on
        punctuations = ",;'-"
        letters = "abcdefghijklmnopqrstuvwxyz"
        for i, char in enumerate(wordlist):
            if i < 1:
                continue
            if i > len(wordlist) - 2:
                continue
            if char in punctuations:
                if char not in count:
                    count[char] = 0
                if count[i-1] in letters and count[i+1] in letters:
                    count[char] += 1
        print(count)
UPDATE: I changed the code to:
    def countpunc2():
        filename = input("Name of file? ")
        text = open(filename, "r").read()
        text = text.lower() #make all the words lowercase (for our convenience)
        for ch in '!"#$%&()*+./:<=>?@[\]^_`{|}~':
            text = text.replace(ch, ' ')
        for ch in '--':
            text = text.replace(ch, ' ')
        words = text.split('n') #splitting the text for words
        wordlist = str(words)
        count = {} #create dictionary; the keys/values are added on
        punctuations = ",;'-"
        letters = "abcdefghijklmnopqrstuvwxyz"
        for i, char in enumerate(wordlist):
            if i < 1:
                continue
            if i > len(wordlist) - 2:
                continue
            if char in punctuations:
                if char not in count:
                    count[char] = 0
                if wordlist[i-1] in letters and wordlist[i+1] in letters:
                    count[char] += 1
        print(count)
It now gives me output, but the output is incorrect. Sample file: https://www.dropbox.com/s/kqwvudflxnmldqr/sample1.txt?dl=0 The expected result must be: {',': 27, '-': 10, ';': 5, "'": 1}
I'd probably keep it simpler than that.
    #!/usr/bin/env python3
    sample = "I'd rather take a day off, it's hard work sitting down and writing a code. It's amazin' how some people find this so easy. Bunch of know-it-alls."
    # The double quote and the backslash are escaped so the literal stays one string.
    punc = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
    letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    d = {}
    for i, char in enumerate(sample):
        # The first and last characters have no neighbour on one side, so skip them.
        if i < 1:
            continue
        if i > len(sample) - 2:
            continue
        if char in punc:
            if char not in d:
                d[char] = 0
            # Count only when both neighbours are letters.
            if sample[i - 1] in letters and sample[i + 1] in letters:
                d[char] += 1
    print(d)
Output:
{"'": 3, ',': 0, '.': 0, '-': 2}
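Dunno where you're getting the ";" from. Also, your commas have a space next to them, so they don't count here. If they should count, add a space to the letters variable.
Explanation of what's happening:
We initialize a dict, take the sample text, and iterate over it character by character, using enumerate to keep track of the index. If a character is too close to the start or the end to qualify, we skip it.
Using the i variable from enumerate, I check the characters before and after the current one, and add to the count if it qualifies.
NOTE: despite the shebang, this code also works in Python 2.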
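As a rough sketch of the "add a space to the letters variable" suggestion, the same loop can be wrapped in a function that takes the set of allowed neighbours as a parameter. The name count_enclosed, its parameters and the short sample string are made up for this illustration and are not part of the code above.

```python
def count_enclosed(text, punc, neighbours):
    """Count each character of `punc` whose two neighbours are both in `neighbours`.

    Hypothetical helper for illustration only.
    """
    counts = {}
    # Start at 1 and stop one short of the end so both neighbours always exist.
    for i in range(1, len(text) - 1):
        char = text[i]
        if char in punc:
            if char not in counts:
                counts[char] = 0
            if text[i - 1] in neighbours and text[i + 1] in neighbours:
                counts[char] += 1
    return counts


letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
sample = "a day off, it's hard; know-it-alls"

strict = count_enclosed(sample, ",;'-", letters)         # neighbours must be letters
relaxed = count_enclosed(sample, ",;'-", letters + " ")  # a space also qualifies

print(strict)   # {',': 0, "'": 1, ';': 0, '-': 2}
print(relaxed)  # {',': 1, "'": 1, ';': 1, '-': 2}
```

Keep in mind that once a space qualifies as a neighbour, a free-standing hyphen such as the one in "a - b" is counted as well, which may or may not be what the assignment wants.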
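You could map the characters of the input string to 3 categories: alphabetic (a), punctuation (p) and space (s). Then group them into triples (sequences of 3 consecutive characters). From those, isolate the a-p-a triples and count how many times each distinct punctuation character occurs.
For example: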
    string="""jack-in-a-box (that is 3 hyphens) and shouldn't (1 apostrophe)."""
    # "a" for letters, "s" for spaces, "p" for everything else
    categ = [ "pa"[c.isalpha()] if c != " " else "s" for c in string ]
    # every run of 3 consecutive categories
    triples = [ triple for triple in zip(categ,categ[1:],categ[2:]) ]
    # string[1:] lines up each triple with its middle character
    pChars = [ p for p,triple in zip(string[1:],triples) if triple==("a","p","a") ]
    result = { p:pChars.count(p) for p in set(pChars) }
    print(result) # {"'": 1, '-': 3}
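If you are not allowed to use isalpha() or zip(), you can code the equivalent with the in operator and for loops.
Here's an example that does it in a very spelled-out way: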
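One possible way to write that variant, as a sketch only: the function name count_apa and the letters string are assumptions introduced here, and unlike isalpha() the letters string only recognises ASCII letters.

```python
# Assumed alphabet for this sketch; isalpha() would also accept non-ASCII letters.
letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def count_apa(string):
    """Count punctuation characters that sit directly between two letters."""
    # Step 1: map every character to a category without isalpha().
    categ = []
    for c in string:
        if c in letters:
            categ.append("a")   # alphabetic
        elif c == " ":
            categ.append("s")   # space
        else:
            categ.append("p")   # everything else is treated as punctuation
    # Step 2: walk the triples by index instead of using zip().
    result = {}
    for i in range(1, len(string) - 1):
        if categ[i - 1] == "a" and categ[i] == "p" and categ[i + 1] == "a":
            p = string[i]
            if p in result:
                result[p] += 1
            else:
                result[p] = 1
    return result


string = "jack-in-a-box (that is 3 hyphens) and shouldn't (1 apostrophe)."
print(count_apa(string))  # {'-': 3, "'": 1}
```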
    end_cap_characters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
    special_characters = [";", ":", "'", "-", ","]

    def count_special_characters(in_string):
        result = {}
        for i in range(1, len(in_string) - 1):
            if in_string[i - 1] in end_cap_characters:
                if in_string[i + 1] in end_cap_characters:
                    if in_string[i] in special_characters:
                        if in_string[i] not in result:
                            result[in_string[i]] = 1
                        else:
                            result[in_string[i]] +=1
        return result

    print(count_special_characters("jack-in-the-box"))
    print(count_special_characters("shouldn't"))
    print(count_special_characters("jack-in-the-box, shouldn't and a comma that works,is that one"))
Output:
{'-': 3}
{"'": 1}
{'-': 3, "'": 1, ',': 1}
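Obviously this can be condensed, but I will leave that as an exercise for you ;).
Update
Based on your edited question and posted code, you need to update the line:
    if count[i-1] in letters and count[i+1] in letters:
to:
    if wordlist[i-1] in letters and wordlist[i+1] in letters: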
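For comparison, here is one way the condensed form might look. This is only a sketch: the names LETTERS, SPECIAL and count_special_characters_short are invented here, and it uses plain strings plus the built-in dict.get method rather than any module.

```python
# Hypothetical condensed variant; strings support `in` just like the lists above.
LETTERS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
SPECIAL = ";:'-,"


def count_special_characters_short(in_string):
    result = {}
    for i in range(1, len(in_string) - 1):
        char = in_string[i]
        # One combined condition replaces the three nested ifs.
        if char in SPECIAL and in_string[i - 1] in LETTERS and in_string[i + 1] in LETTERS:
            # get() returns 0 the first time a character is seen.
            result[char] = result.get(char, 0) + 1
    return result


print(count_special_characters_short("jack-in-the-box"))  # {'-': 3}
print(count_special_characters_short("shouldn't"))        # {"'": 1}
```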
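Beyond that one line, two preprocessing steps in the posted function may also distort the counts. The snippet below is a small illustration on a made-up two-line string, assuming the split was meant to be on newlines. It shows what the '--' loop and the str() call do, and then counts on the original text instead.

```python
text = "don't\nwell-known"  # made-up sample, not the asker's file

# The loop over '--' visits the single character '-' twice, so every
# hyphen is replaced by a space and none is left to count afterwards.
no_hyphens = text
for ch in '--':
    no_hyphens = no_hyphens.replace(ch, ' ')
print(no_hyphens.count('-'))  # 0

# str() on a list returns its printed form, which adds brackets,
# quote characters and ", " separators that were never in the file.
wordlist = str(text.split('\n'))
print(wordlist)  # ["don't", 'well-known']

# Iterating over the original text avoids both side effects.
letters = "abcdefghijklmnopqrstuvwxyz"
count = {}
for i in range(1, len(text) - 1):
    char = text[i]
    if char in ",;'-" and text[i - 1] in letters and text[i + 1] in letters:
        count[char] = count.get(char, 0) + 1
print(count)  # {"'": 1, '-': 1}
```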