UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)



I have this code:

# -*- coding: utf-8 -*-
forbiddenWords=['for', 'and', 'nor', 'but', 'or', 'yet', 'so', 'not', 'a', 'the', 'an', 'of', 'in', 'to', 'for', 'with', 'on', 'at', 'from', 'by', 'about', 'as']

def IntoSentences(paragraph):
    paragraph = paragraph.replace("–", "-")
    import nltk.data
    sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
    sentenceList = sent_detector.tokenize(paragraph.strip())
    return sentenceList
from Tkinter import *
root = Tk()
var = StringVar()
label = Label( root, textvariable=var)
var.set("Fill in the caps: ")
label.pack()
text = Text(root)
text.pack()
button=Button(root, text ="Create text with caps.", command =lambda: IntoSentences(text.get(1.0,END)))
button.pack()
root.mainloop()

When I run the code, everything works fine. Then I enter some text and press the button, but I get this error:

C:\Users\Indrek>caps_main.py
Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Python27\lib\lib-tk\Tkinter.py", line 1470, in __call__
    return self.func(*args)
  File "C:\Python27\Myprojects\caps_main.py", line 25, in <lambda>
    button=Button(root, text ="Create text with caps.", command =lambda: IntoSentences(text.get(1.0,END)))
  File "C:\Python27\Myprojects\caps_main.py", line 7, in IntoSentences
    paragraph = paragraph.replace("ŌĆō", "-")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

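How can I fix this? At first I got the same error message as soon as I ran the code. Then I added lambda:, and now the problem appears when I click the button in the application.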
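The failure in the traceback can be reproduced without Tkinter. The sketch below is written in Python 3, which no longer decodes byte strings implicitly, so it calls decode("ascii") explicitly to mimic what Python 2 did when the byte-string literal "–" met the unicode text from the widget. It also assumes, based only on the characters shown, that the console used code page 775 (cp775), which would explain why the en dash is printed as "ŌĆō".

```python
# U+2013 EN DASH, the character written as "–" in the original source file
EN_DASH = "\u2013"

# Under "# -*- coding: utf-8 -*-" the Python 2 literal "–" held these three bytes
utf8_bytes = EN_DASH.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\x93'

# Python 2 implicitly ran this ASCII decode when mixing the byte string with
# unicode text; calling it explicitly raises the same error as the traceback
try:
    utf8_bytes.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)

# Assumption: the console used code page 775, which shows the same bytes as
# three unrelated letters (printed as escapes to keep the output ASCII-only)
console_view = utf8_bytes.decode("cp775")
print(ascii(console_view))  # '\u014c\u0106\u014d'
```

"Position 0" in the message refers to the first byte of the dash literal, not to the text typed into the widget, and the three characters Ō, Ć, ō are only how the console rendered the bytes E2 80 93.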

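You have to work with unicode strings on both sides of replace(). Here the text returned by Text.get() is a unicode object, while "–" in your source file is a UTF-8 byte string. To compare them, Python 2 implicitly decodes the byte string with the ASCII codec, which fails on 0xe2, the first byte of the en dash. Decode any byte string to unicode first and use a unicode literal for the character you want to replace:

if isinstance(paragraph, str):
    paragraph = paragraph.decode('utf-8')
paragraph = paragraph.replace(u'\u2013', u'-')
# u'\u2013' is the en dash (–); "ŌĆō" in the traceback is only its UTF-8 bytes E2 80 93 as printed by the console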
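The same idea in Python 3, as a minimal sketch: make sure the text is a str (Unicode) before replacing, and replace the real en dash U+2013 rather than the characters the console printed. The helper name normalize_dashes and its encoding parameter are hypothetical. In Python 3, Text.get() already returns str, so the bytes branch only matters if the text comes from another source, for example a file opened in binary mode.

```python
def normalize_dashes(paragraph, encoding="utf-8"):
    """Return paragraph as str with every en dash (U+2013) replaced by '-'."""
    # Decode only when handed bytes; a str is already Unicode text
    if isinstance(paragraph, bytes):
        paragraph = paragraph.decode(encoding)
    # Both arguments to replace() are str, so no implicit ASCII decoding occurs
    return paragraph.replace("\u2013", "-")


print(normalize_dashes("pages 10\u201320"))                  # pages 10-20
print(normalize_dashes("pages 10\u201320".encode("utf-8")))  # pages 10-20
```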
