Python: unicode() function vs. u prefix

I ran into a UnicodeEncodeError in a Django project and (after much frustration) fixed it by changing the return value of the failing __unicode__ method from

return unicode("<span><b>{0}</b>{1}<span>".format(val_str, self.text))

to

return u"<span><b>{0}</b>{1}<span>".format(val_str, self.text)

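But I am confused as to why this worked (or rather, why there was a problem in the first place). Don't the u prefix and the unicode function do the same thing? When I try it in a console, they seem to give the same result: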

>>> # with the function
>>> test = unicode("<span><b>{0}</b>{1}<span>".format(2,4))
>>> test
u'<span><b>2</b>4<span>'
>>> type(test)
<type 'unicode'>
>>> # with the prefix
>>> test = u"<span><b>{0}</b>{1}<span>".format(2,4)
>>> test
u'<span><b>2</b>4<span>'
>>> type(test)
<type 'unicode'>

But apparently the encoding is done differently depending on which one is used. What is going on here?

Your problem lies in what you apply unicode() to; your two expressions are not equivalent.

unicode("<span><b>{0}</b>{1}<span>".format(val_str, self.text))

applies unicode() to the result of

"<span><b>{0}</b>{1}<span>".format(val_str, self.text)

while

u"<span><b>{0}</b>{1}<span>".format(val_str, self.text)

is equivalent to:

unicode("<span><b>{0}</b>{1}<span>").format(val_str, self.text)

Note the position of the closing parenthesis!

So your first version formats first, and only then converts the formatted result to unicode. This is an important difference!
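
The order of evaluation can be illustrated with a minimal sketch. The two functions below are hypothetical stand-ins (they are not part of the original code): `format_step` plays the role of the inner `"...".format(...)` call that fails, and `convert_step` plays the role of the outer `unicode(...)` call. Python has to evaluate the argument before it can call the outer function, so the outer function never runs if the argument raises.

```python
calls = []  # records which functions actually ran


def format_step():
    # Hypothetical stand-in for the inner "...".format(...) call that fails.
    calls.append("format_step")
    raise ValueError("formatting failed")


def convert_step(value):
    # Hypothetical stand-in for the outer unicode(...) call.
    calls.append("convert_step")
    return value


try:
    # The argument is evaluated first; its exception propagates
    # before convert_step is ever entered.
    convert_step(format_step())
except ValueError:
    pass

print(calls)
```

Only the inner function is recorded in `calls`; the outer call never gets a chance to run.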

When you use str.format() with unicode values, those values are passed to str(), which implicitly encodes them to ASCII. This causes your exception:

>>> 'str format: {}'.format(u'unicode ascii-range value works')
'str format: unicode ascii-range value works'
>>> 'str format: {}'.format(u"unicode latin-range value doesn't work: å")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 40: ordinal not in range(128)

Calling unicode() on the result doesn't matter; the exception has already been raised.
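
The implicit step can be made explicit. The sketch below assumes that the implicit conversion behaves like calling `.encode("ascii")` on the unicode value (that is, that the default encoding is ASCII, as it normally is in Python 2). The helper name `survives_ascii_encode` is made up for illustration.

```python
def survives_ascii_encode(text):
    """Return True if the unicode value `text` can be encoded to ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# Pure ASCII content encodes without trouble.
print(survives_ascii_encode(u"unicode ascii-range value"))

# u"\xe5" is the character å, which lies outside the ASCII range.
print(survives_ascii_encode(u"unicode latin-range value: \xe5"))
```

A value for which this helper returns False is one that would be expected to trigger the UnicodeEncodeError when passed to str.format() under that assumption.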

Formatting with unicode.format(), on the other hand, has no such problem:

>>> u'str format: {}'.format(u'unicode latin-range value works: å')
u'str format: unicode latin-range value works: \xe5'
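
Applied to the method from the question, the fix could be sketched as follows. The function name `render_label` and its parameters are assumptions made for illustration; the original code is a `__unicode__` method using `val_str` and `self.text`. The template is copied as it appears in the question.

```python
def render_label(val_str, text):
    # The u prefix makes the template itself unicode, so formatting is
    # done by unicode.format() and no implicit ASCII encoding takes place.
    return u"<span><b>{0}</b>{1}<span>".format(val_str, text)


# u"caf\xe9" contains a non-ASCII character (é).
label = render_label(u"2", u"caf\xe9")
print(repr(label))
```

Because the template is unicode from the start, non-ASCII values pass through unchanged instead of being encoded on the way in.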
