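Correctly converting special characters in a Python byte string

I've tried browsing some similar threads, but I'm still confused:

I have a byte string containing some special characters (in my case, a double quotation mark), as shown below. What is the simplest way to convert it to a string so that the special characters are mapped correctly?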

b = b'My groovy str\xe2\x80\x9d is now fixed'

Update: regarding decode('utf-8')

>>> b = b'My groovy str\xe2\x80\x9d is now fixed'
>>> b_converted = b.decode("utf-8")
>>> b_converted
'My groovy str\u201d is now fixed'
>>> print(b_converted)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u201d' in position 13: ordinal not in range(128)

The following should work:

b_converted = b.decode("utf-8")

It converts from:

b'My groovy str\xe2\x80\x9d is now fixed'

to:

My groovy str” is now fixed
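
Note that in the traceback from the question, the decode itself succeeds; the UnicodeEncodeError is raised by print(), because the output stream's encoding is ASCII. Below is a minimal sketch of how to see this, assuming Python 3. The helper name `printable_for` is made up for illustration and is not part of any library:

```python
import sys

b = b'My groovy str\xe2\x80\x9d is now fixed'
b_converted = b.decode("utf-8")  # bytes -> str; this step works fine

def printable_for(text, encoding):
    """Hypothetical helper: replace characters that `encoding` cannot
    represent with backslash escapes, so printing never raises."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)

# print() encodes using sys.stdout.encoding; a value of 'ascii' (or similar)
# would explain the error shown in the question.
stdout_encoding = sys.stdout.encoding or "ascii"
print(printable_for(b_converted, stdout_encoding))
```

If the terminal can actually display UTF-8, setting the `PYTHONIOENCODING=utf-8` environment variable, or calling `sys.stdout.reconfigure(encoding="utf-8")` on Python 3.7+, is likely a better fix than escaping the character.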

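Use .decode(encoding) on a byte string to convert it to a Unicode string.

The encoding cannot always be determined, and it depends on where the data came from. In this case, it is clearly UTF-8.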
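
To illustrate why the choice of encoding matters, here is a small sketch (assuming Python 3) that decodes the same bytes with the right codec, with a wrong one, and with an error handler. The variable names are only for illustration:

```python
b = b'My groovy str\xe2\x80\x9d is now fixed'

# Correct codec: the three bytes e2 80 9d become one character, U+201D.
right = b.decode("utf-8")

# Wrong codec: latin-1 maps every byte to its own character, so no error
# is raised, but the result is three junk characters instead of one quote.
wrong = b.decode("latin-1")

# ASCII cannot decode bytes >= 0x80; errors="replace" substitutes U+FFFD
# for each undecodable byte instead of raising UnicodeDecodeError.
lossy = b.decode("ascii", errors="replace")

print(len(right), len(wrong), lossy.count("\ufffd"))
```

A wrong encoding does not always fail loudly, so it is worth confirming the encoding from the data source rather than guessing.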

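Ideally, when reading text, the API used to read the data lets you specify the encoding, or, in the case of a web request, the encoding is detected from the response headers, so you don't need to call .decode explicitly. For example: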

with open('input.txt', encoding='utf8') as file:
    text = file.read()

or

import requests
response = requests.get('http://example.com')
print(response.encoding)
print(response.text) # translated from encoding
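
As a rough check of the file case, the sketch below (assuming Python 3) writes the bytes from the question to a temporary file and reads them back both ways. The temporary file and the variable names are only for illustration:

```python
import os
import tempfile

data = b'My groovy str\xe2\x80\x9d is now fixed'

# Write the raw bytes to a throwaway file.
fd, path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "wb") as f:
        f.write(data)

    # Text mode with an explicit encoding: open() does the decoding.
    with open(path, encoding="utf8") as f:
        text_from_open = f.read()

    # Binary mode: read bytes and decode them yourself.
    with open(path, "rb") as f:
        text_from_bytes = f.read().decode("utf8")
finally:
    os.remove(path)

print(text_from_open == text_from_bytes)
```

Both routes should give the same string; passing the encoding to open() just moves the decode step into the file API.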
