How to decode a unicode string starting with "%u" (percent sign + u) in Python 3

I have some HTML code that looks like this:

<new>8003,%u767E%u5723%u5E97,113734,%u4E50%u4E8B%u542E%u6307%u7EA2%u70E7%u8089%u5473,6924743915824,%u7F50,104g,3,21.57,-2.16,0,%u4E50%u4E8B,1</new>

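I know I can find every "%u" in Notepad++, replace it with "\u", and paste the result into the Python console so that it displays as correct Chinese characters. But how can I do this automatically in Python?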

Assuming your input string contains "percent-u" encoded characters, we can find and decode them with a regex substitution and a callback function.

Percent-u encoding represents a Unicode code point as four hexadecimal digits: %u767E ⇒ 767E ⇒ code point 30334 ⇒ 百.
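To make that chain of conversions concrete, here is a minimal sketch of decoding a single value by hand. The variable names are only illustrative.

```python
# The four hex digits that follow "%u" in "%u767E".
hex_digits = "767E"

# Parse the digits as a base-16 integer to get the code point.
code_point = int(hex_digits, 16)

# chr() maps a code point to the corresponding one-character string.
char = chr(code_point)

print(code_point, char)
```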

import re

def hex_to_char(hex_str):
    """Convert a single hex-encoded character 'FFFF' into the corresponding real character."""
    return chr(int(hex_str, 16))

s = "<new>8003,%u767E%u5723%u5E97,113734,%u4E50%u4E8B%u542E%u6307%u7EA2%u70E7%u8089%u5473,6924743915824,%u7F50,104g,3,21.57,-2.16,0,%u4E50%u4E8B,1</new>"

# Match "%u" followed by exactly four hex digits and capture the digits.
percent_u = re.compile(r"%u([0-9a-fA-F]{4})")
decoded = percent_u.sub(lambda m: hex_to_char(m.group(1)), s)
print(decoded)

This prints:

<new>8003,百圣店,113734,乐事吮指红烧肉味,6924743915824,罐,104g,3,21.57,-2.16,0,乐事,1</new>
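One caveat, in case it applies to your data: if the text was produced by something like JavaScript's legacy escape() function (an assumption, since the question does not say where it came from), each %uXXXX is a UTF-16 code unit, so a character outside the Basic Multilingual Plane arrives as two consecutive %u values (a surrogate pair). Decoding each value with chr() on its own would leave two lone surrogates. A possible sketch that decodes whole runs as UTF-16 instead is below; decode_percent_u is a name made up for this example, not a library function.

```python
import re

# One or more consecutive "%uXXXX" groups, matched as a single run so that
# surrogate pairs stay together.
_PERCENT_U_RUN = re.compile(r"(?:%u[0-9a-fA-F]{4})+")

def decode_percent_u(text):
    """Decode %uXXXX sequences, treating each one as a UTF-16 code unit."""
    def decode_run(match):
        # "%u767E%u5723" -> ["767E", "5723"]
        units = match.group(0).split("%u")[1:]
        # Pack every code unit into two big-endian bytes.
        raw = b"".join(int(unit, 16).to_bytes(2, "big") for unit in units)
        # Let the UTF-16 decoder combine surrogate pairs; an unpaired
        # surrogate becomes U+FFFD instead of raising an error.
        return raw.decode("utf-16-be", errors="replace")
    return _PERCENT_U_RUN.sub(decode_run, text)

print(decode_percent_u("%u767E%u5723%u5E97,%uD83D%uDE00"))
```

For input that only contains characters such as the Chinese text in the question, this should give the same result as the simpler chr() version.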
