How to correctly represent supplementary Unicode characters in Python 3 (3.6.1) using the \u or \U escape in a string literal

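I have been learning Python recently and ran into a problem with Unicode escapes.

It seems that, as in Java, the \u escape is interpreted as a UTF-16 code unit, but here is the problem:

For example, suppose I try to write a character that takes 3 bytes in UTF-8, such as "♬" (https://unicode-table.com/en/266c/), or even a supplementary Unicode character such as "𠜎" (https://unicode-table.com/en/2070e/), in the \uXXXX or \UXXXXXXXX format, as follows: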

print('\u00E2\u99AC')  # UTF-8, garbled output for sure
print('\U00E299AC')    # UTF-8, with 8 hex digits after \U, (unicode error) for sure
print('\u266C')        # UTF-16 BE, music note appears
# from which I suppose \u and \U function the same way they do in Java
# (maybe a little differently, since in Java they work like a macro and can be used in comments)
# However, while print('\u266C') gives me '♬', '\u266C' == '♬' is equal to false
# which is true in Java semantics.
# Furthermore, print('\UD841DF0E') didn't give me '𠜎' : (unicode error) 'unicodeescape' codec can't decode bytes in position 0-9: illegal Unicode character
# which I suppose it should, so it appears to me that I may have gotten it wrong
# Here again : print('\uD841\uDF0E')  # Error, 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
print('\xD8\x41\xDF\x0E')  # also tried this, garbled output
# maybe UTF-16 LE?
print('\u41D8\u0EDF')  # garbled output
print('\U41D80EDF')  # error

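So it looks to me as if Python "does not support supplementary escape literals", and its behavior is also strange.

Well, I already know a correct way to decode and encode such characters: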

# The doubled backslashes give a string holding the literal text \xe2\x99\xac
s_decoded = ('\\xe2\\x99\\xac'.encode().decode('unicode-escape')
             .encode('latin-1').decode('utf-8'))
print(b'\xf0\xa0\x9c\x8e'.decode('utf-8'))
print(b'\xd8\x41\xdf\x0e'.decode('utf-16-be'))
assert s_decoded == '♬'

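However, none of this uses \u or \U escape literals. I hope someone can point out what I am doing wrong and how this differs from the Java way. Thanks!

By the way, my environment is PyCharm on Windows with Python 3.6.1, and the source file is encoded as UTF-8.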

Python 3.6.3:

>>> print('\u266c') # U+266C
♬
>>> print('\U0002070E') # U+2070E.  Python is not Java
𠜎
>>> '\u266c' == '♬'
True
>>> '\U0002070E' == '𠜎'
True
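
The session above suggests that Python's \u and \U escapes name Unicode code points directly, rather than UTF-8 bytes or UTF-16 code units. The following is a minimal sketch of how the values from the question relate to each other. The helper `combine_surrogates` is a hypothetical name introduced here for illustration; it is not part of the standard library.

```python
# \uXXXX takes exactly 4 hex digits and \UXXXXXXXX exactly 8.
# Both name a Unicode code point, not UTF-8 bytes or UTF-16 code units.
note = '\u266C'       # U+266C
han = '\U0002070E'    # U+2070E, outside the Basic Multilingual Plane


def combine_surrogates(high, low):
    """Hypothetical helper: map a UTF-16 surrogate pair to one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The Java-style pair \uD841\uDF0E corresponds to the code point U+2070E.
code_point = combine_surrogates(0xD841, 0xDF0E)

# If a string already holds the two lone surrogates, round-tripping through
# UTF-16 with the 'surrogatepass' error handler merges them into one character.
merged = '\uD841\uDF0E'.encode('utf-16', 'surrogatepass').decode('utf-16')

print(len(han))                 # length counts code points
print(hex(code_point))          # code point computed from the surrogate pair
print(han.encode('utf-8'))      # the UTF-8 bytes tried in the question
print(han.encode('utf-16-be'))  # the UTF-16 code units Java would store
```

Under this reading, the failing attempts in the question put encoded bytes or surrogate code units where a code point was expected. Writing the 8-digit form `'\U0002070E'` avoids the surrogate pair altogether.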
