小贝子编程

Is there a Python way to truncate a Unicode string to a maximum number of bytes?

Tags: python, python-3.x, unicode
Updated: 2023-09-20
Original English title: Is there a Pythonic way of truncating a Unicode string by a maximum number of bytes?

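An API accepts a string value limited to a maximum number of bytes, but the value may contain any Unicode. Is there a better way than the following to shorten a string so that it remains valid Unicode?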

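def truncate(string: str, length: int) -> str:
    """Shorten a Unicode string to at most `length` bytes (UTF-8)."""
    # Nothing to do if the UTF-8 encoding already fits.
    if len(string.encode()) <= length:
        return string
    chars = list(string)
    # Drop trailing characters until the rest fits. The sum is recomputed
    # on every pass, so this is quadratic in the string length.
    while sum(len(char.encode()) for char in chars) > length:
        chars.pop(-1)
    return "".join(chars)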
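The same character-by-character idea can be done in one pass by keeping a running byte total instead of re-summing the whole list after every pop. This is a minimal sketch; the name truncate_single_pass is hypothetical, and it assumes UTF-8 and a non-negative length, as the question's code does.

```python
def truncate_single_pass(string: str, length: int) -> str:
    """Keep whole characters while their UTF-8 size fits in `length` bytes."""
    total = 0
    for index, char in enumerate(string):
        # Size of this one code point in UTF-8 (1 to 4 bytes).
        size = len(char.encode("utf-8"))
        if total + size > length:
            # Adding this character would exceed the budget: cut before it.
            return string[:index]
        total += size
    # The whole string fits.
    return string


print(truncate_single_pass("日本語", 7))  # 日本
```

It returns the same longest fitting prefix as the loop above, but each character is encoded only once, so the work is linear rather than quadratic.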

This should work in Python 3:

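def truncate(string: str, length: int) -> str:
    """Shorten a Unicode string to at most `length` UTF-8 bytes."""
    bytes_ = string.encode()
    try:
        # Succeeds when the cut falls on a character boundary.
        return bytes_[:length].decode()
    except UnicodeDecodeError as err:
        # err.start is where the incomplete trailing character begins.
        return bytes_[:err.start].decode()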
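A shorter variant of the same idea lets the decoder discard the incomplete final character itself via the "ignore" error handler. This is a sketch, not part of the original answer; the name truncate_ignore is hypothetical. It relies on the input being a normal str, so its UTF-8 encoding is valid and the only malformed bytes after slicing can be one partial character at the end.

```python
def truncate_ignore(string: str, length: int) -> str:
    """Cut the UTF-8 bytes at `length` and let the decoder drop a partial tail."""
    # The encoded input is valid UTF-8, so after slicing, the only invalid
    # bytes can be an incomplete trailing character; errors="ignore" drops it.
    return string.encode("utf-8")[:length].decode("utf-8", errors="ignore")


print(truncate_ignore("héllo", 2))  # h
```

The trade-off is that "ignore" hides any decoding problem silently, whereas the try/except version only tolerates the expected one.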

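Basically, we cut at the first decoding error. UTF-8 is a prefix code, so the decoder can always tell when the string has been cut in the middle of a character. Accents and other combining marks may cause oddities: a cut can fall between a base letter and its combining accent, which leaves valid but visibly different text. I haven't thought this through yet; maybe some normalization is needed as well.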
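Both points in that paragraph can be shown without try/except. Because UTF-8 continuation bytes always have the bit pattern 10xxxxxx, a cut position can be moved back to the nearest character boundary directly. An optional NFC normalization step merges a letter and its combining accent into one precomposed code point where one exists. This is a hedged sketch; the function name truncate_utf8 and the normalize parameter are hypothetical, and length is assumed to be non-negative.

```python
import unicodedata


def truncate_utf8(string: str, length: int, normalize: bool = False) -> str:
    """Truncate to at most `length` UTF-8 bytes without splitting a code point."""
    if normalize:
        # NFC merges e.g. "e" + U+0301 COMBINING ACUTE ACCENT into "é" (U+00E9),
        # so the cut cannot fall between them. Not every sequence has a
        # precomposed form, so this only reduces the problem.
        string = unicodedata.normalize("NFC", string)
    data = string.encode("utf-8")
    if len(data) <= length:
        return string
    cut = length
    # Continuation bytes look like 0b10xxxxxx. If the byte right after the
    # cut is one, the cut is inside a character: step back to its lead byte.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut].decode("utf-8")


print(truncate_utf8("cafe\u0301", 4))                  # cafe (accent lost)
print(truncate_utf8("cafe\u0301", 4, normalize=True))  # caf
```

With normalization, the whole "é" is dropped instead of leaving a bare "e". Normalizing changes the code points the API receives, which may not be acceptable, and longer grapheme clusters (such as emoji joined with U+200D) can still be split; handling those would need grapheme-cluster segmentation, for example \X in the third-party regex module.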

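In Python 2, make sure to specify the encoding explicitly, e.g. encode("utf-8") and decode("utf-8"), because the default encoding there is ASCII.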
