How do I truncate a string in Python while keeping the word that the cut lands in?



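There is a good discussion, "Truncate a string without ending in the middle of a word", about how to do 'smart' string truncation in Python. The problem with the solutions proposed there is that if the width limit falls inside a word, that word is dropped entirely.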

How can I truncate a string in Python with a 'soft' width limit, i.e. if the limit falls in the middle of a word, that word is kept?

Example:

str = "it's always sunny in philadelphia"
trunc(str, 7)
>>> it's always...

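My initial idea was to slice the string up to the soft limit, then check each following character and append it to the slice until I hit a whitespace character. But that seems extremely inefficient.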
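For reference, that character-by-character idea could be sketched roughly as follows. The name `trunc_scan` is made up here for illustration, and this is only one reading of the description above, not a tuned implementation.

```python
def trunc_scan(text, length, suffix="..."):
    """Cut `text` at `length`, then extend the cut to the end of the current word."""
    end = min(length, len(text))
    # Walk forward one character at a time until whitespace or the end of the text.
    while end < len(text) and not text[end].isspace():
        end += 1
    # Drop any whitespace left at the cut so the suffix follows the word directly.
    return text[:end].rstrip() + suffix


print(trunc_scan("it's always sunny in philadelphia", 7))  # it's always...
```

The loop touches at most the remainder of one word, so the extra cost is bounded by the word length rather than by the length of the whole string.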

How about:

def trunc(ipt, length, suffix='...'):
    if " " in ipt[length-1: length]:
        # The given length puts us on a word boundary
        return ipt[:length].rstrip(' ') + suffix
    # Otherwise add the "tail" of the input, up to just before the first space it contains
    return ipt[:length] + ipt[length:].partition(" ")[0] + suffix

s = "it's always sunny in philadelphia"  # Best to avoid 'str' as a variable name, it's a builtin
for n in (1, 4, 5, 6, 7, 12, 13):
    print(f"{n}: {trunc(s, n)}")

Output:

1: it's...
4: it's...
5: it's...
6: it's always...
7: it's always...
12: it's always...
13: it's always sunny...

Note the behavior for 5 and 12: this code assumes you want to drop the space that would otherwise appear before the "...".
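One thing the function above does not do is leave short inputs alone: the suffix is appended even when nothing was cut off. If that matters, a variant along the following lines might help. This is a sketch under assumptions: `trunc_soft` is a hypothetical name, and it uses `str.find` to locate the first space at or after the last character inside the limit.

```python
def trunc_soft(text, length, suffix="..."):
    """Soft truncation that returns `text` unchanged when no later space exists."""
    # Look for the first space at or after the last character inside the limit.
    cut = text.find(" ", max(length - 1, 0))
    if cut == -1:
        # The limit falls in (or after) the final word, so nothing is dropped.
        return text
    # Strip spaces before the cut so the suffix follows the last kept word directly.
    return text[:cut].rstrip(" ") + suffix


s = "it's always sunny in philadelphia"
print(trunc_soft(s, 12))   # it's always...
print(trunc_soft(s, 100))  # it's always sunny in philadelphia
```

For the lengths listed in the output above, this sketch should give the same results; it differs only when the limit reaches the last word.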

Somehow I missed the answer that Markus Jarderot provided in the linked post:

import re

def smart_truncate2(text, min_length=100, suffix='...'):
    """If the `text` is more than `min_length` characters long,
    it will be cut at the next word-boundary and `suffix` will
    be appended.
    """
    # Lazily match at least min_length-1 characters, then a non-space followed by whitespace
    pattern = r'^(.{%d,}?\S)\s.*' % (min_length-1)
    return re.sub(pattern, r'\1' + suffix, text)

It runs in:

3.49 µs ± 25.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

@slothrop's solution runs in:

897 ns ± 3.27 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

which is much faster.
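To reproduce this kind of comparison outside IPython, something like the following sketch with the standard-library `timeit` module should work. The input string and the limit of 7 are assumptions, since the post does not say exactly what was timed; both functions are repeated so the snippet runs on its own, and absolute numbers will vary by machine.

```python
import re
import timeit


def trunc(ipt, length, suffix="..."):
    # String-slicing approach: cut at the limit, then add the rest of the current word.
    if " " in ipt[length - 1: length]:
        return ipt[:length].rstrip(" ") + suffix
    return ipt[:length] + ipt[length:].partition(" ")[0] + suffix


def smart_truncate2(text, min_length=100, suffix="..."):
    # Regex approach: lazily match up to the end of the word containing the limit.
    pattern = r"^(.{%d,}?\S)\s.*" % (min_length - 1)
    return re.sub(pattern, r"\1" + suffix, text)


s = "it's always sunny in philadelphia"  # assumed test input

# Both approaches should agree on this input before their speed is compared.
assert trunc(s, 7) == smart_truncate2(s, 7) == "it's always..."

runs = 100_000
for func in (smart_truncate2, trunc):
    seconds = timeit.timeit(lambda: func(s, 7), number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e9:.0f} ns per call")
```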
