Splitting a sentence at a "space" character below a given length using regex in Python



I have been trying to split a sentence into a meaningful set of whole words that fits under a specific length.

string1 = "Alice is in wonderland"
string2 = "Bob is playing games on his computer"

I want a regular expression that matches the leading whole words of each string that fit within 20 characters.

new_string1 = "Alice is in"
new_string2 = "Bob is playing games"

Can this be done with a regular expression?

This is not a good use case for regular expressions. The textwrap.shorten function, however, does exactly this.

import textwrap
string1 = "Alice is in wonderland"
string2 = "Bob is playing games on his computer"
new_string1 = textwrap.shorten(string1, 20, placeholder="")
new_string2 = textwrap.shorten(string2, 20, placeholder="")
print(new_string1) # Alice is in
print(new_string2) # Bob is playing games

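The only drawback of textwrap.shorten is that it collapses whitespace: runs of spaces, tabs, and newlines are each replaced by a single space. If you don't want that to happen, you can implement your own function.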

def shorten(s, max_chars):
    # Special case: the string already fits within the limit
    if len(s) <= max_chars:
        return s.rstrip()
    stop = 0
    # Scan one character past the limit so that a space sitting exactly
    # at index max_chars still counts as a valid cut point
    for i in range(max_chars + 1):
        # Always remember the position of the last whitespace seen so far
        if s[i].isspace():
            stop = i
    # Cut at the last whitespace and drop any trailing whitespace
    # (returns "" if no whitespace was found within the limit)
    return s[:stop].rstrip()
string1 = "Alice is in wonderland"
string2 = "Bob is playing games on his computer"
new_string1 = shorten(string1, 20)
new_string2 = shorten(string2, 20)
print(new_string1) # Alice is in
print(new_string2) # Bob is playing games
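
To answer the original question directly: it can be done with a regular expression, even though the approaches above are probably easier to read. The following is only a minimal sketch. The name shorten_regex is made up for this example, and it assumes that a string with no whitespace within the limit should produce an empty result, the same as the loop-based shorten above. The pattern greedily takes up to max_chars characters, then backtracks until the next character is whitespace or the end of the string.

```python
import re

def shorten_regex(s, max_chars):
    # Hypothetical helper, not part of any library.
    # .{0,N} is greedy, so it grabs as many characters as allowed, then
    # backtracks until the lookahead sees whitespace or the end of the string.
    # re.DOTALL lets "." match newlines as well.
    pattern = re.compile(r".{0,%d}(?=\s|\Z)" % max_chars, re.DOTALL)
    match = pattern.match(s)
    # No match means the first word alone is longer than max_chars
    return match.group(0).rstrip() if match else ""

print(shorten_regex("Alice is in wonderland", 20))  # Alice is in
print(shorten_regex("Bob is playing games on his computer", 20))  # Bob is playing games
```

Unlike textwrap.shorten, this version leaves whitespace inside the kept part untouched and only strips whitespace at the end of the result.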
