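How to use a regular expression to add a missing space after a period without changing decimals

I have a large block of text in which some periods are not followed by a space. However, the text also contains decimal numbers.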

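Here is how I have tried to solve it with a regex so far (I am using Python):

re.sub(r"(?!\d\.\d)(?!\. )\.", '. ', my_string)

But the first group, (?!\d\.\d), does not seem to work. It still matches the period in decimal numbers.

Here is some sample text to check that any potential solution works: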

this is a.match
this should also match.1234
and this should 123.match
this should NOT match. Has space after period
this also should NOT match 1.23

You can use

re.sub(r'\.(?!(?<=\d\.)\d) ?', '. ', text)

See the regex demo. The trailing space is matched optionally, so if it is there, it is removed and put back.
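As for why the attempt in the question still hits decimals: a lookahead is tested at the position where the match would start. In (?!\d\.\d)(?!\. )\. that position is the dot itself, so (?!\d\.\d) is looking at the dot and what follows it, and since a dot is never a digit the check always passes. A small sketch of the difference (the sample string and variable names are made up for illustration):

```python
import re

sample = "pi is 3.14"

# Pattern from the question: both lookaheads are tested at the dot itself,
# so (?!\d\.\d) cannot fail there (the next character is '.', not a digit).
question_pattern = r"(?!\d\.\d)(?!\. )\."
question_result = re.sub(question_pattern, ". ", sample)

# Pattern from this answer: consume the dot first, then look around it.
answer_pattern = r"\.(?!(?<=\d\.)\d) ?"
answer_result = re.sub(answer_pattern, ". ", sample)

print(question_result)  # pi is 3. 14
print(answer_result)    # pi is 3.14
```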

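Details

\. - a dot
(?!(?<=\d\.)\d) - a negative lookahead that fails the match if the dot just matched is preceded by a digit and followed by a digit, i.e. it is a dot between two digits
" ?" (a space followed by ?) - an optional space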
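If it helps to see the three parts side by side, the same pattern can be written in verbose mode. This is only a sketch: the names PERIOD_FIX and add_space_after_periods are invented here, and the space is written as [ ] because re.VERBOSE ignores bare whitespace in the pattern.

```python
import re

# Hypothetical name; the same pattern as above, spelled out with re.VERBOSE.
PERIOD_FIX = re.compile(
    r"""
    \.                 # a literal dot
    (?!(?<=\d\.)\d)    # fail if this dot sits between two digits
    [ ]?               # an optional space, put in a class so verbose mode keeps it
    """,
    re.VERBOSE,
)

def add_space_after_periods(s):
    """Hypothetical helper: put exactly one space after each non-decimal period."""
    return PERIOD_FIX.sub(". ", s)

print(add_space_after_periods("a.b 1.23 c. d x.5 7.y"))
# a. b 1.23 c. d x. 5 7. y
```

Note that a period at the very end of the string also gets a space appended, so you may want to strip the result.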

See the Python demo:

import re
text = "this is a.match\nthis should also match.1234\nand this should 123.match\n\nthis should NOT match. Has space after period\nthis also should NOT match 1.23"
print(re.sub(r'\.(?!(?<=\d\.)\d) ?', '. ', text))

Output:

this is a. match
this should also match. 1234
and this should 123. match

this should NOT match. Has space after period
this also should NOT match 1.23

Or, use a (?! ) lookahead, as you were trying to:

re.sub(r'\.(?!(?<=\d\.)\d)(?! )', '. ', text)

See the regex demo and the Python demo.
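The two variants give the same text but do a different amount of work, which re.subn makes visible. A quick sketch (the sample string is made up):

```python
import re

text = "one.two. three 4.5"

# Variant 1: the optional space is consumed and re-inserted,
# so periods that already have a space are rewritten too.
out1, n1 = re.subn(r"\.(?!(?<=\d\.)\d) ?", ". ", text)

# Variant 2: (?! ) skips periods that are already followed by a space.
out2, n2 = re.subn(r"\.(?!(?<=\d\.)\d)(?! )", ". ", text)

print(out1, n1)  # one. two. three 4.5 2
print(out2, n2)  # one. two. three 4.5 1
```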

Another way. Not sure whether this performs better or worse than Wiktor's solution.

re.sub(r"(?!\d\.\d)(?!.\. )(.\.)(.)", r"\1 \2", my_string)
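A quick check of this approach against the sample text from the question, assuming the pattern is meant to read (?!\d\.\d)(?!.\. )(.\.)(.) with \1 \2 as the replacement:

```python
import re

sample = (
    "this is a.match\n"
    "this should also match.1234\n"
    "and this should 123.match\n"
    "this should NOT match. Has space after period\n"
    "this also should NOT match 1.23"
)

# Capture "one character + dot" and the character after it,
# then put a space between the two groups.
pattern = r"(?!\d\.\d)(?!.\. )(.\.)(.)"
result = re.sub(pattern, r"\1 \2", sample)
print(result)

# The character after the dot is consumed by group 2, so it cannot also
# start the next match: only the first dot in "a.b.c" is fixed.
print(re.sub(pattern, r"\1 \2", "a.b.c"))  # a. b.c
```

So it handles the sample, but back-to-back periods separated by a single character can be missed.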

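import re

txt = "hello world.this is boise idaho.a this is twin falls."
# a word, optional whitespace, a period, then another word
pattern = r"(\w+\s*\.\w+)+"
matches = re.findall(pattern, txt)
for item in matches:
    front, back = item.split('.')
    replace = front + '. ' + back
    # escape the match so its period is treated literally
    txt = re.sub(re.escape(item), replace, txt)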

print(txt)
hello world. this is boise idaho. a this is twin falls.
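The findall loop can probably be collapsed into a single re.sub call. The sketch below is not the answerer's code, just one way to get the same output on this input; it uses a lookahead for the following word character so that character is not consumed.

```python
import re

txt = "hello world.this is boise idaho.a this is twin falls."

# Keep the word character (and any whitespace) before the period in group 1,
# and only require, without consuming, a word character after it.
fixed = re.sub(r"(\w\s*)\.(?=\w)", r"\1. ", txt)
print(fixed)
# hello world. this is boise idaho. a this is twin falls.

# Caveat: \w also matches digits, so decimals are NOT protected here.
print(re.sub(r"(\w\s*)\.(?=\w)", r"\1. ", "1.23"))  # 1. 23
```

Like the loop version, this does not meet the question's requirement of leaving decimals alone, so it is only suitable for text without numbers.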
