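Replace spaces with non-breaking spaces according to a specific criterion

I want to clean up files that contain formatting errors. More precisely, I want to replace "normal" spaces with non-breaking spaces according to a given criterion.

For example:

If a sentence contains:

"You need to walk 5 km."

I need to replace the space between 5 and km with a non-breaking space.

So far, I have this: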

import os
unites = ['km', 'm', 'cm', 'mm', 'mi', 'yd', 'ft', 'in']
# iterate and read all files in the directory
for file in os.listdir():
    # check if the file is a file
    if os.path.isfile(file):
        # open the file
        with open(file, 'r', encoding='utf-8') as f:
            # read the file
            content = f.read()
            # search for example in the file
            for i in unites:
                if i in content:
                    # find the next character after the unit
                    next_char = content[content.find(i) + len(i)]
                    # check if the next character is a space
                    if next_char == ' ':
                        # replace the space with a non-breaking space
                        content = content.replace(i + ' ', i + '\u00A0')

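But this replaces all the spaces in the document, not just the ones I want. Can you help me?

Edit

After UlfR's answer, which was very useful and relevant, I would like to push my "search/replace" criteria further and make them more complex.

Now I want to look at the characters before or after a word in order to decide whether to replace a space with a non-breaking space. For example:

I want to search for a phrase such as "Could the search be hypothetical ?" and have the space between "hypothetical" and "?" replaced by a non-breaking space.
Another case: in "In the search, it is necessary to refer to { figure 1.12 }", I want the spaces between {, figure and } to be non-breaking spaces, and the space between figure and 1.12 to be non-breaking as well (so all the spaces in this case).

I tried this: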

units = ['km', 'm', 'cm', 'mm', 'mi', 'yd', 'ft', 'in']
units_before_after = ['{']
nbsp = '\u00A0'
rgx = re.sub(r'(\b\d+)(%s) (%s)\b'%(units, units_before_after),r'\1%s\2'%nbsp,text))
print(rgx)

But I am running into some trouble. Do you have any ideas to share?

You should use re to do the replacement. Like this:

import re
text = "You need to walk 5 km or 500000 cm."
units = ['km', 'm', 'cm', 'mm', 'mi', 'yd', 'ft', 'in']
nbsp = '\u00A0'
print(re.sub(r'(\b\d+) (%s)\b' % '|'.join(units), r'\1%s\2' % nbsp, text))

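Both the search pattern and the replacement are built dynamically, but essentially the pattern matches:

a word boundary at the start: \b
one or more digits: \d+
a single space
one of the units: km|m|cm|...
a word boundary at the end: \b

Then the whole match is replaced by the two captured groups, with the nbsp string between them.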
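To make those pieces easier to see, here is a minimal sketch of the same pattern written with re.VERBOSE and wrapped in a function. The names add_nbsp_before_units and fix_file are hypothetical helpers introduced for illustration, and rewriting files in place is an assumption about what the question is after, so try it on copies first.

```python
import re
from pathlib import Path

NBSP = "\u00A0"
UNITS = ["km", "m", "cm", "mm", "mi", "yd", "ft", "in"]

# Same pattern as above, spelled out piece by piece.
UNIT_PATTERN = re.compile(
    r"""
    (\b\d+)   # group 1: a word boundary, then one or more digits
    [ ]       # one ordinary space (the one to be replaced)
    (%s)      # group 2: one of the units
    \b        # a word boundary, so a longer word starting with a unit is skipped
    """ % "|".join(map(re.escape, UNITS)),
    re.VERBOSE,
)


def add_nbsp_before_units(text: str) -> str:
    """Replace the space between a number and a unit with a non-breaking space."""
    return UNIT_PATTERN.sub(lambda m: m.group(1) + NBSP + m.group(2), text)


def fix_file(path: Path) -> None:
    """Rewrite one UTF-8 text file in place (hypothetical helper)."""
    original = path.read_text(encoding="utf-8")
    fixed = add_nbsp_before_units(original)
    if fixed != original:
        path.write_text(fixed, encoding="utf-8")


print(repr(add_nbsp_before_units("You need to walk 5 km or 500000 cm.")))
# 'You need to walk 5\xa0km or 500000\xa0cm.'
```

One caveat with this unit list: "in" is also an ordinary English word, so a phrase like "5 in the morning" would be changed as well. Whether that matters depends on the files being cleaned.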

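For more information on using regular expressions in Python, see the re module documentation. It is well worth taking the time to learn the basics, because it is a very powerful and useful tool!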
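The extra criteria from the question's edit (a space before "?", and every space inside a { ... } group) could be approached along the same lines. The sketch below is one possible reading of those criteria, not a definitive solution: the function names are hypothetical, the default set of punctuation marks is an assumption, and nested braces are not handled.

```python
import re

NBSP = "\u00A0"


def nbsp_before_punctuation(text: str, marks: str = "?!;:") -> str:
    """Replace an ordinary space that directly precedes one of `marks`.

    The default marks are an assumption; pass a different string if needed.
    """
    # The lookahead (?=...) checks the next character without consuming it,
    # so only the space itself is replaced.
    return re.sub(r" (?=[%s])" % re.escape(marks), NBSP, text)


def nbsp_inside_braces(text: str) -> str:
    """Replace every ordinary space inside a {...} group (no nesting)."""
    # Match one brace group at a time, then swap the spaces within that match.
    return re.sub(
        r"\{[^{}]*\}",
        lambda m: m.group(0).replace(" ", NBSP),
        text,
    )


print(repr(nbsp_before_punctuation("Could the search be hypothetical ?")))
# 'Could the search be hypothetical\xa0?'
print(repr(nbsp_inside_braces("refer to { figure 1.12 } here")))
# 'refer to {\xa0figure\xa01.12\xa0} here'
```

Using a function as the replacement keeps the brace case simple: the regular expression only has to find the group, and plain str.replace handles the spaces inside it.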

Have fun :)
