Simple way to remove duplicate spaces and all newlines efficiently



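I have a file named test.txt that contains a lot of repeated whitespace. The file holds HTML. I want to remove all unnecessary whitespace to reduce the size of its contents. How can I remove the duplicate spaces and put the whole string on one line?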

test.txt

<center>
<b class="test" >My       name
is

fred</      b> <center>

What I want printed

<center><b class="test">My name is fred</b><center>

What is actually printed

<center><b class="test" >Mynameisfred</b> <center>

program.py

def is_white_space(before, curr, after):
    # remove duplicate spaces
    if (curr == " " and (before == " " or after == " ")):
        return True
    # remove all \n
    elif (curr == "\n"):
        return True
    return False

f = open('test.txt', 'r')
contents = f.read()
f.close()
new = ""
i = 0
while (i < len(contents)):
    if (i != 0 and
            i != (len(contents) - 1) and
            not is_white_space(contents[i - 1], contents[i], contents[i + 1])):
        new += contents[i]
    i += 1
print(new)

This leaves a single space wherever whitespace separated two letters or digits, and removes all other whitespace.

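from string import ascii_letters, digits

def main():
    with open('test.txt', 'r') as f:
        # split() with no arguments splits on any run of whitespace,
        # including newlines, and never yields empty strings
        parts = f.read().split()
    keep_separated = set(ascii_letters) | set(digits)
    for i in range(len(parts) - 1):
        # keep one space only when both neighbours are letters or digits
        if parts[i][-1] in keep_separated and parts[i + 1][0] in keep_separated:
            parts[i] = parts[i] + " "
    print(''.join(parts))

if __name__ == '__main__':
    main()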
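
The same rule can also be written as a single regular-expression pass. What follows is only a sketch of an alternative, not part of the answer above: the function name squeeze_whitespace and the constant KEEP_SEPARATED are made up for illustration, and it works on a string rather than reading test.txt so it can be tried directly. Each run of whitespace becomes one space when the characters on both sides are ASCII letters or digits, and is dropped otherwise.

```python
import re
from string import ascii_letters, digits

# Hypothetical name: characters that must stay separated by a space.
KEEP_SEPARATED = set(ascii_letters) | set(digits)

def squeeze_whitespace(text):
    """Collapse each whitespace run to one space between letters/digits, else drop it."""
    def replace_run(match):
        start, end = match.span()
        # Neighbouring characters of the run; empty string at either end of the text.
        before = text[start - 1] if start > 0 else ""
        after = text[end] if end < len(text) else ""
        if before in KEEP_SEPARATED and after in KEEP_SEPARATED:
            return " "
        return ""
    return re.sub(r"\s+", replace_run, text)

if __name__ == "__main__":
    sample = '<center>\n<b class="test" >My       name\nis\n\nfred</      b> <center>\n'
    print(squeeze_whitespace(sample))
    # prints: <center><b class="test">My name is fred</b><center>
```

For the sample file this should give the same result as the split-based version. Which of the two is faster on a large file has not been measured here.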
