Python code to remove spaces between Chinese characters in multiple UTF-8 text files



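I am trying to write Python code (Python 3.7.2) that removes the spaces between Chinese characters in multiple UTF-8 text files in the same directory.

The code I have so far only handles one file: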

import re
with open("transcript 0623.txt") as text:
    new_text = re.sub("(?<![ -~]) (?![ -~])", "", text)
    with open("transcript 0623_out.txt", "w") as result:
        result.write(new_text)

I get the following error:

Traceback (most recent call last):
  File "C:\Users\Admin\Desktop\Wave.3\test.py", line 4, in <module>
    new_text = re.sub("(?<![ -~]) (?![ -~])", "", text)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37-32\Lib\re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object

Could you tell me what is wrong and suggest improvements to the code? Thank you very much.

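open() returns a file object, not the file's text (source: https://docs.python.org/3/library/functions.html#open). Passing that object to re.sub() is what raises the TypeError, because re.sub() expects a string or bytes-like object.

To run a regex over the contents of the file, call the file object's .read() method to get the text as a string first.

For example: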

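import re

# Read the whole file into a string; the question says the files are UTF-8.
with open("transcript 0623.txt", encoding="utf-8") as f:
    text = f.read()
# Delete each space that has no printable ASCII character on either side.
new_text = re.sub("(?<![ -~]) (?![ -~])", "", text)
with open("transcript 0623_out.txt", "w", encoding="utf-8") as result:
    result.write(new_text)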
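
The question also asks about handling every file in a directory, which the snippet above does not cover. One possible way to extend it is sketched below. The function names `remove_spaces` and `process_directory`, the `*.txt` glob, and the `_out` suffix are assumptions chosen for illustration (the suffix mirrors the file name used in the question); adjust them to match your files.

```python
import re
from pathlib import Path

# Same pattern as above: a space with no printable ASCII character on either side.
SPACE_BETWEEN_NON_ASCII = re.compile(r"(?<![ -~]) (?![ -~])")


def remove_spaces(text: str) -> str:
    """Delete spaces that are not adjacent to a printable ASCII character."""
    return SPACE_BETWEEN_NON_ASCII.sub("", text)


def process_directory(directory: str = ".", suffix: str = "_out") -> list:
    """Write a cleaned copy of every .txt file in `directory`; return the new paths."""
    written = []
    # sorted() builds the full listing first, so files created below are not revisited.
    for path in sorted(Path(directory).glob("*.txt")):
        if path.stem.endswith(suffix):
            continue  # assumed naming scheme: skip output left by an earlier run
        text = path.read_text(encoding="utf-8")
        out_path = path.with_name(path.stem + suffix + path.suffix)
        out_path.write_text(remove_spaces(text), encoding="utf-8")
        written.append(out_path)
    return written
```

Calling `process_directory(".")` from the folder that holds the transcripts should leave the originals untouched and write a `*_out.txt` copy next to each one. Note that the pattern treats any character outside the printable ASCII range as "Chinese", so it also removes a space that sits next to a line break or next to other non-ASCII text.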
