Python script to remove multiple blank lines between paragraphs and at the end of a file

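I wrote a Python script to capture the data I want, but the resulting text file contains multiple paragraphs, and each paragraph is separated by a varying number of blank lines, anywhere from 2 to 8.

My file also has multiple blank lines at the end.

I would like Python to leave no more than 2 blank lines between paragraphs and no blank lines at the end of the text file.

I have tried loops, line.strip, replace and so on, but I clearly don't know how to put the pieces together.

An example of what I have been using so far: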

wf = open(FILE, "w+")
for line in wf:
    newline = line.strip('^\r\n')
    wf.write(newline)
    wf.write('\n')

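It is actually easier to remove all blank lines and then insert two blank lines between paragraphs (and none at the end) than to count the blank lines and remove them only when there are more than two. Unless you are dealing with huge files, I don't think there will be any performance difference between the two approaches. Here is a quick and dirty solution using re: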

import re
# Read the whole file
with open('test.txt', 'r+') as f:
    txt = f.read()
    # Remove all blank lines
    txt = re.sub(r'\n\s*\n', '\n', txt)
    # Add two blank lines between all paragraphs
    txt = re.sub(r'\n', '\n\n\n', txt)
    # Remove the blank lines at the end of the file
    txt = re.sub(r'\n*\Z', '', txt)
    # Rewind before writing; otherwise the result is appended after the old content
    f.seek(0)
    f.write(txt)
    # Cut off whatever is left of the old, longer content
    f.truncate()

Before:

One line below

None below
Three below



EOF with one blank line below (stackoverflow's code thingy omits it)

After:

One line below


None below


Three below


EOF with one blank line below

Here is some untested code:

import re
new_lines = re.compile(r'\n{2,9}')
with open(FILE) as f:
    contents = f.read()
contents = re.sub(new_lines, '\n\n\n', contents.strip())
with open(FILE, 'w') as f:
    f.write(contents)

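The blank lines at the end are stripped first. The regular expression then matches runs of 2 to 9 newline characters in the file contents, and re.sub() replaces each run with 3 newlines (that is, two blank lines).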
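Note that a pattern starting at 2 newlines also turns a single blank line into two. If single blank lines should be kept as they are, a minimal sketch along the same lines could look like the following. The names squeeze_blank_lines and max_blank are made up for illustration, and it assumes the file uses plain '\n' line endings:

```python
import re

def squeeze_blank_lines(text, max_blank=2):
    """Cap runs of blank lines at max_blank and drop blank lines at both ends."""
    # Empty out whitespace-only lines so they count as blank.
    text = re.sub(r'[ \t]+$', '', text, flags=re.MULTILINE)
    # max_blank blank lines are max_blank + 1 consecutive newlines,
    # so only runs of max_blank + 2 or more newlines need shortening.
    pattern = r'\n{%d,}' % (max_blank + 2)
    text = re.sub(pattern, '\n' * (max_blank + 1), text)
    # Remove leading and trailing newlines (no blank lines at the end).
    return text.strip('\n')

sample = "a\n\nb\n\n\n\n\nc\n\n\n"
cleaned = squeeze_blank_lines(sample)
```

Here the single blank line between a and b is left alone, the four blank lines between b and c are cut to two, and the trailing blank lines are removed.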

I know the answer asked for was Python, but I believe that may be overkill.

Why not just preprocess the file in the shell? Use grep, sed or awk for this.

Here is the grep version:

$ grep -v '^$' input.txt > output.txt

Here is a quick reference I found.
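For comparison, here is a rough Python equivalent of that grep filter, as a sketch only (drop_empty_lines is a made-up name). Like grep -v '^$', it removes every empty line, so the paragraphs end up with no separation at all and the two blank lines the question asks for would still have to be added back afterwards:

```python
def drop_empty_lines(text):
    """Keep only non-empty lines, roughly what grep -v '^$' does."""
    # As with the '^$' pattern, lines containing only spaces are NOT dropped.
    kept = [line for line in text.splitlines() if line != '']
    # Terminate the output with a single newline unless nothing is left.
    return ('\n'.join(kept) + '\n') if kept else ''

filtered = drop_empty_lines("p1\n\n\np2\n\n")
```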

So far this question has not really been answered. Here is a working solution, but I think it could be better.

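# text is assumed to hold the full contents of the file as a string
newtext = ''
counter = 0
for line in text.splitlines():
    line = line.strip()
    if len(line) == 0:
        # Keep at most two consecutive blank lines
        counter += 1
        if counter <= 2:
            newtext += line + '\n'
    else:
        newtext += line + '\n'
        counter = 0
# Drop the blank lines (and the final newline) left at the end
newtext = newtext.rstrip('\n')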
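One way this could be tightened up, as a sketch rather than a definitive version: buffer the blank-line count and only emit blanks once the next non-blank line arrives, so trailing blank lines never get written in the first place. The names limit_blank_lines and max_blank are hypothetical, and unlike the loop above this uses rstrip() so that leading indentation is preserved:

```python
def limit_blank_lines(lines, max_blank=2):
    """Yield lines with blank runs capped at max_blank and no blanks at either end."""
    pending = 0        # blank lines seen since the last non-blank line
    seen_text = False  # True once the first non-blank line has been yielded
    for line in lines:
        line = line.rstrip()  # strips the newline and trailing spaces only
        if not line:
            pending += 1
            continue
        if seen_text:
            # Emit buffered blanks only now, so trailing blanks are never output.
            for _ in range(min(pending, max_blank)):
                yield ''
        seen_text = True
        pending = 0
        yield line

text = "first\n\n\n\n\nsecond\n\nthird\n\n\n"
result = '\n'.join(limit_blank_lines(text.splitlines()))
```

Because it is a generator over lines, the same function can be fed an open file object instead of text.splitlines(), which avoids building the output through repeated string concatenation.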
