Code to merge specific lines from multiple files into a single file (and remove part of the copied lines)

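First off, I'm very new to this. I've spent the past few days reading tutorials, but I've now hit a wall with what I want to achieve.

The long version: I have multiple files in one directory, all of which contain information on specific lines (23-26). The code has to find and open all of those files (naming pattern: *.tag) and then copy lines 23-26 into a single new file (adding a new line after each new entry...). Optionally, it would also remove a specific part that I don't need from each line: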

C12b2

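-> Everything before C12b2 (or similar) needs to be removed.

So far I've managed to copy those lines from one file into a new file, but the rest is still beyond me (no idea how formatting works here):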

f = open('2.tag')     
n = open('output.txt', 'w')
for i, text in enumerate(f):
    if i >= 23 and i < 27:
        n.write(text)
    else:
        pass

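Can someone give me some pointers? I don't need complete code as an answer, but good tutorials that don't skip the explanations seem hard to find.

You can look at the glob module, which returns a list of filenames that match the pattern you provide. Note that this pattern is not a regex but a shell-style pattern (it uses shell-style wildcards).

glob example -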

>>> import glob
>>> glob.glob('*.py')
['a.py', 'b.py', 'getpip.py']

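You can then iterate over each file returned by glob.glob().

For each file, you can do the same thing you are doing now.

Then, when writing to the output file, you can use str.find() to locate the first instance of the string C12b2 and use slicing to remove the part you don't want.

Example -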
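As a rough sketch of the loop described above (the function name `copy_lines` and its parameters are made up for illustration, they are not from the question), the per-file logic can be wrapped in an outer loop over the glob results. Counting from 1 with `enumerate(..., start=1)` keeps the line numbers the same as the ones in the question:

```python
import glob


def copy_lines(pattern, out_path, first=23, last=26):
    """Write lines first..last (1-based, inclusive) of every matching file to out_path."""
    with open(out_path, "w") as out:
        # sorted() only makes the order predictable; glob itself gives no ordering guarantee
        for name in sorted(glob.glob(pattern)):
            with open(name) as src:
                for number, text in enumerate(src, start=1):
                    if first <= number <= last:
                        out.write(text)
            out.write("\n")  # empty separator line after each file's entry
```

Calling `copy_lines('*.tag', 'output.txt')` from the directory that holds the files would then collect lines 23-26 of each one.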

>>> s = "asdbcdasdC12b2jhfasdas"
>>> s[s.find("C12b2"):]
'C12b2jhfasdas'

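You can do something similar for each line. Note that if only some of the lines in your use case contain C12b2, you need to check that the string is present in the line before slicing as above: str.find() returns -1 when there is no match, so the slice would keep only the last character of the line. Example -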

if 'C12b2' in text:
    text = text[text.find("C12b2"):]

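You can do this just before writing the line to the output file.

It is also worth looking into the with statement, which you can use to open files; it takes care of closing the file automatically once you are done with it.

Without importing anything except os: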
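A minimal sketch of how the with statement and the str.find() check could fit together for a single file. The names `trim_before` and `copy_trimmed` are hypothetical helpers, and `itertools.islice` is swapped in here as one way to pick lines 23-26 without reading the whole file:

```python
from itertools import islice


def trim_before(text, marker="C12b2"):
    """Drop everything before the first marker; return text unchanged if it is absent."""
    index = text.find(marker)
    return text[index:] if index != -1 else text


def copy_trimmed(in_path, out_path, marker="C12b2"):
    # Both files are closed automatically when the with block ends,
    # even if an error occurs while copying.
    with open(in_path) as src, open(out_path, "a") as out:
        for text in islice(src, 22, 26):  # indices 22..25 are lines 23..26
            out.write(trim_before(text, marker))
```

The output file is opened in append mode so that the function can be called once per input file without overwriting earlier entries.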

#!/usr/bin/env python3
import os
# set the directory, the outfile and the tag below
dr = "/path/to/directory"; out = "/path/to/newfile"; tag = ".tag"
for f in [f for f in os.listdir(dr) if f.endswith(tag)]:
    open(out, "+a").write(("").join([l for l in open(dr+"/"+f).readlines()[22:26]])+"\n")

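What it does

Exactly what you describe:

It collects the defined range of lines from all files in the directory (that is, all files with the defined extension)
It pastes the sections into a new file, separated by a new line

Explanation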

[f for f in os.listdir(dr) if f.endswith(".tag")]

lists all files in the directory that have the given extension,

[l for l in open(dr+"/"+f).readlines()[22:26]]

reads the selected lines of the file (the end of a slice is exclusive, so [22:26] covers lines 23 to 26),

open(out, "+a").write()

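writes to the output file, creating it if it does not exist.

How to use

Copy the script into an empty file and save it as collect_lines.py
In the head section, set the directory that contains the files, the path to the new file, and the extension
Run it with the command: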
```
python3 /path/to/collect_lines.py
```

Verbose version, with explanation

If we "unpack" the code above, this is what happens:

#!/usr/bin/env python3
import os
#--- set the path to the directory, the new file and the tag below
dr = "/path/to/directory"; out = "/path/to/newfile"; tag = ".tag"
#---
files = os.listdir(dr)
for f in files:
    if f.endswith(tag):
        # read the file as a list of lines
        content = open(dr+"/"+f).readlines()
        # the first item in a list = index 0, so line 23 is index 22;
        # the slice end is exclusive, so 26 stops after line 26
        needed_lines = content[22:26]
        # convert list to string, add a new line
        string_topaste = ("").join(needed_lines)+"\n"
        # add the lines to the new file, create the file if necessary
        open(out, "+a").write(string_topaste)

With the glob package you can get a list of all *.tag files:

import glob
# ['1.tag', '2.tag', 'foo.tag', 'bar.tag']
tag_files = glob.glob('*.tag')

If you open a file using the with statement, it will be closed automatically afterwards:

with open('file.tag') as in_file:
    pass  # do something with in_file here

readlines() reads the whole file into a list of lines, which you can then slice:

lines = in_file.readlines()[22:26]

If you need to skip everything before a specific pattern, use str.split() to split the string at that pattern and take the last part:

pattern = 'C12b2'
clean_lines = [line.split(pattern, 1)[-1] for line in lines]

Take a look at this example:

>>> lines = ['line 22', 'line 23', 'Foobar: C12b2 line 24']
>>> pattern = 'C12b2'
>>> [line.split(pattern, 1)[-1] for line in lines]
['line 22', 'line 23', ' line 24']
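As the output above shows, splitting on the pattern also drops the pattern itself. If C12b2 should stay at the start of the line, as the question seems to ask, one possible variant uses str.partition() instead. This is only a sketch, and the helper name `keep_from_pattern` is an assumption for illustration:

```python
def keep_from_pattern(line, pattern="C12b2"):
    """Return the line from the first occurrence of pattern onwards.

    str.partition() returns (before, pattern, after); when the pattern is
    missing, the middle element is empty and the line is returned unchanged.
    """
    before, found, after = line.partition(pattern)
    return found + after if found else line


lines = ['line 22', 'line 23', 'Foobar: C12b2 line 24']
print([keep_from_pattern(line) for line in lines])
```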

You can use readlines and writelines, with a and b as the boundaries of the slice of lines you want to write:

with open('oldfile.txt', 'r') as old:
    lines = old.readlines()[a:b]
with open('newfile.txt', 'w') as new:
    new.writelines(lines)
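Putting the pieces from this page together, a complete script might look roughly like the following. It is a sketch under a few assumptions: the function name `collect` and the output name output.txt are placeholders, a=22 and b=26 are chosen to cover lines 23-26, and the split-based trimming also removes the marker itself:

```python
import glob


def collect(pattern="*.tag", out_path="output.txt", a=22, b=26, marker="C12b2"):
    """Copy lines a..b (slice indices) of each matching file into out_path."""
    with open(out_path, "w") as new:
        for name in sorted(glob.glob(pattern)):
            with open(name) as old:
                lines = old.readlines()[a:b]
            # keep only what follows the marker; lines without it pass through unchanged
            new.writelines(line.split(marker, 1)[-1] for line in lines)
            new.write("\n")  # empty line after each file's entry


if __name__ == "__main__":
    collect()
```

The output file is opened once, outside the loop, so 'w' mode does not overwrite the entries of earlier files.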
