How to fill blank lines with information while leaving the rest unchanged



I'm building scenery for a flight simulator and need to figure out how to edit a large number of lines in a text file (3,579,189 of them).

I have TextCrawler Pro, Node, Python SVN, and Notepad++ available as tools.

The original section, before editing:

POLYGON_POINT -79.750000000217,42.017498354525,0
POLYGON_POINT -79.750000000217,42.016478251402,0
POLYGON_POINT -79.750598748133,42.017193264943,0
POLYGON_POINT -79.750000000217,42.017498354525,0

POLYGON_POINT -79.750000000217,42.085882815878,0
POLYGON_POINT -79.750000000217,42.082008734634,0
POLYGON_POINT -79.751045507507,42.082126409633,0
POLYGON_POINT -79.750281907508,42.083166574215,0
POLYGON_POINT -79.750781149174,42.084212672130,0
POLYGON_POINT -79.750000000217,42.085882815878,0

POLYGON_POINT -79.750000000217,42.088955814831,0
POLYGON_POINT -79.750456566883,42.087544672125,0
POLYGON_POINT -79.751642899173,42.088273325249,0
POLYGON_POINT -79.751461052298,42.088916154415,0
POLYGON_POINT -79.750000000217,42.088955814831,0

Adding the POLYGON_POINT prefix to each line was easy with Notepad++'s replace function. Now I need some help making the file look like this:

BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.017498354525,0
POLYGON_POINT -79.750000000217,42.016478251402,0
POLYGON_POINT -79.750598748133,42.017193264943,0
POLYGON_POINT -79.750000000217,42.017498354525,0
END_POLY
BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.085882815878,0
POLYGON_POINT -79.750000000217,42.082008734634,0
POLYGON_POINT -79.751045507507,42.082126409633,0
POLYGON_POINT -79.750281907508,42.083166574215,0
POLYGON_POINT -79.750781149174,42.084212672130,0
POLYGON_POINT -79.750000000217,42.085882815878,0
END_POLY
BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.088955814831,0
POLYGON_POINT -79.750456566883,42.087544672125,0
POLYGON_POINT -79.751642899173,42.088273325249,0
POLYGON_POINT -79.751461052298,42.088916154415,0
POLYGON_POINT -79.750000000217,42.088955814831,0
END_POLY

That is, add BEGIN_POLYGON before each block and END_POLY after each block.

How can I do this?

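I would use itertools.groupby to group the lines (keeping only the non-blank groups via the `if k` condition) and add a header and footer to each group. Then flatten the groups with itertools.chain: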

import itertools
with open("file.txt") as f, open("fileout.txt", "w") as fw:
    # k is True for runs of non-blank lines; a generator keeps only one block in memory
    groups = itertools.groupby(f, key=lambda l: bool(l.strip()))
    fw.writelines(itertools.chain.from_iterable(
        ["BEGIN_POLYGON\n"] + list(v) + ["END_POLY\n"] for k, v in groups if k))

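key = lambda l: bool(l.strip()) is the grouping key: it tests whether a line is blank, ignoring the line terminator (strip() removes it, along with any other whitespace).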
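To see what the key does, here is a small sketch that runs the same grouping on a few in-memory lines. The sample list is made up for illustration (coordinates shortened), and is_content is just a named version of the lambda above.

```python
import itertools

# Hypothetical sample: two blocks separated by an empty line and a whitespace-only line.
sample = [
    "POLYGON_POINT -79.75,42.01,0\n",
    "POLYGON_POINT -79.75,42.02,0\n",
    "\n",
    "   \n",
    "POLYGON_POINT -79.75,42.08,0\n",
]

def is_content(line):
    # strip() removes the line terminator and surrounding whitespace,
    # so empty and whitespace-only lines become "" (falsy).
    return bool(line.strip())

# Each entry is (key value, number of consecutive lines sharing that key).
groups = [(k, len(list(v))) for k, v in itertools.groupby(sample, key=is_content)]
print(groups)  # [(True, 2), (False, 2), (True, 1)]
```

Consecutive blank lines collapse into a single False group, which is why filtering on `if k` drops them all at once.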

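This approach does not need to read the whole file into memory, so it suits very large files: as long as the groups are consumed lazily (a generator expression rather than a list), only one block of lines is held at a time.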
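If the one-liner is hard to read, the same idea can be sketched as a generator function that yields one line at a time. The name wrap_blocks and its begin/end parameters are my own choices for illustration, and the guard for a final line without a trailing newline is an extra assumption about the input, not something the question mentions.

```python
import itertools

def wrap_blocks(lines, begin="BEGIN_POLYGON\n", end="END_POLY\n"):
    """Yield the input lines with begin/end markers around each run of non-blank lines."""
    for has_content, block in itertools.groupby(lines, key=lambda l: bool(l.strip())):
        if not has_content:
            continue  # drop the blank separator lines
        yield begin
        last = ""
        for last in block:
            yield last
        # Assumed edge case: the file's last line may lack a newline.
        if not last.endswith("\n"):
            yield "\n"
        yield end

# Small in-memory demo with made-up lines; with real files it would be
# fw.writelines(wrap_blocks(f)) inside the same "with open(...)" statement.
demo = ["POLYGON_POINT 1,1,0\n", "POLYGON_POINT 2,2,0\n", "\n", "\n", "POLYGON_POINT 3,3,0"]
result = "".join(wrap_blocks(demo))
print(result, end="")
```

Because wrap_blocks accepts any iterable of lines, it can be tried on a short list first and then pointed at the full file unchanged.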

A quick solution using sed

cat -s file.txt |
    sed -e 's/^$/END_POLY\nBEGIN_POLYGON/' \
    -e '1i BEGIN_POLYGON' \
    -e '$a END_POLY'
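  • cat -s squeezes each run of consecutive blank lines into a single blank line
  • The first sed expression replaces each blank line with the END_POLY and BEGIN_POLYGON markers
  • The second and last sed expressions prepend and append the remaining markers to the output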
