Replacing lines of text between two characters



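I have a BibTeX file that was produced by merging several other .bib files. During the merge, every duplicate entry except one was commented out, so each duplicated entry now looks like the example below. Some entries have 20 to 30 commented-out copies, which makes a file of about 100 references roughly 30k lines long.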

@Article{goodnight2005,
  author    = {Goodnight, N. and Wang, R. and Humphreys, G.},
  journal   = {{IEEE Computer Graphics and Applications}},
  title     = {{Computation on programmable graphics hardware}},
  year      = {2005},
  volume    = {25},
  number    = {5},
  pages     = {12-15}
}
###Article{goodnight2005,
  author    = {Goodnight, N. and Wang, R. and Humphreys, G.},
  journal   = {{IEEE Computer Graphics and Applications}},
  title     = {{Computation on programmable graphics hardware}},
  year      = {2005},
  volume    = {25},
  number    = {5},
  pages     = {12-15}
}
@INPROCEEDINGS{Llosa-pact96,
    author = {Josep Llosa and Antonio González and Eduard Ayguadé and Mateo Valero},
    title = {Swing Modulo Scheduling: A Lifetime-Sensitive Approach},
    booktitle = {In IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT'96},
    year = {1996},
    pages = {80--86}
    }

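How can I delete everything from a line starting with ### up to, but not including, the next line starting with @? In effect, my resulting file should be: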

@Article{goodnight2005,
      author    = {Goodnight, N. and Wang, R. and Humphreys, G.},
      journal   = {{IEEE Computer Graphics and Applications}},
      title     = {{Computation on programmable graphics hardware}},
      year      = {2005},
      volume    = {25},
      number    = {5},
      pages     = {12-15}
    }
@INPROCEEDINGS{Llosa-pact96,
        author = {Josep Llosa and Antonio González and Eduard Ayguadé and Mateo Valero},
        title = {Swing Modulo Scheduling: A Lifetime-Sensitive Approach},
        booktitle = {In IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT'96},
        year = {1996},
        pages = {80--86}
        }

For example, sed '/###/,/@/{//!d}' bibliography.bib keeps the lines starting with ###, while sed '/###/,/@/d' bibliography.bib makes the lines starting with @ disappear.

Thanks a lot for your help.

A simple solution using a $skip flag:

use strict;
use warnings; 
my $skip = 0;
while ( <> ) {
   $skip = 1 if /^###/;
   $skip = 0 if /^@/;
   next if $skip;
   print;
}
Output:

[hmcmillen]$ perl test.pl < test.txt 
@Article{goodnight2005,
  author    = {Goodnight, N. and Wang, R. and Humphreys, G.},
  journal   = {{IEEE Computer Graphics and Applications}},
  title     = {{Computation on programmable graphics hardware}},
  year      = {2005},
  volume    = {25},
  number    = {5},
  pages     = {12-15}
}
@INPROCEEDINGS{Llosa-pact96,
    author = {Josep Llosa and Antonio González and Eduard Ayguadé and Mateo Valero},
    title = {Swing Modulo Scheduling: A Lifetime-Sensitive Approach},
    booktitle = {In IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT'96},
    year = {1996},
    pages = {80--86}
}

If you really want it as a single command:

perl -ne 'BEGIN { $SKIP = 0 } $SKIP = 1 if /^###/; $SKIP = 0 if /^@/; print unless $SKIP;' < test.txt
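With a 30k-line file it may be worth checking the result before overwriting anything. The following is only a sketch and not part of the answer above: `check_cleaned` is a hypothetical helper name, and it assumes the filtered output was written to a separate file. It verifies two things that should hold if the filtering worked: no line starts with ###, and the number of lines starting with @ is unchanged.

```shell
# check_cleaned ORIGINAL CLEANED
# Hypothetical helper: succeeds only if CLEANED has no lines starting with ###
# and has as many lines starting with @ as ORIGINAL.
check_cleaned() {
    orig=$1
    cleaned=$2

    # grep -c prints the number of matching lines; "|| true" keeps a count
    # of zero (exit status 1) from being treated as a failure.
    leftover=$(grep -c '^###' "$cleaned" || true)
    before=$(grep -c '^@' "$orig" || true)
    after=$(grep -c '^@' "$cleaned" || true)

    if [ "$leftover" -ne 0 ]; then
        echo "still $leftover commented-out lines in $cleaned" >&2
        return 1
    fi
    if [ "$before" -ne "$after" ]; then
        echo "entry count changed: $before -> $after" >&2
        return 1
    fi
    echo "ok: $after entries kept"
}
```

For instance, after redirecting the filter's output to cleaned.txt (a made-up file name), `check_cleaned test.txt cleaned.txt` should print an "ok" line.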

Assuming your input files are all *.bib files in the current directory or below.

Let me be your find/perl magician for today:

find . -name '*.bib' -exec \
perl -i -ne '$o=1if/^@/;$o=0if/^###/;print if$o' {} \;

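If you can't read it, don't use it. For example, it deletes anything before the first @ line, and it does not handle indented @ or ### lines.

There is also a nice module called File::Find; see perldoc File::Find. Personally, I don't think that would keep it a one-liner.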
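The two caveats noted above (content before the first @ line is dropped, and indented markers are ignored) can be avoided with a small change to the flag logic. This is a rough sketch rather than a drop-in replacement: `strip_commented` is a hypothetical function name, and it assumes indentation consists only of spaces and tabs. It starts in "print" mode, so any preamble survives, and it allows leading blanks before ### and @.

```shell
# strip_commented FILE
# Prints FILE without commented-out entries: everything from a line whose
# first non-blank characters are ### up to (not including) the next line
# whose first non-blank character is @.
strip_commented() {
    awk '
        /^[ \t]*###/ { skip = 1 }   # start of a commented-out entry
        /^[ \t]*@/   { skip = 0 }   # next real entry: resume printing
        !skip                       # print unless inside a skipped block
    ' "$1"
}
```

Writing to a new file first, as in `strip_commented refs.bib > refs.clean.bib` (both file names made up here), leaves the original untouched until the output has been inspected.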

With awk:

$ awk '/^###/{p=0} /^@/{p=1} p' bib.text
@Article{goodnight2005,
  author    = {Goodnight, N. and Wang, R. and Humphreys, G.},
  journal   = {{IEEE Computer Graphics and Applications}},
  title     = {{Computation on programmable graphics hardware}},
  year      = {2005},
  volume    = {25},
  number    = {5},
  pages     = {12-15}
}
@INPROCEEDINGS{Llosa-pact96,
    author = {Josep Llosa and Antonio González and Eduard Ayguadé and Mateo Valero},
    title = {Swing Modulo Scheduling: A Lifetime-Sensitive Approach},
    booktitle = {In IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT'96},
    year = {1996},
    pages = {80--86}
    }
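The sed attempts from the question can probably be made to work as well. A sketch, assuming the markers start in column one: in `{//!d}` the empty pattern `//` reuses whichever regex was applied last, which, as far as I can tell, is why both boundary lines survived. Naming the end pattern explicitly deletes the ### line and keeps the @ line. `strip_with_sed` is a hypothetical name used only for illustration.

```shell
# strip_with_sed FILE
# Within each range from a line starting with ### to the next line starting
# with @, delete every line except the closing @ line.
strip_with_sed() {
    sed '/^###/,/^@/{/^@/!d;}' "$1"
}
```

If a commented-out block is the last thing in the file, the range has no closing @ line and runs to the end of the input, so the trailing block is removed too.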
