Removing non-digit characters in Perl



I have a file that contains multiple quotes, like this:

  <verse-no>quote</verse-no>
            <quote-verse>1:26,27 Man Created to Continually Develop</quote-verse>
            <quote>When Adam came from the Creator’s hand, he bore, in his physical, mental, and
                spiritual nature, a likeness to his Maker. “God created man in His own image”
                (Genesis 1:27), and it was His purpose that the longer man lived the more fully
                he should reveal this image—the more fully reflect the glory of the Creator. All
                his faculties were capable of development; their capacity and vigor were
                continually to increase. Ed 15
            </quote>

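I want to remove all the text from the <quote-verse>.....</quote-verse> lines, so that the end result is <quote>1:26,27</quote>

I tried perl -pi.bak -e 's#\D*$</quote-verse>#</quote-verse>#g' file.txt

This does nothing. I am a Perl beginner (self-taught) with less than 10 days of experience. Please tell me what went wrong and how to handle this.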

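You have XML, so you need an XML parser. XML::Twig is a good one. Many people say "don't use regular expressions to parse XML" because, although a regex does work within a limited scope, XML is a specification: some things are valid and some are not. If your code is built on assumptions that are not always true, you end up with brittle code. One day it breaks without warning, because someone changed their perfectly valid XML into slightly different but still perfectly valid XML.

So, with that in mind:

This works: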

#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
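# Called once for each <quote-verse> element the parser encounters.
sub quote_verse_handler {
    my ( $twig, $quote ) = @_;
    my $text = $quote->text;
    # Keep the last digit and drop the trailing run of non-digits.
    $text =~ s/(\d)\D+$/$1/;
    $quote->set_text($text);
}
my $parser = XML::Twig->new(
    # The handler must be passed as a code reference (\&name).
    twig_handlers => { 'quote-verse' => \&quote_verse_handler },
    pretty_print  => 'indented'
);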

#$parser -> parsefile ( 'your_file.xml' );
local $/;
$parser->parse(<DATA>);
$parser->print;

__DATA__
<xml>
<verse-no>quote</verse-no>
        <quote-verse>1:26,27 Man Created to Continually Develop</quote-verse>
        <quote>When Adam came from the Creator's hand, he bore, in his physical, mental, and
            spiritual nature, a likeness to his Maker. "God created man in His own image"
            (Genesis 1:27), and it was His purpose that the longer man lived the more fully
            he should reveal this image-the more fully reflect the glory of the Creator. All
            his faculties were capable of development; their capacity and vigor were
            continually to increase. Ed 15
        </quote>
   </xml>

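What this does is run through your file. Each time it encounters a quote-verse element, it calls the handler and hands it that part of the XML. We apply a regular expression that strips the trailing text from the line, and then update the XML accordingly.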
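The substitution used in the handler can be tried on its own, without XML::Twig, to see what it does to the element text. This is only a minimal sketch: `strip_trailing_text` is a hypothetical helper name introduced for illustration, and the input is the sample heading from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): applies the
# same substitution as the handler to a plain string.
sub strip_trailing_text {
    my ($text) = @_;
    # (\d) captures the last digit; \D+$ matches the non-digits that
    # follow it up to the end of the string. Only the digit is kept.
    $text =~ s/(\d)\D+$/$1/;
    return $text;
}

print strip_trailing_text('1:26,27 Man Created to Continually Develop'), "\n";
# prints "1:26,27"
```

Text that already ends in a digit, or that contains no digit at all, is left unchanged, because the pattern needs a digit followed by at least one non-digit at the end.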

Once parsing is complete, we print the finished result.

You will probably want to replace:

local $/;
$parser -> parse ( <DATA> );

with:

$parser -> parsefile ( 'your_file_name' );

You may also find:

$parser -> print_to_file( 'output_filename' );

useful.
