I have multiple XML files in this format:
<bad>
<objdesc>
<desc id="butwba10.1.wc.01" dbi="BUTWBA10.1.1.WC">
<physdesc>adfa;sdfkjad</physdesc>
<related objectid="bb435.1.comdes.02"/>
<related objectid="but614r.1.penc.01"/>
<related objectid="but611.1.wc.01"/>
<related objectid="but612.1.wd.01"/>
<related objectid="bb515.1.comb.12"/>
</desc>
<desc id="butwba10.1.wc.02" dbi="BUTWBA10.1.2.WC">
<physdesc>alkdjfa;sfjsdf</physdesc>
<related objectid="but621r.1.penc.01"/>
<related objectid="bb435.1.comdes.03"/>
</desc>
</objdesc>
</bad>
I want output like the following:
butwba10.1.wc.01 dbi="BUTWBA10.1.1.WC" related="bb435.1.comdes.02, but614r.1.penc.01, but611.1.wc.01, but612.1.wd.01, bb515.1.comb.12"
butwba10.1.wc.02 dbi="BUTWBA10.1.2.WC" related="but621r.1.penc.01, bb435.1.comdes.03"
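I have a bash script that uses xmlstarlet to iterate over the XML files in a directory, but it dumps all of the "related" values after the last desc id. It needs to associate each desc id with its own set of "related" values, and it needs to include the dbi value for each id.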
#!/bin/bash
for x in *.xml
do
id=$(xml sel -t -v '//bad/objdesc/desc/@id' "$x")
arr=( $(xml sel -t -v '//bad/objdesc/desc/related/@objectid' "$x") )
cat<<EOF >> new_file
$id related="$(perl -e 'print join ",", @ARGV' "${arr[@]}")"
EOF
done
#!/bin/bash
for x in *.xml; do
count=$(xml sel -t -v 'count(//bad/objdesc/desc/@id)' "$x")
for ((i=1; i<=count; i++)); do
id=$(xml sel -t -v "//bad/objdesc/desc[$i]/@id" "$x")
arr=( $(xml sel -t -v "//bad/objdesc/desc[$i]/related/@objectid" "$x") )
cat<<EOF
$id related="$(perl -e 'print join ", ", @ARGV' "${arr[@]}")"
EOF
done
done
=)
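This looks like a job for XSLT. But, well, the shell can handle it too...
Can you do the rest for dbi? It's better to try to understand what's involved here than to just cut and paste.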
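I agree with sputnick that XSLT is the right tool. Nonetheless, here is a perl answer that uses an XML token parser. The advantage is that it only has to process each file once, instead of calling xmlstarlet repeatedly: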
#!perl
use strict;
use warnings;
use XML::Parser;
my (@related, @desc); # boo, global variables
sub start {
my ($x, $elem, %attrs) = @_;
if ($elem eq "desc") {
@desc = @attrs{'id', 'dbi'};
@related = ();
}
elsif ($elem eq "related") {
push @related, $attrs{objectid};
}
}
sub end {
my ($x, $elem) = @_;
if ($elem eq "desc") {
printf qq{%s dbi="%s" related="%s"\n}, @desc, join(', ', @related);
}
}
my $parser = XML::Parser->new( Handlers => {Start => \&start, End => \&end} );
$parser->parsefile($ARGV[0]);
In action:
$ perl parse.pl file
butwba10.1.wc.01 dbi="BUTWBA10.1.1.WC" related="bb435.1.comdes.02, but614r.1.penc.01, but611.1.wc.01, but612.1.wd.01, bb515.1.comb.12"
butwba10.1.wc.02 dbi="BUTWBA10.1.2.WC" related="but621r.1.penc.01, bb435.1.comdes.03"
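If XML::Parser isn't installed, the same per-desc grouping can be sketched with Python's standard library. This is only a minimal sketch, not part of the answer above: the names `summarize` and `SAMPLE` are made up for illustration, and it assumes the `bad/objdesc/desc` layout shown in the question. Like the perl version, it parses the document once and pairs each `desc` with its own `related` children.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question (SAMPLE is a hypothetical name).
SAMPLE = """<bad>
<objdesc>
<desc id="butwba10.1.wc.01" dbi="BUTWBA10.1.1.WC">
<physdesc>adfa;sdfkjad</physdesc>
<related objectid="bb435.1.comdes.02"/>
<related objectid="but614r.1.penc.01"/>
<related objectid="but611.1.wc.01"/>
<related objectid="but612.1.wd.01"/>
<related objectid="bb515.1.comb.12"/>
</desc>
<desc id="butwba10.1.wc.02" dbi="BUTWBA10.1.2.WC">
<physdesc>alkdjfa;sfjsdf</physdesc>
<related objectid="but621r.1.penc.01"/>
<related objectid="bb435.1.comdes.03"/>
</desc>
</objdesc>
</bad>"""


def summarize(xml_text):
    """Return one output line per <desc>, built from its own <related> children."""
    root = ET.fromstring(xml_text)  # root is the <bad> element
    lines = []
    for desc in root.findall("./objdesc/desc"):
        # Only the <related> elements directly under this <desc> are collected,
        # so values from different desc blocks never get mixed together.
        related = [r.get("objectid", "") for r in desc.findall("related")]
        lines.append('{} dbi="{}" related="{}"'.format(
            desc.get("id", ""), desc.get("dbi", ""), ", ".join(related)))
    return lines


if __name__ == "__main__":
    for line in summarize(SAMPLE):
        print(line)
```

To run it over a directory of files, you would read each file's text and pass it to `summarize` instead of using the embedded sample.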
$ xml sel -t -m bad/objdesc/desc -v "concat(@id,' dbi=',@dbi,' ')" -m related -v @objectid -i "number(count(./preceding-sibling::related))+1<number(count(./../related))" -o ", " --else -n -b file.xml
butwba10.1.wc.01 dbi=BUTWBA10.1.1.WC bb435.1.comdes.02, but614r.1.penc.01, but611.1.wc.01, but612.1.wd.01, bb515.1.comb.12
butwba10.1.wc.02 dbi=BUTWBA10.1.2.WC but621r.1.penc.01, bb435.1.comdes.03