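I have an XML file and I need to count the number of characters between occurrences of a pattern (a tag). The pattern repeats throughout the file.
The pattern is:
<controlfield tag="001">
Sample XML file content: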
<datafield tag="650" ind1="0" ind2="4">
<subfield code="a">xxx</subfield>
<subfield code="x">sdf</subfield>
</datafield>
<datafield tag="650" ind1="0" ind2="4">
<subfield code="a">fff</subfield>
</datafield>
<datafield tag="650" ind1="0" ind2="4">
<subfield code="a">asdfaf</subfield>
<subfield code="x">fdfdf</subfield>
<subfield code="x">dfdfdf</subfield>
</datafield>
<controlfield tag="001">000000355</controlfield>
<datafield tag="909" ind1=" " ind2=" ">
<subfield code="a">AGR01</subfield>
<subfield code="b">ph</subfield>
<subfield code="c">AGRP</subfield>
</datafield>
<datafield tag="910" ind1=" " ind2=" ">
<subfield code="a">AGR</subfield>
</datafield>
<controlfield tag="001">000000358</controlfield>
<datafield tag="590" ind1=" " ind2=" ">
<subfield code="a">19. dfsdfs em 2015</subfield>
<subfield code="w">CECLI</subfield>
</datafield>
<datafield tag="650" ind1="0" ind2="4">
<subfield code="a">Topografia</subfield>
</datafield>
<controlfield tag="001">000000365</controlfield>
I read https://unix.stackexchange.com/questions/295332/i-need-the-counts-of-lines-between-two-matching-patterns and tried:
sed -n '/tag="001"/,/tag="001"/p' file.xml | wc -l
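but it prints only a single count.
I need one count per occurrence of the pattern. In the example above, I need 3 counts:
the number of characters before
<controlfield tag="001">000000355</controlfield>
the number of characters between
<controlfield tag="001">000000355</controlfield>
and
<controlfield tag="001">000000358</controlfield>
the number of characters between
<controlfield tag="001">000000358</controlfield>
and
<controlfield tag="001">000000365</controlfield>
Can you help me?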
With GNU awk, using the whole controlfield element as the record separator:
$ awk -v RS='<controlfield tag="001">[0-9]+</controlfield>' '{print length()}' file
394
253
239
1
The final 1 is for the trailing newline after the last controlfield. You may want to strip newlines before computing the length.
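If the newlines should not be counted, one possible variant is sketched below. It assumes an awk that accepts a regular expression as RS (GNU awk does; this is an extension to POSIX). The function name count_between and the file sample.xml are made up for illustration and are not part of the question.

```shell
# count_between FILE
# Prints one character count per chunk of text delimited by
# <controlfield tag="001">...</controlfield>, ignoring newlines.
# Assumes an awk with regex RS support (e.g. GNU awk).
count_between() {
    awk -v RS='<controlfield tag="001">[0-9]+</controlfield>' '
        { gsub(/\n/, "") }                    # drop line breaks inside the record
        length($0) > 0 { print length($0) }   # skip records that are now empty
    ' "$1"
}

# Hypothetical two-record sample, only to show the call
printf '%s\n' \
    '<a>xx</a>' \
    '<controlfield tag="001">000000355</controlfield>' \
    '<b>yyy</b>' \
    '<controlfield tag="001">000000358</controlfield>' > sample.xml

count_between sample.xml
# prints:
# 9
# 10
```

Skipping empty records also removes the stray final count, since the last record holds nothing but the trailing newline.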