Using grep to aggregate data from multiple XML files into a single file



I have multiple XML files. I want to extract values from them and write them, row by row, into a separate CSV/text/Excel file.

I tried the grep command below:

grep -e <r p>  Inputfilename | sed 's/<[^>]*>//g' | awk '{ print $2 }' | awk '{ for (i=1;i<=NF;i++ ) printf $i " " }' >> Output.txt 

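But this command writes all the values onto a single line. I'm new to this, so I'm not sure how to put each file's values on its own line.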

Here is a sample input file:

<measType p="1">Used NonHeap Mem MB</measType>
            <measType p="2">Online CPU Usage %</measType>
            <measType p="3">Used Physical Mem %</measType>
            <measType p="4">Used Physical Mem MB</measType>
            <measType p="5">Used Heap Mem %</measType>
            <measType p="6">Used Tenured Gen MB</measType>
            <measType p="7">Used Survivor Space MB</measType>
            <measType p="8">Used NonHeap Mem %</measType>
            <measType p="9">Total CPU Usage %</measType>
            <measType p="10">Used Eden Space MB</measType>
            <measType p="11">Used Heap Mem MB</measType>
            <measValue measObjLdn="">
                <r p="1">48.361183166503906</r>
                <r p="2">0.008397036232054234</r>
                <r p="3">4.5677</r>
                <r p="4">34425.0</r>
                <r p="5">68.05066879841843</r>
                <r p="6">410.58392333984375</r>
                <r p="7">22.375</r>
                <r p="8">93.67783664213832</r>
                <r p="9">0.028054807427357</r>
                <r p="10">169.9580841064453</r>
                <r p="11">602.8837356567383</r>
            </measValue>

For this input, the output I get from the command above is:

48.361183166503906 0.008397036232054234 4.5677 34425.0 68.05066879841843 410.58392333984375 22.375 93.67783664213832 0.028054807427357 169.9580841064453 602.8837356567383

When I run this command on multiple files, it produces something like this:

48.361183166503906 0.008397036232054234 4.5677 34425.0 68.05066879841843 410.58392333984375 22.375 93.67783664213832 0.028054807427357 169.9580841064453 602.8837356567383  48.377540588378906 0.008116667158901691 5.73992 33834.0 10.798112742450364 42.10478973388672 22.375 93.70952172083081 0.021666161122907 31.18431854248047 95.66410827636719  58.068382263183594 3.406280755996704 6.46515 34405.0 56.60833858273274 903.4959945678711 16.5166015625 94.90236120642875 7.068469741716277 39.66230773925781 959.4206771850586

But I want the result to be:

48.361183166503906 0.008397036232054234 4.5677 34425.0 68.05066879841843 410.58392333984375 22.375 93.67783664213832 0.028054807427357 169.9580841064453 602.8837356567383  
48.377540588378906 0.008116667158901691 5.73992 33834.0 10.798112742450364 42.10478973388672 22.375 93.70952172083081 0.021666161122907 31.18431854248047 95.66410827636719  
58.068382263183594 3.406280755996704 6.46515 34405.0 56.60833858273274 903.4959945678711 16.5166015625 94.90236120642875 7.068469741716277 39.66230773925781 959.4206771850586  

Please help. Thanks in advance!

One option is to use xmlstarlet's tr command with an XSLT stylesheet.

Example:

XSLT 1.0 (example.xsl)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="/*">
    <xsl:for-each select=".//r">
      <xsl:sort select="@p" data-type="number"/>
      <xsl:if test="position() > 1">
        <xsl:text> </xsl:text>
      </xsl:if>
      <xsl:value-of select="normalize-space()"/>
    </xsl:for-each>
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>

xmlstarlet command line (on some distributions the binary is installed as xmlstarlet rather than xml):

xml tr example.xsl *.xml

Output (using two input files: the one you provided, and a copy of it with "b" appended to each r value):

48.361183166503906 0.008397036232054234 4.5677 34425.0 68.05066879841843 410.58392333984375 22.375 93.67783664213832 0.028054807427357 169.9580841064453 602.8837356567383
48.361183166503906b 0.008397036232054234b 4.5677b 34425.0b 68.05066879841843b 410.58392333984375b 22.375b 93.67783664213832b 0.028054807427357b 169.9580841064453b 602.8837356567383b
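
If xmlstarlet isn't available, a rough equivalent using only standard tools is to loop over the files so that each file yields exactly one line, with paste -s joining that file's values. This is a sketch under a few assumptions: each <r> element sits on its own line, as in the sample; the two demo files and their values are made up; and values come out in document order rather than sorted by p as the XSLT does. The grep pattern is quoted here because an unquoted < or > is a shell redirection.

```shell
# Hypothetical demo input: two small files in a temporary directory.
dir=$(mktemp -d)
cat > "$dir/a.xml" <<'EOF'
<measValue measObjLdn="">
    <r p="1">48.361183166503906</r>
    <r p="2">0.008397036232054234</r>
    <r p="3">4.5677</r>
</measValue>
EOF
cat > "$dir/b.xml" <<'EOF'
<measValue measObjLdn="">
    <r p="1">48.377540588378906</r>
    <r p="2">0.008116667158901691</r>
    <r p="3">5.73992</r>
</measValue>
EOF

# One loop iteration per file, so each file produces exactly one output line.
for f in "$dir"/*.xml; do
  # keep only the <r ...> lines, strip the tags, trim surrounding blanks,
  # then join the remaining values into one space-separated line
  grep '<r p=' "$f" \
    | sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | paste -s -d ' ' -
done > "$dir/Output.txt"

cat "$dir/Output.txt"
```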

You can also get something very similar with xmlstarlet's sel command (at the moment I get an extra newline at the start of the output):

xml sel -T -t -n -m "//r" -s A:N:T "@p" -v "normalize-space()" -o " " *.xml

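It's probably writing everything on one line because of the printf in your awk command: unlike print, printf doesn't add a newline by default. Try using print, or add "\n" explicitly.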
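
Keeping a single awk pass, a minimal sketch of that fix is to emit the newline yourself whenever a new file starts (awk resets FNR to 1 for each input file) and once more at the end. It also passes each value through a "%s" format, because using the data itself as the format string (as printf $i " " does) would misbehave on any value containing %. The demo files are hypothetical, and the sketch assumes one <r> element per line.

```shell
# Hypothetical demo input: two small files in a temporary directory.
dir=$(mktemp -d)
cat > "$dir/a.xml" <<'EOF'
<measValue measObjLdn="">
    <r p="1">48.361183166503906</r>
    <r p="2">0.008397036232054234</r>
    <r p="3">4.5677</r>
</measValue>
EOF
cat > "$dir/b.xml" <<'EOF'
<measValue measObjLdn="">
    <r p="1">48.377540588378906</r>
    <r p="2">0.008116667158901691</r>
    <r p="3">5.73992</r>
</measValue>
EOF

awk '
  # FNR restarts at 1 for every input file: close the previous line, reset count
  FNR == 1 { if (NR > 1) printf "\n"; n = 0 }
  /<r p=/ {
    v = $0
    gsub(/<[^>]*>/, "", v)            # strip the tags
    gsub(/^[ \t]+|[ \t]+$/, "", v)    # trim surrounding blanks
    # "%s" keeps a stray % in the data from being read as a format directive
    printf "%s%s", (n++ ? " " : ""), v
  }
  # printf never adds a newline on its own, so finish the last line here
  END { if (NR > 0) printf "\n" }
' "$dir"/*.xml > "$dir/Output.txt"

cat "$dir/Output.txt"
```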

Alternatively, if the measValue element always contains exactly 11 r nodes, consider using:

$ grep -e <r p>  Inputfilename | sed 's/<[^>]*>//g' | awk '{print $2}' | paste - - - - - - - - - - -
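
To see how the paste grouping works: each - operand reads the next line of standard input, so eleven of them consume eleven lines per output row. In this sketch, seq 22 stands in for the values extracted from two 11-value files, and -d ' ' replaces paste's default tab separator with a space.

```shell
# Eleven "-" operands: paste takes 11 consecutive input lines per output row.
grouped=$(seq 22 | paste -d ' ' - - - - - - - - - - -)
printf '%s\n' "$grouped"
```

Because the grouping is purely by count, a file with more or fewer than 11 r values shifts every row after it, so a per-file approach is safer when the count can vary.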
