Extracting XML child node values based on the parent node



I have a fairly large XML file that looks like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE tv SYSTEM "xmltv.dtd">
<tv source-info-url="http://blah blah blah.com/" source-info-name="blah.com" generator-info-name="zap2xml" generator-info-url="zap2xml@gmail.com">
<channel id="IX.XXXXXX.blah.com">
<display-name>WCBS</display-name>
<display-name>2 WCBS</display-name>
<display-name>2</display-name>
<icon src="https://blah blah blah.png" />
</channel>
<channel id="IX.XXXXX.blah.com">
<display-name>WCBSDT</display-name>
<display-name>2 WCBSDT</display-name>
<display-name>2</display-name>
<icon src="https://blah blah blah.png" />
</channel>
<channel id="IX.XXXXX.blah.com">
<display-name>WNBC</display-name>
<display-name>4 WNBC</display-name>
<display-name>4</display-name>
<icon src="https://blah blah blah.png" />
</channel>
.....
</tv>

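Now I just want to loop through the XML, extract each channel's id attribute value, then extract the first and third display-name element values, and display them for each channel based on that id.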

This is what I have:

#!/bin/bash
file='/path/to/xml/file/file.xml'
cat $file | while read line ; do
if [[ $line == *"<channel id="* ]]; then
channelid=$(echo $line|awk -F'"' '{print $2}')
channelnum=$(xmlstarlet sel -t -v "//channel[@id='$channelid']//display-name[3]" -n $file)
callsign=$(xmlstarlet sel -t -v "//channel[@id='$channelid']//display-name[1]" -n $file)
clear
echo "Here are the details for channel id: $channelid"
echo ""
echo "Channel Number is: $channelnum"
echo "Channel Call Sign is: $callsign"
echo ""
sleep 2
fi
done

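It does what I want, but the whole process is slow because, for every channel it encounters, it keeps trying to look up the external entity (the DTD), and it outputs this: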

Here are the details for channel id: IX.XXXXX.blah.com
Channel Number is: 2
Channel Call Sign is: WCBS
/path/to/the/epg.xml:2.30: failed to load external entity "/path/to/the/script/xmltv.dtd"
<!DOCTYPE tv SYSTEM "xmltv.dtd">
^
/path/to/the/xmlfile.xml:2.30: failed to load external entity "/path/to/the/script/xmltv.dtd"
<!DOCTYPE tv SYSTEM "xmltv.dtd">

How do I suppress these lookups? I only want to parse the values.

To suppress these lookups, you can first use the xmlstarlet fo (format) command with the -D option to remove the DOCTYPE declaration.

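You should also be able to get all the values you need with a single xmlstarlet sel (select) command and XPath. Running cat on the file, reading it line by line, and matching lines with patterns can be fragile, and it starts xmlstarlet twice for every channel.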

Full example...

#!/bin/bash
file='/path/to/the/file.xml'
# Drop the DOCTYPE first so no DTD lookup is attempted, then select once per channel.
xmlstarlet fo -D "$file" |
xmlstarlet sel -T -t -m "/tv/channel" \
  -v "concat('Here are the details for channel id: ', @id)" -n -n \
  -v "concat('  Channel Number is: ', display-name[3])" -n \
  -v "concat('  Channel Call Sign is: ', display-name[1])" -n -n

Output...

Here are the details for channel id: IX.XXXXXX.blah.com

  Channel Number is: 2
  Channel Call Sign is: WCBS

Here are the details for channel id: IX.XXXXX.blah.com

  Channel Number is: 2
  Channel Call Sign is: WCBSDT

Here are the details for channel id: IX.XXXXX.blah.com

  Channel Number is: 4
  Channel Call Sign is: WNBC
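
If you still want per-channel shell logic (like the clear and sleep in the question), one option is to have xmlstarlet emit one delimited record per channel and read those records in a loop, so the file is parsed only once. The following is a rough sketch, not a definitive implementation: the function names channel_records and show_channels are made up for illustration, and it assumes that neither the ids nor the display names contain a "|" character.

```shell
#!/bin/bash

# Hypothetical helper: print one "id|callsign|number" record per channel.
# Assumes xmlstarlet is installed; "$1" is the path to the XML file.
channel_records() {
  xmlstarlet fo -D "$1" |
    xmlstarlet sel -T -t -m "/tv/channel" \
      -v "@id" -o "|" -v "display-name[1]" -o "|" -v "display-name[3]" -n
}

# Hypothetical helper: read "id|callsign|number" records from stdin
# and print the details for each channel.
show_channels() {
  local id callsign number
  while IFS='|' read -r id callsign number; do
    printf 'Here are the details for channel id: %s\n\n' "$id"
    printf '  Channel Number is: %s\n' "$number"
    printf '  Channel Call Sign is: %s\n\n' "$callsign"
    # Any extra per-channel shell logic (e.g. sleep 2) could go here.
  done
}
```

It would be used as channel_records "$file" | show_channels. A non-whitespace delimiter is chosen on purpose: with a tab or space in IFS, read collapses consecutive delimiters, so an empty field would shift the remaining values.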
