Reading XML (NIST .n42 file) and extracting data

I have an XML file and need to extract the data inside the "ChannelData" element of the XML below.

from xml.dom import minidom
xmldoc = minidom.parse('Annex_B_n42.xml')
itemlist = xmldoc.getElementsByTagName('ChannelData')
print(len(itemlist))
print(itemlist[0].attributes['compressionCode'].value)
for s in itemlist:
    print(s.attributes['compressionCode'].value)

It doesn't return the data, only the value "None".
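For reference, the "None" printed above is the literal value of the compressionCode attribute (`compressionCode="None"`), not a Python None. The channel counts are the element's text content, which minidom stores as child text nodes. Below is a minimal sketch of reading them. It assumes a trimmed copy of the file (same default namespace, shortened channel list) embedded as the hypothetical string SAMPLE so it runs on its own.

```python
from xml.dom import minidom

# Hypothetical trimmed copy of Annex_B_n42.xml: same namespace, fewer channels
SAMPLE = """<?xml version="1.0"?>
<RadInstrumentData xmlns="http://physics.nist.gov/N42/2011/N42">
<RadMeasurement id="RadMeasurement-1">
<Spectrum id="RadMeasurement-1Spectrum-1">
<ChannelData compressionCode="None">
0 0 0 22 421 847
1295 1982 2127 2222
</ChannelData>
</Spectrum>
</RadMeasurement>
</RadInstrumentData>"""

doc = minidom.parseString(SAMPLE)
channel_data = doc.getElementsByTagName("ChannelData")[0]

# This is what the question's code prints: the attribute value "None"
print(channel_data.attributes["compressionCode"].value)

# The counts are in the element's text node(s); join them, then split
text = "".join(node.data for node in channel_data.childNodes
               if node.nodeType == node.TEXT_NODE)
counts = [int(x) for x in text.split()]
print(counts)
```

With the real file, `minidom.parse('Annex_B_n42.xml')` takes the place of `parseString(SAMPLE)`.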

I also tried a different approach from another example:

import xml.etree.ElementTree as ET
root = ET.parse('Annex_B_n42.xml').getroot()
#value=[]
for type_tag in root.findall('Spectrum'):
    value = type_tag.get('id')
    print(value)
    print("data from file " + str(value))

This doesn't work at all, and value never gets populated. I really don't know how to parse XML.
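The ElementTree attempt finds nothing for two reasons: `findall('Spectrum')` only looks at direct children of the root, and because the file declares a default namespace, ElementTree stores the tag as `{http://physics.nist.gov/N42/2011/N42}Spectrum`. A minimal sketch of a namespace-aware search follows; the prefix `n42` is an arbitrary name chosen here, and SAMPLE is again a hypothetical trimmed copy of the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed copy of Annex_B_n42.xml
SAMPLE = """<?xml version="1.0"?>
<RadInstrumentData xmlns="http://physics.nist.gov/N42/2011/N42">
<RadMeasurement id="RadMeasurement-1">
<Spectrum id="RadMeasurement-1Spectrum-1">
<ChannelData compressionCode="None">
0 0 0 22 421 847
</ChannelData>
</Spectrum>
</RadMeasurement>
</RadInstrumentData>"""

# Map a prefix of our choosing to the default namespace URI of the file
NS = {"n42": "http://physics.nist.gov/N42/2011/N42"}

root = ET.fromstring(SAMPLE)

# Not a direct child, and the real tag carries the namespace URI
print(root.findall("Spectrum"))  # []

# './/' searches all descendants; 'n42:' is expanded using NS
spectrum_ids = [s.get("id") for s in root.findall(".//n42:Spectrum", NS)]
print(spectrum_ids)

channel_text = root.find(".//n42:ChannelData", NS).text
counts = [int(x) for x in channel_text.split()]
print(counts)
```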

Here is the XML file:

<?xml version="1.0"?>
<?xml-model href="http://physics.nist.gov/N42/2011/N42/schematron/n42.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<RadInstrumentData xmlns="http://physics.nist.gov/N42/2011/N42" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://physics.nist.gov/N42/2011/N42 file:///d:/Data%20Files/ANSI%20N42%2042/V2/Schema/n42.xsd" n42DocUUID="d72b7fa7-4a20-43d4-b1b2-7e3b8c6620c1">
<RadInstrumentInformation id="RadInstrumentInformation-1">
<RadInstrumentManufacturerName>RIIDs R Us</RadInstrumentManufacturerName>
<RadInstrumentModelName>iRIID</RadInstrumentModelName>
<RadInstrumentClassCode>Radionuclide Identifier</RadInstrumentClassCode>
<RadInstrumentVersion>
<RadInstrumentComponentName>Software</RadInstrumentComponentName>
<RadInstrumentComponentVersion>1.1</RadInstrumentComponentVersion>
</RadInstrumentVersion>
</RadInstrumentInformation>
<RadDetectorInformation id="RadDetectorInformation-1">
<RadDetectorCategoryCode>Gamma</RadDetectorCategoryCode>
<RadDetectorKindCode>NaI</RadDetectorKindCode>
</RadDetectorInformation>
<EnergyCalibration id="EnergyCalibration-1">
<CoefficientValues>-21.8 12.1 6.55e-03</CoefficientValues>
</EnergyCalibration> 
<RadMeasurement id="RadMeasurement-1">
<MeasurementClassCode>Foreground</MeasurementClassCode>
<StartDateTime>2003-11-22T23:45:19-07:00</StartDateTime>
<RealTimeDuration>PT60S</RealTimeDuration>
<Spectrum id="RadMeasurement-1Spectrum-1" radDetectorInformationReference="RadDetectorInformation-1" energyCalibrationReference="EnergyCalibration-1"> 
<LiveTimeDuration>PT59.61S</LiveTimeDuration>
<ChannelData compressionCode="None">
0 0 0 22 421 847 1295 1982 2127 2222 2302 2276
2234 1921 1939 1715 1586 1469 1296 1178 1127 1047 928 760
679 641 542 529 443 423 397 393 322 272 294 227
216 224 208 191 189 163 167 173 150 137 136 129
150 142 160 159 140 103 90 82 83 85 67 76
73 84 63 74 70 69 76 61 49 61 63 65
58 62 48 75 56 61 46 56 43 37 55 47
50 40 38 54 43 41 45 51 32 35 29 33
40 44 33 35 20 26 27 17 19 20 16 19
18 19 18 20 17 45 55 70 62 59 32 30
21 23 10 9 5 13 11 11 6 7 7 9
11 4 8 8 14 14 11 9 13 5 5 6
10 9 3 4 3 7 5 5 4 5 3 6
5 0 5 6 3 1 4 4 3 10 11 4
1 4 2 11 9 6 3 5 5 1 4 2
6 6 2 3 0 2 2 2 2 0 1 3
1 1 2 3 2 4 5 2 6 4 1 0
3 1 2 1 1 0 1 0 0 2 0 1
0 0 0 1 0 0 0 0 0 0 0 2
0 0 0 1 0 1 0 0 2 1 0 0
0 0 1 3 0 0 0 1 0 1 0 0
0 0 0 0 
</ChannelData> 
</Spectrum>
</RadMeasurement> 
</RadInstrumentData>

You can use BeautifulSoup to get the value of the channeldata tag, like this:

from bs4 import BeautifulSoup
with open('Annex_B_n42.xml') as f:
    xml = f.read()
# "html.parser" lowercases tag names, which is why "channeldata" is lowercase below
bs_obj = BeautifulSoup(xml, "html.parser")
print(bs_obj.find_all("channeldata")[0].text)

This prints:

'         0 0 0 22 421 847 1295 1982 2127 2222 2302 2276         2234 1921 1939 1715 1586 1469 1296 1178 1127 1047 928 760         679 641 542 529 443 423 397 393 322 272 294 227         216 224 208 191 189 163 167 173 150 137 136 129         150 142 160 159 140 103 90 82 83 85 67 76         73 84 63 74 70 69 76 61 49 61 63 65         58 62 48 75 56 61 46 56 43 37 55 47         50 40 38 54 43 41 45 51 32 35 29 33         40 44 33 35 20 26 27 17 19 20 16 19
18 19 18 20 17 45 55 70 62 59 32 30         21 23 10 9 5 13 11 11 6 7 7 9         11 4 8 8 14 14 11 9 13 5 5 6         10 9 3 4 3 7 5 5 4 5 3 6         5 0 5 6 3 1 4 4 3 10 11 4         1 4 2 11 9 6 3 5 5 1 4 2         6 6 2 3 0 2 2 2 2 0 1 3         1 1 2 3 2 4 5 2 6 4 1 0         3 1 2 1 1 0 1 0 0 2 0 1         0 0 0 1 0 0 0 0 0 0 0 2         0 0 0 1 0 1 0 0 2 1 0 0         0 0 1 3 0 0 0 1 0 1 0 0         0 0 0 0       '
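That output is a single whitespace-padded string, not a list of numbers. A minimal sketch of turning it into integer counts, assuming `raw` stands in for `bs_obj.find_all("channeldata")[0].text` (shortened here to the first twelve channels):

```python
# Stand-in for the BeautifulSoup .text value (shortened, hypothetical)
raw = "         0 0 0 22 421 847 1295 1982\n         2127 2222 2302 2276       "

# str.split() with no argument splits on any run of whitespace
# (spaces and newlines) and drops the leading/trailing padding
counts = [int(value) for value in raw.split()]
print(counts)
```

Splitting without an argument is what makes the irregular indentation and line breaks in the text harmless; `raw.split(" ")` would instead produce empty strings that `int()` rejects.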

Try this:

import xml.etree.ElementTree as ET
root = ET.parse('Annex_B_n42.xml').getroot()
# any element ('*') at any depth ('.//') whose compressionCode attribute is 'None'
elems = root.findall(".//*[@compressionCode='None']")
print(elems[0].text)
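This query sidesteps the namespace problem because it matches the element by its un-namespaced attribute rather than by its tag name. It would, however, miss a ChannelData element whose compressionCode is something other than "None". On Python 3.8 or later, the `{*}` wildcard matches a tag name in any namespace, so the element can be selected by name directly. A minimal sketch, using a hypothetical trimmed SAMPLE string in place of the file:

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed copy of Annex_B_n42.xml
SAMPLE = """<RadInstrumentData xmlns="http://physics.nist.gov/N42/2011/N42">
<RadMeasurement id="RadMeasurement-1">
<Spectrum id="RadMeasurement-1Spectrum-1">
<ChannelData compressionCode="None">
0 0 0 22 421 847
</ChannelData>
</Spectrum>
</RadMeasurement>
</RadInstrumentData>"""

root = ET.fromstring(SAMPLE)

# The answer's query: select by attribute value
by_attribute = root.findall(".//*[@compressionCode='None']")

# Python 3.8+: '{*}ChannelData' matches ChannelData in any namespace,
# regardless of its compressionCode value
by_tag = root.findall(".//{*}ChannelData")

print(by_attribute[0] is by_tag[0])  # True
print(by_tag[0].tag)
counts = [int(x) for x in by_tag[0].text.split()]
print(counts)
```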
