How to adjust time series data in an XML file using Python



Is there a simple way to change data in an XML file using Python? In my case, I have a large XML file containing many time series, which looks like this:

<TimeSeries xmlns="http://www.test.nl/test/PI" xmlns:xsi="http://www.etc.org/XMLSchema-instance" xsi:schemaLocation="http://www.test.nl/test/PI http://something.xsd" version="1.2">
    <timeZone>0.0</timeZone>
    <series>
        <header>
            <locationId>locationA</locationId>
            <parameterId>id</parameterId>
        </header>
        <event date="2017-12-08" time="09:35:00" value="0.039" flag="0"/>
        <event date="2017-12-08" time="09:40:00" value="0.039" flag="0"/>
        <event date="2017-12-08" time="09:45:00" value="0.039" flag="0"/>
        <event date="2017-12-08" time="09:50:00" value="0.039" flag="0"/>
        <event date="2017-12-08" time="09:55:00" value="0.039" flag="0"/>
        <event date="2017-12-08" time="10:00:00" value="0.039" flag="0"/>
        <event date="2017-12-08" time="10:05:00" value="0.038" flag="0"/>
        <event date="2017-12-08" time="10:10:00" value="0.038" flag="0"/>
        <event date="2017-12-08" time="10:15:00" value="0.038" flag="0"/>
....
    </series>
    <series>
        <header>
          etc

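and so on for many locations. I want to replace the numbers assigned to "value" with numbers given in a .csv file, which may also have different dates and times. Is this possible? The other information in the XML file should stay the same; it contains a lot more information that varies considerably, so writing an entirely new file is not very attractive.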

I figured out how to read the data from the XML file timeseries_import.xml using the following:

from lxml import etree as ET
tree_org = ET.parse('timeseries_import.xml')
root_org = tree_org.getroot()
values = []
for element in root_org.iter():
    if element.tag == '{http://www.test.nl/test/PI}event':
        for i in element.iter():
            values.append(i.attrib['value'])

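But this does not replace anything within the structure, nor does it tell me which location a value belongs to. Any suggestions would be appreciated!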

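Consider XSLT, the special-purpose language designed to transform XML files. Python's lxml can run XSLT 1.0 scripts. Specifically, you can convert your CSV into XML data, then use XSLT's document() function to match one XML file against the other and return the new value needed for each event.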
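To illustrate the mechanism on its own, here is a minimal sketch of lxml running an XSLT 1.0 stylesheet on an in-memory document. The sample XML is a trimmed stand-in for the posted file, and the `new_value` stylesheet parameter is a hypothetical name used only for this illustration; it is not how the full solution obtains its values.

```python
from lxml import etree as et

# Trimmed stand-in for the posted file (same default namespace).
XML = b"""<TimeSeries xmlns="http://www.test.nl/test/PI">
  <series>
    <event date="2017-12-08" time="09:35:00" value="0.039" flag="0"/>
  </series>
</TimeSeries>"""

# Identity transform plus one override: the @value of every PI:event
# is replaced by the (hypothetical) stylesheet parameter new_value.
XSL = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:PI="http://www.test.nl/test/PI">
  <xsl:param name="new_value"/>
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <xsl:template match="PI:event/@value">
    <xsl:attribute name="value"><xsl:value-of select="$new_value"/></xsl:attribute>
  </xsl:template>
</xsl:stylesheet>"""

transform = et.XSLT(et.fromstring(XSL))
# strparam() quotes the Python string so XSLT treats it as a string literal.
result = transform(et.fromstring(XML), new_value=et.XSLT.strparam("0.518"))

event = result.getroot()[0][0]  # TimeSeries -> series -> event
print(dict(event.attrib))
```

Note that the stylesheet has to bind the document's default namespace to a prefix (PI here); an unprefixed `match="event"` would not match the namespaced elements.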

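The example below uses a sample CSV, from which lxml builds an XML version of the file, key.xml (this name is hard-coded in the XSLT script); the @value attributes are then updated using the XML posted above, saved as input.xml. Be sure both XML files are in the same directory as the stylesheet, since document('key.xml') resolves the relative path against the stylesheet's location.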

CSV (with more data than matches the input)

date,time,value
2017-12-08,09:35:00,0.518
2017-12-08,09:40:00,0.829
2017-12-08,09:45:00,0.311
2017-12-08,09:50:00,0.861
2017-12-08,09:55:00,0.515
2017-12-08,10:00:00,0.560
2017-12-08,10:05:00,0.545
2017-12-08,10:10:00,0.710
2017-12-08,10:15:00,0.847
2017-12-08,10:20:00,0.121
2017-12-08,10:25:00,0.085
2017-12-08,10:30:00,0.163
2017-12-08,10:35:00,0.124
2017-12-09,10:40:00,0.930
2017-12-10,10:45:00,0.804
2017-12-11,10:50:00,0.073
2017-12-12,10:55:00,0.139
2017-12-13,11:00:00,0.010
2017-12-14,11:05:00,0.913

XSLT (save as an .xsl file, which is a special kind of .xml file; key.xml is generated in Python)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                              xmlns:PI="http://www.test.nl/test/PI" >
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- Identity Transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="PI:event">
    <xsl:copy>
        <xsl:variable name="date_value" select="@date"/>
        <xsl:variable name="time_value" select="@time"/>
        <xsl:apply-templates select="@date|@time"/>
        <xsl:attribute name="value">
           <xsl:value-of select="document('key.xml')/data/row[date = $date_value and
                                                              time = $time_value]/value"/>
        </xsl:attribute>
        <xsl:apply-templates select="@flag"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Python

import csv
import lxml.etree as et
# CREATE XML FROM CSV
root = et.Element('data')
with open('data.csv', 'r', newline='') as f:
    r = csv.reader(f)
    next(r)  # skip the header row (date,time,value)
    for row in r:
        dataNode = et.SubElement(root, "row")
        et.SubElement(dataNode, "date").text = row[0]
        et.SubElement(dataNode, "time").text = row[1]
        et.SubElement(dataNode, "value").text = row[2]    
tree_out = et.tostring(root, pretty_print=True, xml_declaration=True, encoding="UTF-8")
with open('key.xml', 'wb') as f:
    f.write(tree_out)
# LOAD XML AND XSL FILES
xml = et.parse('input.xml')
xsl = et.parse('xslt_script.xsl')
# TRANSFORM SOURCE XML    
transform = et.XSLT(xsl)        
result = transform(xml)
# OUTPUT TO SCREEN
print(result)
# OUTPUT TO FILE
with open('output.xml', 'wb') as f:
    f.write(result)

Output (where @value changes according to the corresponding date and time)

<?xml version="1.0"?>
<TimeSeries xmlns="http://www.test.nl/test/PI" xmlns:xsi="http://www.etc.org/XMLSchema-instance" xsi:schemaLocation="http://www.test.nl/test/PI http://something.xsd" version="1.2">
  <timeZone>0.0</timeZone>
  <series>
    <header>
      <locationId>locationA</locationId>
      <parameterId>id</parameterId>
    </header>
    <event date="2017-12-08" time="09:35:00" value="0.518" flag="0"/>
    <event date="2017-12-08" time="09:40:00" value="0.829" flag="0"/>
    <event date="2017-12-08" time="09:45:00" value="0.311" flag="0"/>
    <event date="2017-12-08" time="09:50:00" value="0.861" flag="0"/>
    <event date="2017-12-08" time="09:55:00" value="0.515" flag="0"/>
    <event date="2017-12-08" time="10:00:00" value="0.560" flag="0"/>
    <event date="2017-12-08" time="10:05:00" value="0.545" flag="0"/>
    <event date="2017-12-08" time="10:10:00" value="0.710" flag="0"/>
    <event date="2017-12-08" time="10:15:00" value="0.847" flag="0"/>
  </series>
</TimeSeries>
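
As an alternative to the XSLT route above, the same update can be sketched directly in lxml with a dictionary lookup. This is only a sketch under stated assumptions, not part of the original answer: the function names `load_lookup` and `update_events` are hypothetical, the CSV is assumed to have the date,time,value header shown earlier, and the second sample event (09:30:00) is made-up data to show an unmatched row. One behavioural difference is deliberate: in the stylesheet above an event with no matching CSV row ends up with an empty value attribute, whereas here it keeps its original value. The sketch also reads locationId from each series header, which addresses the question of which location a value belongs to.

```python
import csv
import io
from lxml import etree as et

NS = {"PI": "http://www.test.nl/test/PI"}


def load_lookup(csv_file):
    """Map (date, time) -> value from a CSV with a date,time,value header."""
    return {(r["date"], r["time"]): r["value"] for r in csv.DictReader(csv_file)}


def update_events(root, lookup):
    """Overwrite @value on matching events; return {locationId: events updated}."""
    updated = {}
    for series in root.iterfind("PI:series", NS):
        location = series.findtext("PI:header/PI:locationId", namespaces=NS)
        count = 0
        for event in series.iterfind("PI:event", NS):
            key = (event.get("date"), event.get("time"))
            if key in lookup:  # unmatched events keep their original value
                event.set("value", lookup[key])
                count += 1
        updated[location] = count
    return updated


# In-memory stand-ins for input.xml and data.csv (illustrative data only).
sample_xml = b"""<TimeSeries xmlns="http://www.test.nl/test/PI">
  <series>
    <header><locationId>locationA</locationId><parameterId>id</parameterId></header>
    <event date="2017-12-08" time="09:35:00" value="0.039" flag="0"/>
    <event date="2017-12-08" time="09:30:00" value="0.040" flag="0"/>
  </series>
</TimeSeries>"""
sample_csv = "date,time,value\n2017-12-08,09:35:00,0.518\n"

root = et.fromstring(sample_xml)
counts = update_events(root, load_lookup(io.StringIO(sample_csv)))
print(counts)
print(et.tostring(root, pretty_print=True).decode())
```

With real files, the tree would come from et.parse('input.xml') and could be saved with tree.write('output.xml', xml_declaration=True, encoding='UTF-8'). If the CSV also carried a location column (it does not in the sample above), the lookup key could be extended to (location, date, time).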