Applying node-specific sort criteria in a single XML document



I'm new to Python programming and am trying to reorder the tags in an XML document, with a different sort order for each node.

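The problem is that the XML file does not have a uniform structure: some nodes are only one level deep, while others hold data up to four levels deep.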

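So far I have gone through several StackOverflow posts about similar problems, checked other blog posts, and also consulted the XMLTree documentation. However, I think I'm missing something. I'm trying to solve this with ElementTree from the xml.etree module and have not yet looked at the lxml module.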

XML snippet:

<?xml version='1.0' encoding='UTF-8' ?>
<?org schema-ver='8.6' ?>
<org>
  <setup>
    <app-host>example.com</app-host>
    <app-port>80</app-port>
    <alternateNames>
      <hostname>b.example.com</hostname>
      <hostname>a.example.com</hostname>
      <hostname>c.example.com</hostname>
    </alternateNames>
    <departments>
      <departmentNames>
        <departmentName>
          <department>
            <name>physics</name>
          </department>
          <available>false</available>
        </departmentName>
        <departmentName>
          <department>
            <name>math</name>
          </department>
          <available>true</available>
        </departmentName>
        <departmentName>
          <department>
            <name>biology</name>
          </department>
          <available>false</available>
        </departmentName>
        <departmentName>
          <department>
            <name>chemistry</name>
          </department>
          <available>false</available>
        </departmentName>
      </departmentNames>
    </departments>
  </setup>
  <affiliation>
    <sisterCollege>
      <name>C</name>
    </sisterCollege>
    <sisterCollege>
      <name>A</name>
    </sisterCollege>
    <sisterCollege>
      <name>B</name>
    </sisterCollege>
  </affiliation>
  <courseMaterials>
    <courses/>
  </courseMaterials>
</org>

Expected output:

<?xml version='1.0' encoding='UTF-8' ?>
<?org schema-ver='8.6' ?>
<org>
  <setup>
    <app-host>example.com</app-host>
    <app-port>80</app-port>
    <alternateNames>
      <!-- ASC order -->
      <hostname>a.example.com</hostname>
      <hostname>b.example.com</hostname>
      <hostname>c.example.com</hostname>
    </alternateNames>
    <departments>
      <!-- DESC order using departmentNames/departmentName/department/name -->
      <departmentNames>
        <departmentName>
          <department>
            <name>physics</name>
          </department>
          <available>false</available>
        </departmentName>
        <departmentName>
          <department>
            <name>math</name>
          </department>
          <available>true</available>
        </departmentName>
        <departmentName>
          <department>
            <name>chemistry</name>
          </department>
          <available>false</available>
        </departmentName>
        <departmentName>
          <department>
            <name>biology</name>
          </department>
          <available>false</available>
        </departmentName>
      </departmentNames>
    </departments>
  </setup>
  <affiliation>
    <!-- No sort required -->
    <sisterCollege>
      <name>C</name>
    </sisterCollege>
    <sisterCollege>
      <name>A</name>
    </sisterCollege>
    <sisterCollege>
      <name>B</name>
    </sisterCollege>
  </affiliation>
  <courseMaterials>
    <courses/>
  </courseMaterials>
</org>
So far I have tried the following code:
import xml.etree.ElementTree as ET
# Reference: https://stackoverflow.com/questions/25338817/sorting-xml-in-python-etree

# for org/setup/alternateNames node
def sortchildrenby(parent):
    # sorts the children in place; the function returns None
    parent[:] = sorted(parent, key=lambda child: child.text)
    # ET.dump(parent)

def sortchildrenby2(parent2):
    # approach-1: Didn't work
    # parent2[:] = sorted(parent, key=lambda child:
    #   child.find('departmentNames').find('departmentName').find('department').find('name').text)

    # approach-2: sort nested elements (didn't work)
    # for c1 in parent2:
    #   for c2 in c1:
    #     for c3 in c2:
    #       for c4 in c3:
    #         c4[:] = sorted(c4, key=lambda child: child.text)
    # ET.dump(parent2)

    # approach-3: didn't work either
    # data = []
    # for elem in parent2.findall('department'):
    #   key = elem.findtext('name')
    #   # key = key.strip()
    #   data.append((key, elem))
    # data.sort()
    # parent2[:] = [item[1] for item in data]
    ET.dump(parent2)

tree = ET.parse('temp.xml')
root = tree.getroot()
setup = root.find('setup')
alternateNames = setup.find('alternateNames')
# seems working
sortchildrenby(alternateNames)
departments = setup.find('departments')
sortchildrenby2(departments)
# ET.dump() writes to stdout and returns None, so it is called on its own
print('1-asc:')
ET.dump(alternateNames)  # seems working for a single node
print('2-desc:')
ET.dump(departments)  # prints same as input
print(ET.tostring(root, encoding='unicode'))

Please advise.
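It may help to first see why sortchildrenby2 has no effect. Slice assignment such as `parent[:] = sorted(parent, ...)` only reorders the direct children of the element it is applied to, and `departments` has exactly one direct child, `departmentNames`. The sketch below checks this on a trimmed inline excerpt of the question's XML (an assumption, standing in for temp.xml).

```python
import xml.etree.ElementTree as ET

# Trimmed inline excerpt of the <departments> part of the question's snippet
# (stands in for temp.xml).
XML = """<departments>
  <departmentNames>
    <departmentName><department><name>physics</name></department><available>false</available></departmentName>
    <departmentName><department><name>math</name></department><available>true</available></departmentName>
  </departmentNames>
</departments>"""

departments = ET.fromstring(XML)

# departments has a single direct child, so sorting its children cannot change the order.
print([child.tag for child in departments])  # ['departmentNames']

# findall() with a bare tag name only matches direct children, so this finds nothing.
print(departments.findall('department'))  # []

# The elements that actually need reordering are the children of departmentNames.
dep_names = departments.find('departmentNames')
print([c.findtext('department/name') for c in dep_names])  # ['physics', 'math']
```

This is also why approach-3 finds no `department` elements; assigning that empty list back with `parent2[:] = ...` would even remove `departmentNames` from `departments`.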

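Here is how you can get the desired sort order. The hostname children of alternateNames are sorted by their text content, and the departmentName children of departmentNames are sorted in descending order by the text of departmentName/department/name.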

import xml.etree.ElementTree as ET

tree = ET.parse('temp.xml')

# Sort the children of alternateNames
alt_names = tree.find(".//alternateNames")
alt_names[:] = sorted(alt_names, key=lambda child: child.text)

# Sort the children of departmentNames (descending)
dep_names = tree.find(".//departmentNames")
dep_names[:] = sorted(dep_names,
                      key=lambda child: child.findtext("department/name"),
                      reverse=True)
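Since the title asks for different criteria per node, one way to extend this to more nodes is to keep the criteria in a small table: a path to the parent element, a key function for its children, and a reverse flag. This is a sketch under stated assumptions, not part of the answer above: `SORT_RULES` and `apply_sort_rules` are hypothetical names, and the XML is an inline excerpt of the question's snippet instead of temp.xml. Nodes without a rule (here `affiliation`) keep their original order.

```python
import xml.etree.ElementTree as ET

# Inline excerpt of the question's snippet (stands in for temp.xml).
XML = """<org>
  <setup>
    <alternateNames>
      <hostname>b.example.com</hostname>
      <hostname>a.example.com</hostname>
      <hostname>c.example.com</hostname>
    </alternateNames>
    <departments>
      <departmentNames>
        <departmentName><department><name>physics</name></department><available>false</available></departmentName>
        <departmentName><department><name>math</name></department><available>true</available></departmentName>
        <departmentName><department><name>biology</name></department><available>false</available></departmentName>
        <departmentName><department><name>chemistry</name></department><available>false</available></departmentName>
      </departmentNames>
    </departments>
  </setup>
  <affiliation>
    <sisterCollege><name>C</name></sisterCollege>
    <sisterCollege><name>A</name></sisterCollege>
    <sisterCollege><name>B</name></sisterCollege>
  </affiliation>
</org>"""

# Hypothetical rule table: parent path -> (key for each child, descending?).
# Paths are relative to the root element and use ElementTree's limited XPath subset.
SORT_RULES = {
    "./setup/alternateNames": (lambda child: child.text or "", False),
    "./setup/departments/departmentNames": (
        lambda child: child.findtext("department/name", default=""), True),
}

def apply_sort_rules(root, rules):
    """Reorder, in place, the direct children of every element matched by each rule."""
    for path, (key, reverse) in rules.items():
        for parent in root.findall(path):
            parent[:] = sorted(parent, key=key, reverse=reverse)

root = ET.fromstring(XML)
apply_sort_rules(root, SORT_RULES)

print([h.text for h in root.iterfind("./setup/alternateNames/hostname")])
# ['a.example.com', 'b.example.com', 'c.example.com']
print([d.findtext("department/name") for d in root.find(".//departmentNames")])
# ['physics', 'math', 'chemistry', 'biology']
print([c.findtext("name") for c in root.find("affiliation")])
# ['C', 'A', 'B']
```

Each child carries its trailing whitespace (`tail`) with it when reordered, so the printed indentation can look uneven; on Python 3.9+ `ET.indent(tree)` re-indents before `tree.write(..., encoding="UTF-8", xml_declaration=True)`. Note also that ElementTree's default parser drops comments and processing instructions, so the `<?org schema-ver='8.6' ?>` line will not survive a parse/write round trip without extra handling.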