Creating a multi-level inheritance XML tree with ElementTree in Python



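So, given a txt file as input, we have to create an XML tree (or any other hierarchical structure) that is easy to parse, and then find the last employee of the last CEO.

The txt describes the structure of a company, with columns in the following order: name, salary, employer. Those with "NOBODY" as their employer are the CEOs of the company. Anyone with an employer name works under that employer. The txt looks like this: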

Vineel Phatak, 520, NOBODY
Ajay Joshi, 250, Vineel Phatak
Abhishek Chauhan, 120, Ajay Joshi
Jayesh Godse, 500, NOBODY
Vijaya Mundada, 60, Abhishek Chauhan
Shital Tuteja, 45, Jayesh Godse
Rajan Gawli, 700, Vineel Phatak
Zeba Khan, 300, Jayesh Godse
Chaitali Sood, 100, Zeba Khan
Sheila Rodrigues, 35, Vineel Phatak

Given this, we have to produce something like this:

Company
->Vineel Phatak
-->Ajay Joshi
--->Abhishek Chauhan
---->Vijaya Mundada
-->Rajan Gawli
-->Sheila Rodrigues
->Jayesh Godse
-->Shital Tuteja
-->Zeba Khan
--->Chaitali Sood

In XML format:

<company>
  <Vineel Phatak>
    <Ajay Joshi>
      <Abhishek Chauhan>
        <Vijaya Mundada />
      </Abhishek Chauhan>
    </Ajay Joshi>
    <Rajan Gawli />
    <Sheila Rodrigues />
  </Vineel Phatak>
  <Jayesh Godse>
    <Shital Tuteja />
    <Zeba Khan>
      <Chaitali Sood />
    </Zeba Khan>
  </Jayesh Godse>
</company>

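What I tried: after creating an element named company, since we need to add sub-elements to the root (company), I generated those sub-elements and appended them to a list. I then iterate over the list and compare tags to find where each value belongs.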

# Find last employee of the last introduced CEO
import xml.etree.ElementTree as ET
# Reading Input
inD = open('input.txt', 'r')
data = inD.readlines()
inD.close()
# Creating an element and saving all subelement to list
all_element = []
company = ET.Element('Company')
ceos = []
for i in data:
    t = i.strip().split(',')
    if(t[2].strip() == 'NOBODY'):
        ceos.append(t[0])
    all_element.append(ET.SubElement(company, t[0]))
# company.clear()
# Creating a function to add subelements
def findChilds(name, emp):
    global all_element
    for i in all_element:
        if emp == i.tag:
            name = ET.SubElement(i, name)
# If it is a CEO (hence no employer) then directly add a subelement to company, or else add to the previous subelement
for j in data:
    t = j.strip().split(',')
    if t[2].strip() == 'NOBODY':
        e = ET.SubElement(company, t[0])
    elif t[2].strip() != 'NOBODY':
        findChilds(t[0].strip(), t[2].strip())

ET.dump(company)

The result is:

<Company><Vineel Phatak><Ajay Joshi /><Rajan Gawli /><Sheila Rodrigues /></Vineel Phatak><Ajay Joshi><Abhishek Chauhan /></Ajay Joshi><Abhishek Chauhan><Vijaya Mundada /></Abhishek Chauhan><Jayesh Godse><Shital Tuteja /><Zeba Khan /></Jayesh Godse><Vijaya Mundada /><Shital Tuteja /><Rajan Gawli /><Zeba Khan><Chaitali Sood /></Zeba Khan><Chaitali Sood /><Sheila Rodrigues /><Vineel Phatak /><Jayesh Godse /></Company>

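As you can see, this is not quite right. Removing the elements (line 18, the commented-out company.clear()) does not work either, because then it refuses to add any sub-elements other than the CEOs.

So, in the end, we need to create this hierarchy and then print the name of the last employee of the last CEO, which in this case is:
Last CEO: Jayesh Godse
Last employee of that CEO (direct or indirect, the last one introduced in the input): Chaitali Sood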

Output:
Chaitali Sood

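Also, the number of CEOs and the number of their children and grandchildren are not fixed, and neither are the names.

I am new to ElementTree, so there may be predefined functions I am not aware of; please excuse my ignorance. Your insights and suggestions are much appreciated. Thanks in advance!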

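Before presenting my example, one remark about the xml structure: when creating an xml structure, it is better to use the "class of the object" as the element tag and to store its "properties" (such as name and salary) as xml attributes:
<employee name="Vineel Phatak" salary="520"/>
rather than:
<Vineel Phatak/>
This makes parsing much easier and gives you more flexibility to extend the format.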
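
To illustrate the point, here is a minimal sketch rather than a definitive recipe. The tag and attribute names follow the suggestion above and the values come from the question's data; the variable names are only illustrative. It also shows a practical consequence: a tag containing a space is not a valid XML name, so it cannot be parsed back, whereas an attribute value can hold the full name.

```python
import xml.etree.ElementTree as ET

# Build a tiny tree using the attribute-based layout.
company = ET.Element("company")
ceo = ET.SubElement(company, "employee", {"name": "Jayesh Godse", "salary": "500"})
ET.SubElement(ceo, "employee", {"name": "Zeba Khan", "salary": "300"})

# Serialize and parse back; the round trip works because "employee" is a valid tag.
xml_text = ET.tostring(company, encoding="unicode")
parsed = ET.fromstring(xml_text)

# Look up an employee by name using ElementTree's limited XPath support.
zeba = parsed.find(".//employee[@name='Zeba Khan']")
zeba_salary = int(zeba.get("salary"))

# Collect all names in document order.
all_names = [e.get("name") for e in parsed.iter("employee")]

# A tag with a space in it is rejected by the parser.
try:
    ET.fromstring("<company><Vineel Phatak/></company>")
    space_tag_parses = True
except ET.ParseError:
    space_tag_parses = False

print(zeba_salary, all_names, space_tag_parses)
```

ElementTree will happily build and dump elements whose tags contain spaces, which is why the question's code runs at all; the problem only shows up when something tries to read the output back.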

My example

An example implementation for your question:

import csv
from dataclasses import dataclass
import xml.etree.ElementTree as ET

@dataclass
class Employee:
    linenumber: int
    name: str
    salary: str
    manager_name: str
    subordinates: list

employees = {}  # a dictionary to map names to employees
# load employees
with open('company.csv') as csvfile:
    reader = csv.reader(csvfile)
    for linenumber, row in enumerate(reader):
        (name, salary, manager_name) = [value.strip() for value in row]
        employees[name] = Employee(linenumber, name, salary, manager_name, [])

# link employees to their subordinates
ceos = []
for employee in employees.values():
    if employee.manager_name == 'NOBODY':
        # store the ceos in a list to start building the xml from later
        ceos.append(employee)
    else:
        # look up the manager by its name
        manager = employees[employee.manager_name]
        manager.subordinates.append(employee)
# create xml
companyelement = ET.Element('company')
def add_employees_to_xml_element(xmlelement, employees):
    for employee in employees:
        employee_element = ET.Element("employee", {
            "name": employee.name,
            "salary": employee.salary
        })
        xmlelement.append(employee_element)
        add_employees_to_xml_element(employee_element, employee.subordinates)

add_employees_to_xml_element(companyelement, ceos)
ET.dump(companyelement)
# find the last entered ceo
def linenumber_key(ceo): return ceo.linenumber

last_entered_ceo = max(ceos, key=linenumber_key)
print(f"Last entered CEO: {last_entered_ceo.name}")
# find the last entered (in)direct subordinate of the last entered ceo
def find_last_entered_subordinate(employee, current_last=None):
    for subordinate in employee.subordinates:
        if not current_last:
            current_last = subordinate  # ensuring an initial value
        else:
            current_last = max([current_last, subordinate], key=linenumber_key)
        # recursive: traverse the subordinate's subordinates
        current_last = find_last_entered_subordinate(subordinate, current_last)
    return current_last

last_employee = find_last_entered_subordinate(last_entered_ceo)
print(f"Last added subordinate of last CEO: {last_employee.name}")

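I broke the exercise down into the following parts:

  1. Load the employees from the CSV file into a dictionary, which makes it easier (and faster) to look employees up by name later. I also store each employee's line number, which is needed for the later parts of your question.
  2. Link employees to their subordinates. Since a manager may be listed after their subordinates, this cannot be combined with the first step. Each employee gets a list of subordinates, and the CEOs are stored in a separate "root" list.
  3. Create the xml using ElementTree and a recursive function that walks the list of CEOs created above.
  4. Find the last entered CEO. We already have a list of CEOs, but because it was built from a dictionary (dictionaries only guarantee insertion order from Python 3.7 onward, so I prefer not to rely on it), I do not simply take the last element and instead look for the CEO with the highest line number.
  5. Find the last entered (in)direct subordinate of the last entered CEO. Similar to the above, but this time I use a recursive function to retrieve this employee based on line number.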
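
As a cross-check for steps 4 and 5, the same answer can be reached without building the subordinate lists. The sketch below is a hypothetical alternative, not the approach used in the implementation above: it scans the rows from the end of the file and climbs each employee's manager chain to see which CEO it ends at. The function name `last_employee_of_last_ceo` and the inlined `SAMPLE` data are assumptions made for illustration. It also assumes at least one CEO exists, every manager appears in the file, and there are no cycles.

```python
import csv
import io

# Sample rows from the question, inlined so the sketch is self-contained.
SAMPLE = """\
Vineel Phatak, 520, NOBODY
Ajay Joshi, 250, Vineel Phatak
Abhishek Chauhan, 120, Ajay Joshi
Jayesh Godse, 500, NOBODY
Vijaya Mundada, 60, Abhishek Chauhan
Shital Tuteja, 45, Jayesh Godse
Rajan Gawli, 700, Vineel Phatak
Zeba Khan, 300, Jayesh Godse
Chaitali Sood, 100, Zeba Khan
Sheila Rodrigues, 35, Vineel Phatak
"""

def last_employee_of_last_ceo(lines):
    """Return (last CEO, last listed employee under that CEO, or None)."""
    rows = [[value.strip() for value in row] for row in csv.reader(lines) if row]
    manager_of = {name: manager for name, _salary, manager in rows}

    def root_of(name):
        # Climb the manager chain until reaching someone who reports to NOBODY.
        while manager_of[name] != "NOBODY":
            name = manager_of[name]
        return name

    # The list preserves file order, so the last CEO is the last entry.
    ceo_names = [name for name, _salary, manager in rows if manager == "NOBODY"]
    last_ceo = ceo_names[-1]

    # Scan from the end: the first non-CEO whose chain ends at last_ceo wins.
    for name, _salary, manager in reversed(rows):
        if manager != "NOBODY" and root_of(name) == last_ceo:
            return last_ceo, name
    return last_ceo, None

ceo, employee = last_employee_of_last_ceo(io.StringIO(SAMPLE))
print(ceo, employee)
```

Climbing the chain for every row is wasteful on large inputs, so the recursive version above remains the better fit when the tree has to be built anyway.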

The generated xml:

<company>
  <employee name="Vineel Phatak" salary="520">
    <employee name="Ajay Joshi" salary="250">
      <employee name="Abhishek Chauhan" salary="120">
        <employee name="Vijaya Mundada" salary="60"/>
      </employee>
    </employee>
    <employee name="Rajan Gawli" salary="700"/>
    <employee name="Sheila Rodrigues" salary="35"/>
  </employee>
  <employee name="Jayesh Godse" salary="500">
    <employee name="Shital Tuteja" salary="45"/>
    <employee name="Zeba Khan" salary="300">
      <employee name="Chaitali Sood" salary="100"/>
    </employee>
  </employee>
</company>
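
Note that ET.dump does not add line breaks or indentation by itself. If you want output laid out as a nested tree, one option (assuming Python 3.9 or newer, where ET.indent is available) is to indent the tree in place before serializing. This is a minimal sketch on a two-employee tree; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# A small tree in the same attribute-based layout as above.
company = ET.Element("company")
ceo = ET.SubElement(company, "employee", {"name": "Jayesh Godse", "salary": "500"})
ET.SubElement(ceo, "employee", {"name": "Shital Tuteja", "salary": "45"})

# ET.indent (Python 3.9+) inserts whitespace into the tree in place.
ET.indent(company, space="  ")

pretty = ET.tostring(company, encoding="unicode")
print(pretty)
```

Because ET.indent modifies the text and tail of the elements, call it only once you are done building the tree.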
