How to loop through an entire XML file with Python

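I am trying to open an XML file and parse it, going through its tags and finding the text inside each specific tag. If the text inside a tag matches a string, I want to remove part of the string or replace it with something else.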

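However, it looks like the code only goes through the first "filter" element and never processes the remaining "filter" elements. In the second filter element the "regex" is "DFW.+$", so the code should enter the if statement "if action == 'remove': if re.match(r"\b(?<=\w)%s\b(?!\w)" % regex, mfn_pn, re.IGNORECASE):" and remove "DFW" and the final character when "DFW" makes up the first three characters. Here mfn_pn is a barcode entered by the user, something like ATL-157-1815, DFW-184-8378, ATL-324-3243, DFW-432-2343. However, this does not happen.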
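To make the expected behavior concrete, here is a minimal sketch of how the quoted wrapper pattern interacts with re.match, using a sample barcode from the question. The names WRAPPER, wrapped_hit, bare_hit and trimmed are hypothetical, and using the regex without the wrapper is an assumption for comparison, not something the question does. It may explain part of the problem: the lookbehind (?<=\w) requires a word character before the match, while re.match only tries position 0.

```python
import re

WRAPPER = r"\b(?<=\w)%s\b(?!\w)"  # wrapper pattern quoted in the question
barcode = "DFW-184-8378"           # sample barcode from the question

# Wrapped pattern: (?<=\w) needs a word character before the match start,
# but re.match only tries position 0, where nothing precedes the match.
wrapped_hit = re.match(WRAPPER % "DFW.+$", barcode, re.IGNORECASE)

# Bare pattern (assumption: applied without the wrapper) matches at position 0.
bare_hit = re.match("DFW.+$", barcode, re.IGNORECASE)

# Slice equivalent to start_char=3 and end_char=-1.
trimmed = barcode[3:-1]

print(wrapped_hit)     # None
print(bool(bare_hit))  # True
print(trimmed)         # -184-837
```

With an alternation such as ATL|LAX|DFW, the leading and trailing parts of the wrapper attach only to the first and last alternatives, so the combined pattern behaves differently again. Grouping the inserted regex, for example with (?:%s), is one way to avoid that.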

The XML file contains the following data:

<?xml version="1.0" encoding="utf-8"?>
<metadata>
    <filter>
        <regex>ATL|LAX|DFW</regex>
        <start_char>3</start_char>
        <end_char></end_char>
        <action>remove</action>
    </filter>
    <filter>
        <regex>DFW.+.$</regex>
        <start_char>3</start_char>
        <end_char>-1</end_char>
        <action>remove</action>
    </filter>
    <filter>
        <regex>-</regex>
        <replacement></replacement>
        <action>substitute</action>
    </filter>
    <filter>
        <regex>s</regex>
        <replacement></replacement>
        <action>substitute</action>
    </filter>
    <filter>
        <regex> T&amp;R$</regex>
        <start_char></start_char>
        <end_char>-4</end_char>
        <action>remove</action>
    </filter>
</metadata>
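Some of the filters above omit start_char entirely, and others leave start_char or end_char empty. With ElementTree a missing element is returned as None by find(), an empty element has text None, and int('') raises ValueError, so both cases need handling. A minimal sketch of normalizing these optional fields follows; the names SAMPLE_XML, to_int_or_none and load_filters are hypothetical, and the inline XML is a shortened copy of the data above.

```python
import xml.etree.ElementTree as ET

# Shortened sample: one filter with an empty tag, one with the tags missing.
SAMPLE_XML = """<metadata>
  <filter>
    <regex>ATL|LAX|DFW</regex>
    <start_char>3</start_char>
    <end_char></end_char>
    <action>remove</action>
  </filter>
  <filter>
    <regex>-</regex>
    <replacement></replacement>
    <action>substitute</action>
  </filter>
</metadata>"""


def to_int_or_none(text):
    """Convert tag text to int; treat missing or empty text as None."""
    text = (text or "").strip()
    return int(text) if text else None


def load_filters(xml_text):
    """Return one dict per <filter> element, with optional fields normalized."""
    root = ET.fromstring(xml_text)
    filters = []
    for node in root.findall("filter"):
        filters.append({
            "regex": node.findtext("regex", default=""),
            # findtext returns None for a missing tag and "" for an empty one
            "start": to_int_or_none(node.findtext("start_char")),
            "end": to_int_or_none(node.findtext("end_char")),
            "replacement": node.findtext("replacement") or "",
            "action": node.findtext("action", default=""),
        })
    return filters


filters = load_filters(SAMPLE_XML)
for f in filters:
    print(f)
```

Using None for an absent bound is convenient because a slice such as value[3:None] behaves the same as value[3:].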

The Python code I am using is:

import re
from xml.etree.ElementTree import ElementTree

# Wrapper assumed for this excerpt: the original showed only the body, which
# uses mfn_pn and ends with a return. The function name is hypothetical.
def filter_mfn_pn(mfn_pn):
    # filters.xml is the file that holds the things to be filtered
    tree = ElementTree()
    tree.parse("filters.xml")
    # Get the data in the XML file
    root = tree.getroot()
    # Loop through filters
    for x in root.findall('filter'):
        # Find the text inside the regex tag
        regex = x.find('regex').text
        # Find the text inside the start_char tag
        start_prim = x.find('start_char')

        # If the element exists assign its text start variable
        start = start_prim.text if start_prim is not None else ''
        start_int = int(start) if start is not None else ''
        # Find the text inside the end_char tag
        end_prim = x.find('end_char')
        # If the element exists assign its text end variable
        end = end_prim.text if end_prim is not None else ''
        end_int = int(end) if end is not None else ''
        # Find the text inside the action tag
        action = x.find('action').text
        if action == 'remove':
            if re.match(r"\b(?<=\w)%s\b(?!\w)" % regex, mfn_pn, re.IGNORECASE):
                if end_int == '':
                    mfn_pn = mfn_pn[start_int:]
                    print(mfn_pn)
                else:
                    mfn_pn = mfn_pn[start_int:end_int]
                    print(mfn_pn)
        elif action == 'substitute':
            mfn_pn = re.sub(r"\b(?<=\w)%s\b(?!\w)" % regex, '', mfn_pn)
        return mfn_pn

Try this.

from simplified_scrapy import SimplifiedDoc, utils
import re

def test(f='filters.xml', pn='ATL-157-1815'):
    doc = SimplifiedDoc()
    doc.loadFile(f)
    # Get the data in the XML file
    root = doc.select('metadata')
    # Loop through filters
    for x in root.selects('filter'):
        # Find the text inside the regex tag
        regex = x.regex.text
        # Find the text inside the start_char tag
        start = x.start_char.text
        # Find the text inside the end_char tag
        end = x.end_char.text
        # Find the text inside the action tag
        action = x.action.text
        if action == 'remove':
            if re.match(r"\b(?<=\w)%s\b(?!\w)" % regex, pn, re.IGNORECASE):
                if end == '':
                    pn = pn[int(start):]
                    print(pn)
                else:
                    pn = pn[int(start):int(end)]
                    print(pn)
        elif action == 'substitute':
            pn = re.sub(r"\b(?<=\w)%s\b(?!\w)" % regex, '', pn)
    return pn

pn = test()
print(pn)
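For comparison, the same loop can be sketched with only the standard library. This is a hedged illustration rather than the asker's or the answerer's method: the names FILTERS_XML and apply_filters are hypothetical, the inline XML is a shortened copy of the filter data, and each regex is applied bare (without the \b and lookaround wrapper), which is an assumption. The point it shows is the placement of return: if a return statement sits inside the body of the for loop, the function exits after the first filter, so here it comes after the loop.

```python
import re
import xml.etree.ElementTree as ET

# Shortened, hypothetical copy of the filter file for a self-contained demo.
FILTERS_XML = """<metadata>
  <filter>
    <regex>ATL|LAX|DFW</regex>
    <start_char>3</start_char>
    <end_char></end_char>
    <action>remove</action>
  </filter>
  <filter>
    <regex>-</regex>
    <replacement></replacement>
    <action>substitute</action>
  </filter>
</metadata>"""


def apply_filters(part_number, xml_text=FILTERS_XML):
    """Apply every <filter> in order and return the final string."""
    root = ET.fromstring(xml_text)
    for node in root.findall("filter"):
        regex = node.findtext("regex", default="")
        action = node.findtext("action", default="")
        if action == "remove":
            # Missing or empty bounds become None, i.e. an open-ended slice.
            start_text = (node.findtext("start_char") or "").strip()
            end_text = (node.findtext("end_char") or "").strip()
            start = int(start_text) if start_text else None
            end = int(end_text) if end_text else None
            if re.match(regex, part_number, re.IGNORECASE):
                part_number = part_number[start:end]
        elif action == "substitute":
            replacement = node.findtext("replacement") or ""
            part_number = re.sub(regex, replacement, part_number)
    # Returning here, after the loop, lets all filters run.
    return part_number


print(apply_filters("ATL-157-1815"))  # 1571815
print(apply_filters("DFW-184-8378"))  # 1848378
```

To read from a file instead of the inline string, ET.parse("filters.xml").getroot() could replace the ET.fromstring call.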
