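I am trying to open an XML file, parse it, walk through its tags, and read the text inside each specific tag. If the text in a tag matches a string, I want to remove part of the string or replace it with something else.
However, it looks like the code only goes through the first "filter" element and does not process the remaining "filter" elements. In the second filter element the "regex" is "DFW.+$", so it should enter the if statement "if action == 'remove': if re.match(r"\b(?<=\w)%s\b(?!\w)" % regex, mfn_pn, re.IGNORECASE):" and strip "DFW" and the last character when "DFW" occupies the first three characters. The mfn_pn variable is a barcode entered by the user, such as ATL-157-1815, DFW-184-8378, ATL-324-3243, or DFW-432-2343. However, this does not happen.
The XML file contains the following data: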
<?xml version="1.0" encoding="utf-8"?>
<metadata>
    <filter>
        <regex>ATL|LAX|DFW</regex>
        <start_char>3</start_char>
        <end_char></end_char>
        <action>remove</action>
    </filter>
    <filter>
        <regex>DFW.+.$</regex>
        <start_char>3</start_char>
        <end_char>-1</end_char>
        <action>remove</action>
    </filter>
    <filter>
        <regex>-</regex>
        <replacement></replacement>
        <action>substitute</action>
    </filter>
    <filter>
        <regex>s</regex>
        <replacement></replacement>
        <action>substitute</action>
    </filter>
    <filter>
        <regex> T&amp;R$</regex>
        <start_char></start_char>
        <end_char>-4</end_char>
        <action>remove</action>
    </filter>
</metadata>
The Python code I am using is:
import re
from xml.etree.ElementTree import ElementTree

# The snippet ends with a return, so it runs inside a function;
# the name filter_part_number is assumed here, mfn_pn is the scanned barcode.
def filter_part_number(mfn_pn):
    # filters.xml is the file that holds the things to be filtered
    tree = ElementTree()
    tree.parse("filters.xml")
    # Get the data in the XML file
    root = tree.getroot()
    # Loop through filters
    for x in root.findall('filter'):
        # Find the text inside the regex tag
        regex = x.find('regex').text
        # Find the text inside the start_char tag
        start_prim = x.find('start_char')
        # If the element exists assign its text start variable
        start = start_prim.text if start_prim is not None else ''
        start_int = int(start) if start is not None else ''
        # Find the text inside the end_char tag
        end_prim = x.find('end_char')
        # If the element exists assign its text end variable
        end = end_prim.text if end_prim is not None else ''
        end_int = int(end) if end is not None else ''
        # Find the text inside the action tag
        action = x.find('action').text
        if action == 'remove':
            if re.match(r"\b(?<=\w)%s\b(?!\w)" % regex, mfn_pn, re.IGNORECASE):
                if end_int == '':
                    mfn_pn = mfn_pn[start_int:]
                    print(mfn_pn)
                else:
                    mfn_pn = mfn_pn[start_int:end_int]
                    print(mfn_pn)
        elif action == 'substitute':
            mfn_pn = re.sub(r"\b(?<=\w)%s\b(?!\w)" % regex, '', mfn_pn)
    return mfn_pn
Try this.
from simplified_scrapy import SimplifiedDoc, utils
import re

def test(f='filters.xml', pn='ATL-157-1815'):
    doc = SimplifiedDoc()
    doc.loadFile(f)
    # Get the data in the XML file
    root = doc.select('metadata')
    # Loop through filters
    for x in root.selects('filter'):
        # Find the text inside the regex tag
        regex = x.regex.text
        # Find the text inside the start_char tag
        start = x.start_char.text
        # Find the text inside the end_char tag
        end = x.end_char.text
        # Find the text inside the action tag
        action = x.action.text
        if action == 'remove':
            if re.match(r"\b(?<=\w)%s\b(?!\w)" % regex, pn, re.IGNORECASE):
                if end == '':
                    pn = pn[int(start):]
                    print(pn)
                else:
                    pn = pn[int(start):int(end)]
                    print(pn)
        elif action == 'substitute':
            pn = re.sub(r"\b(?<=\w)%s\b(?!\w)" % regex, '', pn)
    return pn

pn = test()
print(pn)