Python list: search, compare, and eliminate elements

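I want to get all the elements that have no pair. This is a list of XML tags, read from top to bottom, with the angle brackets removed. I want to find the pairs (for example, the opening tag note and the closing tag /note), remove them from the list, and be left with the tags that have no pair.

How do you go through the list, compare each tag with all the others, and say: aha, I found another "note" tag that starts with a forward slash?

Thanks.

Is there a better way to find the unmatched tags?

PS: I do want to keep the order of the list and, if possible, use equality when comparing a tag with another tag in the list. The "in" operator will not work, because if the tag name is a single letter such as "a", the search returns every element that contains "a" rather than only the elements that match "a" exactly.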

tags = ['note', 'to', 'bbb', 'bbb', 'firstname', '/firstname', 'lastname', '/lastname', 'from', 'hello', 'hello', 'hello', 'hello', 'hello', 'l', '/from', '/to', 'elephant', 'll', 'from', '/from', 'a1', 'img', 'a2', 'from', 'from', '/from', '/from', '/a2', '/img', '/a1', 'heading', '/heading', 'body', '/body', '/note']

You can build a set of all the closing tags and then use that set to filter the tags.

>>> closing = set([t for t in tags if t.startswith("/")])
>>> [t for t in tags if "/" + t not in closing and t not in closing]
['bbb', 'bbb', 'hello', 'hello', 'hello', 'hello', 'hello', 'l', 'elephant', 'll']

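Note, however, that this does not really respect tag "pairs"; it only checks whether a "closing" variant of the same tag exists anywhere in the list. For example, given tags = ["a", "a", "/a"] or tags = ["a", "/a", "a"], it removes both instances of a from the list.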
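If real pairs matter, one possible sketch is to remember the position of every open tag by name and pair each closing tag with the most recent open tag of the same name that is still waiting. The function name unmatched_tags is made up for this illustration, and the sketch assumes that nesting does not matter (so "a", "b", "/a", "/b" counts as two pairs). Tag names are compared by equality through dictionary keys, and the original order is kept.

```python
from collections import defaultdict

def unmatched_tags(tags):
    """Return the tags that have no partner, in their original order."""
    open_positions = defaultdict(list)  # tag name -> indexes of open tags not yet closed
    paired = set()                      # indexes of tags that found a partner
    for i, tag in enumerate(tags):
        if tag.startswith("/"):
            name = tag[1:]
            if open_positions[name]:
                # Pair this closing tag with the most recent open tag of the same name.
                paired.add(open_positions[name].pop())
                paired.add(i)
        else:
            open_positions[tag].append(i)
    return [tag for i, tag in enumerate(tags) if i not in paired]

print(unmatched_tags(["a", "a", "/a"]))  # ['a']
print(unmatched_tags(["a", "/a", "a"]))  # ['a']
```

Unlike the set filter, each closing tag cancels exactly one open tag, so one a survives in both of the examples above.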

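The first part of the program extracts all the tags from the document. Notice that this is the same problem as finding unmatched brackets. It can be solved by treating a list as a stack and checking, while iterating, which tags are in error.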

import re
def clean_attr(attr):
    # Drop any attributes: '<book id="bk101">' becomes '<book>'.
    attr_list = re.split(r'\s+', attr)
    if len(attr_list) == 1:
        return attr
    else:
        return attr_list[0] + '>'
line="""
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
   <book id="bk103">
      <author>Corets, Eva</author>
      <title>Maeve Ascendant</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-11-17</publish_date>
      <description>After the collapse of a nanotechnology 
      society in England, the young survivors lay the 
      foundation for a new society.</description>
   </book>
   <book id="bk104">
      <author>Corets, Eva</author>
      <title>Oberon's Legacy</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-03-10</publish_date>
      <description>In post-apocalypse England, the mysterious 
      agent known only as Oberon helps to create a new life 
      for the inhabitants of London. Sequel to Maeve 
      Ascendant.</description>
   </book>
   <book id="bk105">
      <author>Corets, Eva</author>
      <title>The Sundered Grail</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-09-10</publish_date>
      <description>The two daughters of Maeve, half-sisters, 
      battle one another for control of England. Sequel to 
      Oberon's Legacy.</description>
   </book>
   <book id="bk106">
      <author>Randall, Cynthia</author>
      <title>Lover Birds</title>
      <genre>Romance</genre>
      <price>4.95</price>
      <publish_date>2000-09-02</publish_date>
      <description>When Carla meets Paul at an ornithology 
      conference, tempers fly as feathers get ruffled.</description>
   </book>
   <book id="bk107">
      <author>Thurman, Paula</author>
      <title>Splish Splash</title>
      <genre>Romance</genre>
      <price>4.95</price>
      <publish_date>2000-11-02</publish_date>
      <description>A deep sea diver finds true love twenty 
      thousand leagues beneath the sea.</description>
   </book>
   <book id="bk108">
      <author>Knorr, Stefan</author>
      <title>Creepy Crawlies</title>
      <genre>Horror</genre>
      <price>4.95</price>
      <publish_date>2000-12-06</publish_date>
      <description>An anthology of horror stories about roaches,
      centipedes, scorpions  and other insects.</description>
   </book>
   <book id="bk109">
      <author>Kress, Peter</author>
      <title>Paradox Lost</title>
      <genre>Science Fiction</genre>
      <price>6.95</price>
      <publish_date>2000-11-02</publish_date>
      <description>After an inadvertant trip through a Heisenberg
      Uncertainty Device, James Salway discovers the problems 
      of being quantum.</description>
   </book>
   <book id="bk110">
      <author>O'Brien, Tim</author>
      <title>Microsoft .NET: The Programming Bible</title>
      <genre>Computer</genre>
      <price>36.95</price>
      <publish_date>2000-12-09</publish_date>
      <description>Microsoft's .NET initiative is explored in 
      detail in this deep programmer's reference.</description>
   </book>
      <author>O'Brien, Tim</author>
      <title>MSXML3: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>36.95</price>
      <publish_date>2000-12-01</publish_date>
      <description>The Microsoft MSXML3 parser is covered in 
      detail, with attention to XML DOM interfaces, XSLT processing, 
      SAX and more.</description>
   </book>
   <book id="bk112">
      <author>Galos, Mike</author>
      <title>Visual Studio 7: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>49.95</price>
      <publish_date>2001-04-16</publish_date>
      <description>Microsoft Visual Studio 7 is explored in depth,
      looking at how Visual Basic, Visual C++, C#, and ASP+ are 
      integrated into a comprehensive development 
      environment.
   </book>
</catalog>
"""
attr_open = re.findall(r'<[\w+\s="]+>', line)
attr_closed = re.findall(r'</\w+>', line)
all_attrs = re.findall(r'<[\w+\s="]+>|</\w+>', line)
# Strip attributes so that every open tag is reduced to its bare name.
all_attrs_cleaned = [clean_attr(attr) for attr in all_attrs]
# print(all_attrs_cleaned)
list_as_stack = []  # open tags still waiting for their closing tag
not_closed = []     # closing tags that do not match the open tag on top of the stack
for an_attr in all_attrs_cleaned:
    if not an_attr.startswith('</'):
        list_as_stack.append(an_attr)
    elif list_as_stack and re.search(r'\w+', list_as_stack[-1]).group(0) == re.search(r'\w+', an_attr).group(0):
        # The closing tag matches the most recently opened tag.
        list_as_stack.pop()
    else:
        # Empty stack or a different tag on top: record the stray closing tag.
        not_closed.append(an_attr)
print(list_as_stack)
print(not_closed)

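In the program above, the first list printed (list_as_stack) holds the open tags that were never closed, and the second (not_closed) holds the closing tags that could not be matched with an open tag.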
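The same stack idea can also be applied directly to a flat tag list like the one in the question, without any regular expressions. The following is only a sketch under assumptions: the name check_nesting is made up for this illustration, and it assumes that when a closing tag matches an open tag deeper in the stack, every tag opened after that one should be reported as unclosed. Note that "in" on a list compares whole elements by equality, so a tag named "a" does not match "a1".

```python
def check_nesting(tags):
    """Return (unclosed open tags, closing tags with no open tag)."""
    stack = []     # open tags still waiting for a closing tag
    unclosed = []  # open tags that were never closed
    unopened = []  # closing tags with no open tag to match
    for tag in tags:
        if not tag.startswith("/"):
            stack.append(tag)
            continue
        name = tag[1:]
        if name in stack:  # list membership: whole-element equality
            # Everything opened after the matching tag was never closed.
            while stack[-1] != name:
                unclosed.append(stack.pop())
            stack.pop()
        else:
            unopened.append(tag)
    unclosed.extend(stack)  # whatever is left was never closed either
    return unclosed, unopened

print(check_nesting(["a", "b", "/a"]))  # (['b'], [])
print(check_nesting(["/a"]))            # ([], ['/a'])
```

The unclosed tags come back in the order in which they are detected, not in list order; if the original order matters, the indexes could be stored alongside the tag names and sorted at the end.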
