Traversing a two-level structure with nested iterators

I have the following two-level XML structure: a list of boxes, each containing a list of drawers.

<Boxes>
    <Box id="0">
        <Drawers>
            <Drawer id="0"/>
            <Drawer id="1"/>
            ...
        </Drawers>
    </Box>
    <Box id="1">
...
    </Box>
</Boxes>

I am parsing it with StAX and exposing the structure through two Iterators:

BoxIterator implements Iterator<Box>, Iterable<Box>
Box implements Iterable<Drawer>
DrawerIterator implements Iterator<Drawer>

I can then do the following:

BoxIterator boxList;
for (Box box : boxList) {
  for (Drawer drawer : box) {
    drawer.getId();
  }
}

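Under the hood, those Iterators use StAX, and both of them access the same underlying XMLStreamReader. If I call BoxIterator.next(), it influences the result returned by subsequent calls to DrawerIterator.next(), because the cursor will have moved on to the next box.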

Does this break the contract of Iterator? Is there a better way to iterate over a two-level structure using StAX?

Does this break the contract of Iterator?

No.

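The Java Iterator imposes two "contracts". The first contract is the Java interface itself, which declares 3 methods: hasNext(), next() and remove(). Any class that implements the Iterator interface must provide those methods (since Java 8, remove() has a default implementation that throws UnsupportedOperationException, so only hasNext() and next() have to be written).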

The second contract defines the behaviour of the Iterator:

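hasNext() [...] returns true if the iteration has more elements. [...] next() returns the next element in the iteration [and] throws NoSuchElementException if the iteration has no more elements.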

That is the entire contract.

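It is true that if the underlying XMLStreamReader is advanced, it can mess up your BoxIterator and/or DrawerIterator. Likewise, calling BoxIterator.next() and/or DrawerIterator.next() at the wrong point can mess up the iteration. However, used correctly, as in your example code above, it works properly and greatly simplifies the code. You just need to document the correct usage of the iterators.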

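As a concrete example, the Scanner class implements Iterator<String>, yet it has many other methods that advance the underlying stream. If the Iterator interface imposed a stronger contract, the Scanner class itself would violate it.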
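The Scanner point can be checked with a few lines. This is a minimal sketch; the class name ScannerIteratorDemo and the input string are made up for illustration.

```java
import java.util.Iterator;
import java.util.Scanner;

class ScannerIteratorDemo {
    // Returns the tokens in the order they were consumed, joined by commas.
    static String run() {
        Scanner scanner = new Scanner("a 1 b");
        Iterator<String> it = scanner;   // Scanner is an Iterator<String>
        String first = it.next();        // consumes "a" through the Iterator view
        int number = scanner.nextInt();  // consumes "1" through a non-Iterator method
        String second = it.next();       // the iterator now yields "b", not "1"
        return first + "," + number + "," + second;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The iterator never sees the token "1", because nextInt() advanced the same underlying input. That is the same kind of shared-cursor coupling as between BoxIterator and DrawerIterator.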

As Ivan points out in the comments, boxList should not be of type class BoxIterator implements Iterator<Box>, Iterable<Box>. You really should have:

class BoxList implements Iterable<Box> { ... }
class BoxIterator implements Iterator<Box> { ... }
BoxList boxList = ...;
for (Box box : boxList) {
  for (Drawer drawer : box) {
    drawer.getId();
  }
}

While having one class implement both Iterable and Iterator is not technically wrong for your use case, it can cause confusion.

Consider the following code in another context:

List<Box> boxList = Arrays.asList(box1, box2, box3, box4);
for(Box box : boxList) {
    // Do something
}
for(Box box : boxList) {
    // Do some more stuff
}

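Here, boxList.iterator() is called twice, creating two separate Iterator<Box> instances, one for each pass over the list of boxes. Because boxList can be iterated multiple times, each pass needs a new iterator instance.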
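A small sketch makes that independence visible. The class name and the string elements are placeholders, not part of the original code.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IndependentIteratorsDemo {
    static String run() {
        List<String> boxes = Arrays.asList("box1", "box2", "box3");
        Iterator<String> first = boxes.iterator();
        Iterator<String> second = boxes.iterator(); // a separate instance with its own position
        first.next();
        first.next();                               // advancing one iterator...
        return first.next() + "," + second.next();  // ...leaves the other at the start
    }
}
```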

In your code:

BoxIterator boxList = new BoxIterator(xml_stream);
for (Box box : boxList) {
  for (Drawer drawer : box) {
    drawer.getId();
  }
}

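Because you are iterating over a stream, you cannot iterate over the same nodes again (without rewinding the stream or storing the extracted objects). A second class/object is not needed; the same object can act as both the Iterable and the Iterator, which saves you one class/object.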

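That said, premature optimization is the root of all evil. Saving one class/object is not worth the possible confusion; you should split BoxIterator into BoxList implements Iterable<Box> and BoxIterator implements Iterator<Box>.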
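One way the split could look is sketched below. It is not the asker's actual implementation: SeekingIterator, NestedStaxDemo and the single-use guard in BoxList are assumptions added for illustration, and the element and attribute names are taken from the XML in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical base class: moves the shared cursor to the next <name>, giving up at </stop>.
abstract class SeekingIterator<T> implements Iterator<T> {
    private final XMLStreamReader reader;
    private final String name, stop;
    private Boolean cached; // null until the cursor has been moved to the next candidate

    SeekingIterator(XMLStreamReader reader, String name, String stop) {
        this.reader = reader; this.name = name; this.stop = stop;
    }

    @Override public boolean hasNext() {
        if (cached == null) cached = seek(); // repeated calls do not move the cursor again
        return cached;
    }

    @Override public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        cached = null;
        return read(reader); // the cursor is on the start tag, so its attributes are readable
    }

    protected abstract T read(XMLStreamReader reader);

    private boolean seek() {
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT && reader.getLocalName().equals(name)) return true;
                if (event == XMLStreamConstants.END_ELEMENT && reader.getLocalName().equals(stop)) return false;
            }
            return false;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}

class Drawer {
    private final String id;
    Drawer(String id) { this.id = id; }
    String getId() { return id; }
}

class DrawerIterator extends SeekingIterator<Drawer> {
    DrawerIterator(XMLStreamReader reader) { super(reader, "Drawer", "Box"); }
    @Override protected Drawer read(XMLStreamReader r) { return new Drawer(r.getAttributeValue(null, "id")); }
}

class Box implements Iterable<Drawer> {
    private final String id;
    private final XMLStreamReader reader;
    Box(String id, XMLStreamReader reader) { this.id = id; this.reader = reader; }
    String getId() { return id; }
    @Override public Iterator<Drawer> iterator() { return new DrawerIterator(reader); }
}

class BoxIterator extends SeekingIterator<Box> {
    BoxIterator(XMLStreamReader reader) { super(reader, "Box", "Boxes"); }
    @Override protected Box read(XMLStreamReader r) { return new Box(r.getAttributeValue(null, "id"), r); }
}

// Iterable only; it hands out one BoxIterator because the stream cannot be replayed.
class BoxList implements Iterable<Box> {
    private final XMLStreamReader reader;
    private boolean consumed = false;
    BoxList(XMLStreamReader reader) { this.reader = reader; }
    @Override public Iterator<Box> iterator() {
        if (consumed) throw new IllegalStateException("the stream has already been iterated");
        consumed = true;
        return new BoxIterator(reader);
    }
}

class NestedStaxDemo {
    static List<String> run(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            List<String> seen = new ArrayList<>();
            for (Box box : new BoxList(reader))
                for (Drawer drawer : box) seen.add(box.getId() + "/" + drawer.getId());
            return seen;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The usage rule that would need documenting still applies to this sketch: a Box's drawers have to be read before hasNext() or next() is called on the outer iterator, and each Box can be iterated only once, because every iterator moves the same cursor.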

It has the potential to break the contract, because hasNext() could return true while next() still throws a NoSuchElementException.

The contract of hasNext() is:

Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

But between the call to hasNext() and the call to next(), another iterator could have moved the stream position so that there are no more elements.

However, in the way you are using it (the nested loop), you will not run into this breakage.

If you were to pass an iterator to another process, you might run into it.
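The failure mode can be reproduced without any XML. In the sketch below, SharedCursor is a hypothetical stand-in for the shared XMLStreamReader, and CursorView is an iterator that keeps no state of its own.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the shared XMLStreamReader: one cursor over a fixed array.
class SharedCursor {
    private final int[] items;
    private int position = 0;
    SharedCursor(int... items) { this.items = items; }
    boolean hasMore() { return position < items.length; }
    int take() {
        if (!hasMore()) throw new NoSuchElementException();
        return items[position++];
    }
}

// An iterator with no state of its own; every call goes straight to the shared cursor.
class CursorView implements Iterator<Integer> {
    private final SharedCursor cursor;
    CursorView(SharedCursor cursor) { this.cursor = cursor; }
    @Override public boolean hasNext() { return cursor.hasMore(); }
    @Override public Integer next() { return cursor.take(); }
}

class SharedCursorDemo {
    // Returns true when hasNext() said true but the following next() still threw.
    static boolean nextFailsAfterHasNext() {
        SharedCursor cursor = new SharedCursor(42);
        Iterator<Integer> first = new CursorView(cursor);
        Iterator<Integer> second = new CursorView(cursor);
        boolean promised = first.hasNext(); // true: one element is left
        second.next();                      // the other iterator consumes it
        try {
            first.next();                   // the promise made by hasNext() no longer holds
            return false;
        } catch (NoSuchElementException e) {
            return promised;
        }
    }
}
```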

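The only design issue with your snippet is that BoxIterator implements both Iterator and Iterable. Normally, an Iterable object returns a new stateful Iterator every time its iterator() method is called. Because of that, there should be no interference between two iterators, but you will need a state object to implement the exit from the inner loop correctly (you probably have one already, but I have to mention it for clarity).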

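The state object acts as a proxy for the parser and has two methods, popEvent and peekEvent. On peek, the iterators inspect the pending event without consuming it. On pop, they consume it.
BoxIterable#iterator() consumes StartElement(Boxes) and then returns the iterator.
BoxIterator#hasNext() peeks at events and pops them until it receives a StartElement or EndElement. It then returns true only if it received StartElement(Box).
BoxIterator#next() peeks at and pops Attribute events until it receives a StartElement or EndElement, in order to initialize the Box object.
Box#iterator() consumes the StartElement(Drawers) event and then returns a DrawerIterator.
DrawerIterator#hasNext() peeks and pops until it receives a StartElement or EndElement. It then returns true only if it is StartElement(Drawer).
DrawerIterator#next() consumes Attribute events until it receives EndElement(Drawer).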
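The peek/pop proxy can be sketched on top of the StAX event API, where XMLEventReader already offers peek() and nextEvent(). The names ParserState, atStartOf, DrawerIdIterator and PeekPopDemo are hypothetical, and only the drawer level is shown. One difference from the steps above: with XMLEventReader, attributes are normally read from the StartElement itself rather than arriving as separate events, so this sketch takes the id from the start tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical state object: a thin proxy that all iterators share.
class ParserState {
    private final XMLEventReader events;
    ParserState(XMLEventReader events) { this.events = events; }

    // Looks at the next event without consuming it; null once the document has ended.
    XMLEvent peekEvent() {
        try {
            return events.peek();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Consumes and returns the next event.
    XMLEvent popEvent() {
        try {
            return events.nextEvent();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Pops everything that is not a tag, then reports whether the next tag opens <name>.
    boolean atStartOf(String name) {
        XMLEvent next = peekEvent();
        while (next != null && !next.isStartElement() && !next.isEndElement()) {
            popEvent();
            next = peekEvent();
        }
        return next != null && next.isStartElement()
                && next.asStartElement().getName().getLocalPart().equals(name);
    }
}

// Yields drawer ids; assumes the caller has already consumed StartElement(Drawers).
class DrawerIdIterator implements Iterator<String> {
    private final ParserState state;
    DrawerIdIterator(ParserState state) { this.state = state; }

    @Override public boolean hasNext() { return state.atStartOf("Drawer"); }

    @Override public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        StartElement start = state.popEvent().asStartElement();
        // Consume up to and including EndElement(Drawer); assumes <Drawer> has no child elements.
        while (!state.popEvent().isEndElement()) { }
        // Assumes every <Drawer> carries an id attribute.
        return start.getAttributeByName(new QName("id")).getValue();
    }
}

class PeekPopDemo {
    static List<String> run(String xml) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            ParserState state = new ParserState(reader);
            List<String> ids = new ArrayList<>();
            if (state.atStartOf("Drawers")) {
                state.popEvent(); // what Box#iterator() would do before returning the iterator
                new DrawerIdIterator(state).forEachRemaining(ids::add);
            }
            return ids;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because hasNext() only peeks at tags, calling it repeatedly does not skip elements, and the inner loop ends cleanly when the pending event is EndElement(Drawers).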

Your user code will remain almost unmodified:

BoxIterable boxList;
/*
 * boxList must be a BoxIterable, which on a call to iterator() returns
 * a new BoxIterator initialized with the current state of the StAX parser
 */
for (Box box : boxList) {
  /*
   * on the following line a new iterator is created and initialized
   * with the current state of the parser
   */
  for (Drawer drawer : box) {
    drawer.getId();
  }
}

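As long as you carefully implement/override the next() and hasNext() methods of the Iterator interface in BoxIterator and DrawerIterator, it does not look like this breaks the contract. Needless to say, the obvious condition to take care of is that hasNext() should return true if next() would return an element, and false if next() would throw an exception.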

But what I could not understand is why you made BoxIterator implement Iterable<Box>.

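With BoxIterator implements Iterator<Box>, Iterable<Box>, the iterator() method overridden from the Iterable interface will always return an instance of BoxIterator. Unless you have some other goal behind it, there is no reason to encapsulate this functionality in BoxIterator.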
