Creating a nested array from unordered lists in files



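I'm trying to convert an old HTML site to a new CMS. To get the correct menu hierarchy (the depth varies), I want to read all the files and extract/parse the menus (nested unordered lists) into an associative array.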

root.html
<ul id="menu">
  <li class="active">Start</li>
  <ul>
    <li><a href="file1.html">Sub1</a></li>
    <li><a href="file2.html">Sub2</a></li>
  </ul>
</ul>
file1.html
<ul id="menu">
  <li><a href="root.html">Start</a></li>
  <ul>
    <li class="active">Sub1</li>
    <ul>
      <li><a href="file3.html">SubSub1</a></li>
      <li><a href="file4.html">SubSub2</a></li>
      <li><a href="file5.html">SubSub3</a></li>
      <li><a href="file6.html">SubSub4</a></li>
    </ul>
  </ul>
</ul>
file3.html
<ul id="menu">
  <li><a href="root.html">Start</a></li>
  <ul>
    <li><a href="file1.html">Sub1</a></li>
    <ul>
      <li class="active">SubSub1</li>
      <ul>
        <li><a href="file7.html">SubSubSub1</a></li>
        <li><a href="file8.html">SubSubSub2</a></li>
        <li><a href="file9.html">SubSubSub3</a></li>
      </ul>
    </ul>
  </ul>
</ul>
file4.html
<ul id="menu">
  <li><a href="root.html">Start</a></li>
  <ul>
    <li><a href="file1.html">Sub1</a></li>
    <ul>
      <li><a href="file3.html">SubSub1</a></li>
      <li class="active">SubSub2</li>
      <li><a href="file5.html">SubSub3</a></li>
      <li><a href="file6.html">SubSub4</a></li>
    </ul>
  </ul>
</ul>

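I want to loop through all the files, extract the element with id="menu", and create an array like this (or something similar) while preserving the hierarchy and the file information: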
Array 
  [file] => root.html
  [child] => Array 
    [Sub1] => Array 
      [file] => file1.html
      [child] => Array  
        [SubSub1] => Array 
          [file] => file3.html
          [child] => Array 
            [SubSubSub1] => Array 
              [file] => file7.html
            [SubSubSub2] => Array 
              [file] => file8.html                      
            [SubSubSub3] => Array
              [file] => file9.html
        [SubSub2] => Array
          [file] => file4.html
        [SubSub3] => Array 
          [file] => file5.html
        [SubSub4] => Array 
          [file] => file6.html
    [Sub2] => Array
      [file] => file2.html 

With the help of the PHP Simple HTML DOM Parser library, I managed to read a file and extract the menu:

$html = file_get_html($file);
foreach ($html->find("ul[id=menu]") as $ul) {
  ..
}

To parse only the active part of the menu (leaving out the links that point one or more levels up), I used

$ul->find("ul",-1)

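to find the last ul inside the outer ul. This works well for a single file.

However, I'm having trouble looping through all the files/menus and keeping the parent/child information, because every menu has a different depth.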

Thanks for any suggestions, tips and help!

EDIT: OK, this was not that easy after all :)

By the way, this library is a really nice tool. Kudos to whoever wrote it.

Here is a possible solution:

class menu_parse {
    static $missing = array(); // list of missing files
    static private $files = array(); // list of source files to process
    // initiate menu parsing
    static function start ($file)
    {
        // start with root file
        self::$files[$file] = 1;
        // parse all source files
        for ($res=array(); current(self::$files); next(self::$files))
        {
            // get next file name
            $file = key(self::$files);
            // parse the file
            if (!file_exists ($file))
            {
                self::$missing[$file] = 1;
                continue;
            }
            $html = file_get_html ($file);
            // skip files the parser could not load
            if (!$html) continue;
            // get menu root (if any)
            $root = $html->find("ul[id=menu]",0);
            if ($root) self::menu ($root, $res);
        }
        // reorder missing files array
        self::$missing = array_keys (self::$missing);
        // that's all folks
        return $res;
    }
    // parse a menu at a given level
    static private function menu ($menu, &$res)
    {
        foreach ($menu->children as $elem)
        {
            switch ($elem->tag)
            {
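            case "li" : // name and possibly source file of a menu
                // grab menu name
                $name = $elem->plaintext;
                // create the menu entry explicitly: PHP 8 throws an Error
                // when a property is assigned on a value that is still null
                if (!isset ($res[$name])) $res[$name] = new stdClass;
                // see if we can find a link to the menu file
                $link = $elem->children(0);
                if ($link && $link->tag == 'a')
                {
                    // found the link
                    $file = $link->href;
                    $res[$name]->file = $file;
                    // add the source file to the processing list
                    self::$files[$file] = 1;
                }
                break;
            case "ul" : // go down one level to grab items of the current menu
                if (!isset ($res[$name]->childs)) $res[$name]->childs = array();
                self::menu ($elem, $res[$name]->childs);
            }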
        }
    }
}
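
The loop in start() works because new entries can be appended to the file list while the internal array pointer walks over it, and because file names are used as array keys, so a file linked from several menus is queued only once. Here is a minimal sketch of that pattern on its own, assuming a hypothetical $links map and crawl() function in place of real file parsing:

```php
<?php
// Hypothetical link map standing in for "links found while parsing a file".
$links = array(
    'root.html'  => array('file1.html', 'file2.html'),
    'file1.html' => array('root.html', 'file3.html'),
    'file3.html' => array('root.html', 'file1.html'),
);

// Visit every file reachable from $start exactly once.
function crawl(array $links, $start)
{
    $queue   = array($start => 1); // keys are file names, so duplicates collapse
    $visited = array();
    $missing = array();
    // current() returns false once the pointer moves past the last entry
    for (; current($queue); next($queue)) {
        $file = key($queue);
        if (!isset($links[$file])) {
            // no entry in the map: treated like a missing file
            $missing[] = $file;
            continue;
        }
        $visited[] = $file;
        foreach ($links[$file] as $target) {
            // new keys are appended and reached by later next() calls;
            // keys that already exist keep their position
            $queue[$target] = 1;
        }
    }
    return array($visited, $missing);
}

list($visited, $missing) = crawl($links, 'root.html');
print_r($visited);
print_r($missing);
```

Storing the names as keys rather than values is what makes the "already queued" check free.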

Usage:

// Simple HTML DOM must be loaded first (adjust the path to your copy of the library)
require_once 'simple_html_dom.php';
// The result will be an array of menus indexed by item names.
//
// Each menu will be an object with 2 members
// - file   -> source file of the menu
// - childs -> array of menu subitems
//
$res = menu_parse::start ("root.html");
// menu_parse::$missing will contain the names of all the missing files
echo "Result : <pre>";
print_r ($res);
echo "</pre><br>missing files:<pre>";
print_r (menu_parse::$missing);
echo "</pre>";

Output for your test case:

Array
(
  [Start] => stdClass Object
    (
      [childs] => Array
        (
          [Sub1] => stdClass Object
            (
              [file] => file1.html
              [childs] => Array
                (
                  [SubSub1] => stdClass Object
                    (
                      [file] => file3.html
                      [childs] => Array
                        (
                          [SubSubSub1] => stdClass Object
                            (
                              [file] => file7.html
                            )
                          [SubSubSub2] => stdClass Object
                            (
                              [file] => file8.html
                            )
                          [SubSubSub3] => stdClass Object
                            (
                              [file] => file9.html
                            )
                        )
                    )
                  [SubSub2] => stdClass Object
                    (
                      [file] => file4.html
                    )
                  [SubSub3] => stdClass Object
                    (
                      [file] => file5.html
                    )
                  [SubSub4] => stdClass Object
                    (
                      [file] => file6.html
                    )
                )
            )
          [Sub2] => stdClass Object
            (
              [file] => file2.html
            )
        )
      [file] => root.html
    )
)
missing files: Array
(
    [0] => file2.html
    [1] => file5.html
    [2] => file6.html
    [3] => file7.html
    [4] => file8.html
    [5] => file9.html
)
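
If you prefer the plain nested arrays from the question over stdClass objects, the result can be converted afterwards. This is a rough sketch, assuming a hypothetical menu_to_array() helper that is not part of the class above. It maps the 'childs' member to the 'child' key used in the question and keeps the top level keyed by the root item's name:

```php
<?php
// Hypothetical helper: turns the name => stdClass result into nested arrays
// that use the 'file' / 'child' keys from the question.
function menu_to_array(array $menus)
{
    $out = array();
    foreach ($menus as $name => $item) {
        $node = array();
        if (isset($item->file)) {
            $node['file'] = $item->file;
        }
        if (!empty($item->childs)) {
            // recurse into the submenu
            $node['child'] = menu_to_array($item->childs);
        }
        $out[$name] = $node;
    }
    return $out;
}

// Small hand-built sample in the same shape as the parser's result.
$sub1 = new stdClass;
$sub1->file = 'file1.html';
$sub2 = new stdClass;
$sub2->file = 'file2.html';
$start = new stdClass;
$start->childs = array('Sub1' => $sub1, 'Sub2' => $sub2);
$start->file = 'root.html';

$tree = menu_to_array(array('Start' => $start));
print_r($tree);
```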

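Notes:

  • The code assumes that all item names are unique within a given menu.

You could modify the code to store (sub)menus as numerically indexed arrays with the name as a property (so that two items with the same name would not overwrite each other), but that would complicate the structure of the result.

Should such a name duplication occur, the best solution would be to rename one of the items, IMHO.

  • The code also assumes there is only one root menu.

It could be modified to handle several, but that would not make much sense (it would mean the root menu ID is duplicated, which would probably cause trouble for any JavaScript trying to process it in the first place).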
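
For the first note, here is a minimal sketch of what the numerically indexed shape could look like. The add_item() helper, the name property and the news*.html file names are hypothetical and not part of the code above:

```php
<?php
// Hypothetical helper: appends an item to a numerically indexed menu and
// returns it, so two items with the same name never overwrite each other.
function add_item(array &$menu, $name, $file = null)
{
    $item = new stdClass;
    $item->name   = $name;
    $item->file   = $file;
    $item->childs = array();
    $menu[] = $item;
    return $item;
}

$menu  = array();
$start = add_item($menu, 'Start', 'root.html');
add_item($start->childs, 'News', 'news1.html');
add_item($start->childs, 'News', 'news2.html'); // same name, kept separately

print_r($menu);
```

The trade-off is that merging the menus of several files then needs a search by name or file instead of a plain array key lookup.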

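This is more like a directory tree with upward links. file1 on level 1 points to file3 on level 2, which in turn points back to file1 on level 1, and that is what causes the "different depths". Consider setting up a dedicated menu object that points both up and down and keeps lists, rather than arrays of strings. A starting point for such a hierarchy in PHP could be a class like this: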

class menuItem {
    protected $leftSibling = null;
    protected $rightSibling = null;
    protected $parents = array();
    protected $childs = array();
    protected $properties = array();
    // set property like menu name or file name
    function setProp($name, $val) {
        $this->properties[$name] = $val;
    }
    // get a property if set, false otherwise
    function getProp($name) {
        if ( isset($this->properties[$name]) )
            return $this->properties[$name];
        return false;
    }
    function getLeftSiblingsAsArray() {
        $sibling = $this->getLeftSibling();
        $siblings = array();
        while ( $sibling != null ) {
            $siblings[] = $sibling;
            $sibling = $sibling->getLeftSibling();
        }
        return $siblings;
    }
    function addChild($item) {
        $this->childs[] = $item;
    }
    function addLeftSibling($item) {
        // walk to the leftmost sibling, starting from this item,
        // and attach the new item there
        $sibling = $this;
        while ( $sibling->hasLeft() )
            $sibling = $sibling->getLeftSibling();
        $sibling->addFinalLeft($item);
    }
    function addFinalLeft($item) {
        $this->leftSibling = $item;
    }
    ....
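
Here is a compact, self-contained sketch of the same idea, reduced to parent/child links (sibling links are left out). The MenuNode class and its methods are illustrative assumptions, not the elided remainder of the class above:

```php
<?php
class MenuNode
{
    public $name;
    public $file;
    public $parent = null;      // upward link
    public $children = array(); // downward links

    function __construct($name, $file = null)
    {
        $this->name = $name;
        $this->file = $file;
    }

    // attach a child, set its upward link and return it
    function addChild(MenuNode $child)
    {
        $child->parent = $this;
        $this->children[] = $child;
        return $child;
    }

    // item names from the root down to this node
    function path()
    {
        $names = array();
        for ($node = $this; $node !== null; $node = $node->parent) {
            array_unshift($names, $node->name);
        }
        return $names;
    }

    // number of ancestors; the root is at depth 0
    function depth()
    {
        return count($this->path()) - 1;
    }
}

$root   = new MenuNode('Start', 'root.html');
$sub1   = $root->addChild(new MenuNode('Sub1', 'file1.html'));
$subsub = $sub1->addChild(new MenuNode('SubSub1', 'file3.html'));

echo implode(' > ', $subsub->path()), "\n"; // Start > Sub1 > SubSub1
```

With the upward link stored on each node, the depth of an item no longer depends on which file it was found in.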
