PHP Simple HTML DOM Parser: how to get the contents of the parent div that contains an <h1> tag?

I am scraping many different (news) websites with PHP Simple HTML DOM, and the goal is to get the main content/body text of each page.

The best approach I have found is to locate the main headline/title (the H1) and take the text contained in the same div as that heading tag.

In the examples below, how would I get the contents of the whole (parent?) div?

<div>  <----- need to get contents of this whole div (containing the h1 and likely the main body of text)
  <h1></h1>
  main body of text here
</div>

The div might be higher up the tree.

<div> <----- need to get contents of this whole div
  <div>   
    <h1></h1>
  </div>
  <div>
    main body of text here
  </div>
</div>

Or one more jump up the tree.

<div> <----- need to get contents of this whole div
  <div>
    <div>   
      <h1></h1>
    </div>
    <div>
      main body of text here
    </div>
  </div>
</div>

I could then compare the size of each candidate and determine which one is the main div.

You can use parent() to get the parent element of the h1:

# assuming that the <h1> element is the first <h1> on the page:
$div = $html->find('h1', 0)->parent();

Suppose $e holds the H1 element you selected. You can call $e->parent() to get its parent element.

See "How to traverse the DOM tree?" under the "Traverse the DOM tree" tab of the manual: http://simplehtmldom.sourceforge.net/manual.htm
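
For the nested cases in the question, where the wanted div is two or three levels above the h1, one option is to call parent() repeatedly and collect every div on the way up. The following is only a minimal sketch: it assumes simple_html_dom.php is on the include path, and ancestor_divs() is a hypothetical helper written for this example, not part of the library. The sample markup and the "outer" id are made up for illustration.

```php
<?php
// Assumption: the Simple HTML DOM library file is available under this name.
require_once 'simple_html_dom.php';

/**
 * Collect every <div> ancestor of a node, nearest ancestor first.
 * Hypothetical helper for illustration; not part of Simple HTML DOM.
 */
function ancestor_divs($node)
{
    $divs = [];
    // Walk upwards; parent() returns null once the root of the tree is passed.
    for ($p = $node->parent(); $p !== null; $p = $p->parent()) {
        if ($p->tag === 'div') {
            $divs[] = $p;
        }
    }
    return $divs;
}

// Sample markup shaped like the second example in the question.
$html = str_get_html(
    '<div id="outer"><div><h1>Title</h1></div><div>main body of text here</div></div>'
);

// First <h1> on the page, as in the answer above.
$h1 = $html->find('h1', 0);

if ($h1 !== null) {
    foreach (ancestor_divs($h1) as $level => $div) {
        // plaintext is the text of the element with the tags stripped.
        echo $level, ': ', strlen($div->plaintext), " characters of text\n";
    }
}
```

Each entry in the returned array is an ordinary element object, so its innertext or plaintext can be read directly. How to choose the "main" div from the candidates (for example by text length, as the question suggests) is left open here, since it depends on the sites being scraped.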
