I want to use jsoup to extract the text that follows each label. Is there a way to select it?
The sample markup looks like this:
<div class="content">
<div name="panel-summary" id="summary">
<p>
<strong>A: </strong>*thank you* **I want to retrieve this text**<br>
<strong>B: </strong>*Bla..bla* *I don't want this text*<br>
<strong>C: </strong>*what ever text* *I dont want this*
<strong>D: </strong>*anythinh text* *I want this*<br>
<strong>E: </strong>*Bla..bla* *I don't want this text*t<br>
<strong>F: </strong>*anythinh text* *I want this*<br>
</p>
<p>I want this</p>
Once it is done, an ID is generated automatically, for example id=123.
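If we can assume that the <strong> elements you are looking for will always contain A:, D: or F:, then we can select them with strong:matchesOwn(regex), where the regex is A:|D:|F:. For each matching <strong>, nextSibling() returns the node that follows it, which here is the text you want. Once the <strong> elements are handled, we can move on to the second <p> and get its text content with text().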
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String html = "<div class=\"content\">\n" +
"<div name=\"panel-summary\" id=\"summary\">\n" +
" <p>\n" +
" <strong>A: </strong>*thank you* **I want to retrieve this text**<br>\n" +
" <strong>B: </strong>*Bla..bla* *I don't want this text*<br>\n" +
" <strong>C: </strong>*what ever text* *I dont want this* \n" +
" <strong>D: </strong>*anythinh text* *I want this*<br>\n" +
" <strong>E: </strong>*Bla..bla* *I don't want this text*t<br>\n" +
" <strong>F: </strong>*anythinh text* *I want this*<br>\n" +
" </p>\n" +
"\n" +
" <p>I want this</p>";
Document doc = Jsoup.parse(html);
Elements pElements = doc.select("#summary p");
// Keep only the <strong> elements whose own text matches A:, D: or F:
Elements strongElements = pElements.first().select("strong:matchesOwn(A:|D:|F:)");
for (Element strong : strongElements) {
    System.out.println(strong.nextSibling()); // next sibling node, which may be a text node
}
System.out.println("---");
System.out.println(pElements.get(1).text()); // text content of <p>I want this</p>
Output:
*thank you* **I want to retrieve this text**
*anythinh text* *I want this*
*anythinh text* *I want this*
---
I want this
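One caveat about the regex: as far as I know, :matchesOwn applies the pattern as an unanchored search over the element's own text, so A:|D:|F: would also match a label such as "AA: ". The sketch below uses only java.util.regex to illustrate the difference. The class name LabelRegexDemo, the helper methods, and the anchored pattern ^[ADF]: are illustrative assumptions, not part of the original answer.

```java
import java.util.regex.Pattern;

public class LabelRegexDemo {
    // The pattern from the answer: matches if "A:", "D:" or "F:" appears anywhere.
    static final Pattern LOOSE = Pattern.compile("A:|D:|F:");
    // Hypothetical stricter variant: the label must start with A, D or F and a colon.
    static final Pattern STRICT = Pattern.compile("^[ADF]:");

    // find() is an unanchored search, which is how :matchesOwn is assumed to behave.
    static boolean looseMatch(String label) {
        return LOOSE.matcher(label).find();
    }

    static boolean strictMatch(String label) {
        return STRICT.matcher(label).find();
    }

    public static void main(String[] args) {
        for (String label : new String[] {"A: ", "B: ", "AA: "}) {
            System.out.println("\"" + label + "\" loose=" + looseMatch(label)
                    + " strict=" + strictMatch(label));
        }
    }
}
```

If your labels can overlap like this, the stricter pattern could be tried in the selector as well, for example strong:matchesOwn(^[ADF]:).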
If you would rather not rely on the content of the <strong> elements and only on their position, select all of them:
Elements allStrElemens = doc.select("#summary p strong");
and then pick the ones you need by index (remember that indexes start at 0), for example:
System.out.println(allStrElemens.get(0).nextSibling());
System.out.println(allStrElemens.get(3).nextSibling());
System.out.println(allStrElemens.get(5).nextSibling());
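Keep in mind that get(i) throws if the index is out of range, and nextSibling() returns null when the <strong> is the last node in its parent. A defensive helper could look roughly like the sketch below. The class name StrongTextByIndex, the method textAfter, and the shortened sample markup are hypothetical and introduced only for illustration. It assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

public class StrongTextByIndex {
    // Returns the trimmed text that directly follows the index-th <strong>,
    // or an empty string if the index is out of range or no text node follows.
    static String textAfter(Elements strongs, int index) {
        if (index < 0 || index >= strongs.size()) {
            return "";
        }
        Node next = strongs.get(index).nextSibling();
        if (next instanceof TextNode) {
            return ((TextNode) next).text().trim();
        }
        return "";
    }

    public static void main(String[] args) {
        // Shortened, made-up markup in the same shape as the question's sample.
        String html = "<div id=\"summary\"><p>"
                + "<strong>A: </strong>first<br>"
                + "<strong>B: </strong>second<br>"
                + "<strong>C: </strong></p></div>";
        Document doc = Jsoup.parse(html);
        Elements strongs = doc.select("#summary p strong");
        System.out.println(textAfter(strongs, 0)); // text after A:
        System.out.println(textAfter(strongs, 2)); // nothing follows C:
        System.out.println(textAfter(strongs, 9)); // index out of range
    }
}
```

Returning an empty string rather than null keeps the calling code free of null checks. Whether that is the right trade-off depends on how you want to treat missing values.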