How to select the lowest common ancestor of elements containing specific text



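I'm not sure whether "lowest common ancestor" is the correct term. I also assume this is a common problem, but I searched online and couldn't find a solution.

So I have the following structure: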

<div> <!-- A -->
<div> <!-- B -->
<div> <!-- C: I need to select this element -->
<div>
<div>
<div>
random string
<div>
<div>
SOMETHING
</div>
<div>
SOMETHING
</div>
</div>
</div>
</div>
</div>
<div>
<div>
<div>
SOMETHING
</div>
</div>
</div>
<div>
<div>
<div>
SOMETHING
</div>
</div>
</div>
<div>
<div>
<div>
SOMETHING
</div>
</div>
</div>
<div>
<div>
<div>
SOMETHING
</div>
</div>
</div>
</div>
<div>
random string
</div>
</div>
<div>
random string
</div>
</div>

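My goal is to select the first/lowest element (div C in this example) that contains all of the descendant elements containing the string "SOMETHING".

The closest I've come is the XPath //*[contains(text(),"SOMETHING")]/ancestor::*, but that basically returns every element that contains "SOMETHING" (it does return div C, but it also returns other elements, and I only want div C).

The solution doesn't have to use XPath, but vanilla JavaScript is preferred, and it doesn't need to be very efficient. Thanks in advance.

By selecting all text nodes, you can walk up each one's ancestors and keep only the innermost one that is present for all of them.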

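// Collect every text node under document.body.
function nativeTreeWalker() {
  const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
  const textNodes = [];
  let node;
  while ((node = walker.nextNode())) {
    textNodes.push(node);
  }
  return textNodes;
}

// Keep only the text nodes that include the search string.
const nodes = nativeTreeWalker()
  .filter(textNode => textNode.textContent.includes('SOMETHING'));

// Set of element ancestors of a text node, innermost first.
const getAncestors = textNode => {
  const set = new Set();
  let elm = textNode.parentElement;
  while (elm) {
    set.add(elm);
    elm = elm.parentElement;
  }
  return set;
};
const ancestors = nodes.map(getAncestors);

// Sets iterate in insertion order, so the first hit is the innermost common ancestor.
// If nothing matched, the result is undefined.
const innermostExistingInAll = ancestors.length
  ? [...ancestors[0]].find(
      possibleParent => ancestors.every(set => set.has(possibleParent))
    )
  : undefined;
console.log(innermostExistingInAll);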
<div> <!-- A -->
<div> <!-- B -->
<div id="c"> <!-- C: I need to select this element -->
<div>
<div>
<div>
random string
<div>
<div>
SOMETHING
</div>
<div>
SOMETHING
</div>
</div>
</div>
</div>
</div>
<div>
<div>
<div>
SOMETHING
</div>
</div>
</div>
<div>
<div>
<div>
SOMETHING
</div>
</div>
</div>
<div>
<div>
<div>
SOMETHING
</div>
</div>
</div>
<div>
<div>
<div>
SOMETHING
</div>
</div>
</div>
</div>
<div>
random string
</div>
</div>
<div>
random string
</div>
</div>
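
For reuse, the same ancestor-intersection idea can be wrapped in a function. This is only a sketch: the name lowestCommonAncestor is made up for illustration, and it assumes nothing about its inputs beyond a parentNode property, so it should work on DOM text nodes and elements alike.

```javascript
// Hypothetical helper (not a built-in): returns the innermost node that is
// an ancestor of every node in `nodes`, or null if there is none.
function lowestCommonAncestor(nodes) {
  if (nodes.length === 0) return null;

  // Ancestor chain of the first node, innermost first.
  const chain = [];
  for (let n = nodes[0].parentNode; n; n = n.parentNode) {
    chain.push(n);
  }

  // One Set of ancestors for each remaining node.
  const otherAncestors = nodes.slice(1).map(node => {
    const set = new Set();
    for (let n = node.parentNode; n; n = n.parentNode) {
      set.add(n);
    }
    return set;
  });

  // The first entry of the chain shared by all other nodes is the lowest one.
  const found = chain.find(
    candidate => otherAncestors.every(set => set.has(candidate))
  );
  return found || null;
}
```

In the snippet above it would be called as lowestCommonAncestor(nodes) with the filtered text nodes. With a single matching node it returns that node's parent.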

XPath 3.1 can express this declaratively:

let $text-nodes := //text()[contains(., 'SOMETHING')]
return innermost(//*[every $text in $text-nodes satisfies descendant::text() intersect $text])

XPath 3.1 is supported in the browser through Saxonica's SaxonJS library, documented at https://www.saxonica.com/saxon-js/documentation2/index.html.

Example usage

const htmlSnippet = `<div> <!-- A -->
<div> <!-- B -->
<div> <!-- C: I need to select this element -->
<div>
<div>
<div>
random string
<div>
<div>
SOMETHING
</div>
<div>
SOMETHING
</div>
</div>
</div>
</div>
</div>
<div>
<div>
<div>
SOMETHING
</div>
</div>
</div>
<div>
<div>
<div>
SOMETHING
</div>
</div>
</div>
<div>
<div>
<div>
SOMETHING
</div>
</div>
</div>
<div>
<div>
<div>
SOMETHING
</div>
</div>
</div>
</div>
<div>
random string
</div>
</div>
<div>
random string
</div>
</div>`;
var searchText = 'SOMETHING';
const htmlDoc = new DOMParser().parseFromString(htmlSnippet, 'text/html');
const xpathResult = SaxonJS.XPath.evaluate(
`let $text-nodes := //text()[contains(., $search-text)]
return innermost(//*[every $text in $text-nodes satisfies descendant::text() intersect $text])`, 
htmlDoc, 
{ params : { 'search-text' : searchText } }
);
console.log(xpathResult);
<script src="https://martin-honnen.github.io/Saxon-JS-2.5/SaxonJS2.rt.js"></script>

You can traverse down from the outermost element until more than one child element contains the text:

let commonAnc;
let hasTxt = [document.getElementById('mainContainer')]; // or whatever outermost element you want
// Keep descending while exactly one child still contains the text.
while (hasTxt.length === 1) {
  commonAnc = hasTxt[0];
  hasTxt = [];
  const caChilds = commonAnc.children;
  for (let i = 0; i < caChilds.length; i++) {
    if (caChilds[i].textContent.includes("SOMETHING")) {
      hasTxt.push(caChilds[i]);
    }
  }
}
console.log(commonAnc);
<div id="mainContainer"> 
<div id="a"> <!-- A -->
<div id="b"> <!-- B -->
<div id="c"> <!-- C: I need to select this element -->
<div id="d">
<div>
<div>
random string
<div>
<div>
SOMETHING
</div>
<div>
SOMETHING
</div>
</div>
</div>
</div>
</div>
<div>
<div>
<div>
SOMETHING
</div>
</div>
</div>
<div>
<div>
<div>
SOMETHING
</div>
</div>
</div>
<div>
<div>
<div>
SOMETHING
</div>
</div>
</div>
<div>
<div>
<div>
SOMETHING
</div>
</div>
</div>
</div>
<div>
random string
</div>
</div>
<div>
random string
</div>
</div>
</div>

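It feels inefficient, but [I think] it's the simplest approach, given that there doesn't seem to be a built-in method for getting the closest common ancestor...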
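
The same top-down descent could also be written as a function. This is a rough sketch under a few assumptions: descendToCommonAncestor, root, and text are names invented here, and the extra check for text sitting directly inside the current element is an addition that the loop above does not have.

```javascript
// Hypothetical helper: walks down from `root` while exactly one child
// element still contains `text`, and returns the element where it stops.
function descendToCommonAncestor(root, text) {
  if (!root.textContent.includes(text)) return null;

  let current = root;
  for (;;) {
    // Does `current` hold the text directly in one of its own text nodes?
    const hasOwnText = Array.from(current.childNodes).some(
      n => n.nodeType === 3 && n.nodeValue.includes(text)
    );
    // Child elements whose subtree contains the text.
    const matching = Array.from(current.children).filter(
      child => child.textContent.includes(text)
    );
    // Stop when the text is spread over several children, or when it sits
    // directly in `current`.
    if (hasOwnText || matching.length !== 1) return current;
    current = matching[0];
  }
}
```

For the markup above, descendToCommonAncestor(document.getElementById('mainContainer'), 'SOMETHING') would be the equivalent call. Like the loop, it relies on textContent, so a string that only appears when text from neighboring elements is concatenated would also count as a match.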
