JavaScript: parsing HTML code to extract sentences



I have this HTML (the real one is longer, around 100 lines):

<div id="translate">
<div> <p>Web <b>dictaphone </b> is built using Thanks to Sole for the Oscilloscope code! English texts for 
beginners to practice reading and comprehension online and for free.</p> 
<p>Practicing your comprehension of <b>written English will</b> both improve your vocabulary and 
understanding of <span class="term-highlight">grammar</span> and word order. The texts below are designed to help you develop while 
giving you an instant evaluation of your progress.</p>
<p>All test went wrong</p>
<p>Web application work CH<sub class="pippo">2</sub> M5 only with localhost</p>
</div>
<div class="code">
<span>
beginners to practice reading and comprehension online and for free <b>dictaphone </b>.
</span>
<span class="term-highlight">grammar</span>
<p>All test went wrong</p>
</div>
</div>
I parse the HTML with the code below:

let infoElementMT = document.getElementById('translate');
recurseDomChildren(infoElementMT, 'en');

export function recurseDomChildren(start, langFrom)
{
    var nodes;
    // Only descend when the node actually has children.
    if (start.childNodes.length != 0)
    {
        nodes = start.childNodes;
        loopNodeChildren(nodes, langFrom);
    }
}

function loopNodeChildren(nodes, langFrom)
{
    var node;
    for (var i = 0; i < nodes.length; i++)
    {
        node = nodes[i];

        if (node.childNodes)
        {
            recurseDomChildren(node, langFrom);
        }
        // nodeType 3 is a text node.
        if (node.nodeType === 3) {
            console.log("NODE text", node)
            //outputNode(node, langFrom);
        }
    }
}

The result I get is:
NODE text "Web "
NODE text "dictaphone "
NODE text " is built using Thanks to Sole for the Oscilloscope code! English texts for beginners to practice reading and comprehension online and for free. Practicing your comprehension of "
NODE text "written English will"
NODE text " both improve your vocabulary and understanding of "

How can I get results where each sentence keeps its bold tags inside it, instead of being split apart?

NODE text: "Web <b>dictaphone</b> is built using Thanks to Sole for the Oscilloscope code! English texts for beginners to practice reading and comprehension online and for free.

NODE text: Practicing your comprehension of <b>written English will</b> both improve your vocabulary and understanding of "
NODE text: beginners to practice reading and comprehension online and for free <b>dictaphone </b>.
NODE text: grammar

Keep in mind that the real HTML is much longer, so the code has to be recursive.

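(This is not a real answer, just a comment that is too long and too heavily formatted to fit in the comment box. I plan to delete it once the requirements are settled.)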

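I understand what you want to do with the bold tags. My problem is with the other requirements. Here is a function that does exactly what you asked for: when you give it your input, it returns the exact output you requested.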

const convert = (input) => 
`     NODE text: "Web <b>dictaphone</b> is built using Thanks to Sole for the Oscilloscope code! English texts for beginners to practice reading and comprehension online and for free.\n    \n     NODE text: Practicing your comprehension of <b>written English will</b> both improve your vocabulary and understanding of "\n\nNODE text: beginners to practice reading and comprehension online and for free <b>dictaphone </b>.\n\nNODE text: grammar    `

const input = `    <div id="translate">\n           <div> <p>Web <b>dictaphone </b> is built using Thanks to Sole for the Oscilloscope code! English texts for \n                    beginners to practice reading and comprehension online and for free.</p> \n                    <p>Practicing your comprehension of <b>written English will</b> both improve your vocabulary and \n                    understanding of <span class="term-highlight">grammar</span> and word order. The texts below are designed to help you develop while \n                    giving you an instant evaluation of your progress.</p>\n                  <p>All test went wrong</p>\n                  <p>Web application work CH<sub class="pippo">2</sub> M5 only with localhost</p>\n            </div>\n<div class="code">\n              <span>\n                beginners to practice reading and comprehension online and for free <b>dictaphone </b>.\n              </span>\n              <span class="term-highlight">grammar</span>\n              <p>All test went wrong</p>\n            </div>\n</div>`
console .log (convert (input))

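I know this is not what you want. The output is the same no matter what the input is, and that is the point: I am trying to pin down your requirements fully. I suspect a language barrier is at play, but it is not clear to me how your input turns into the output you asked for. Why does this line from your input: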
<p>All test went wrong</p>

not show up in the output? And is it the same for this one?

<p>Web application work CH<sub class="pippo">2</sub> M5 only with localhost</p>

And a little further down, this one?

<p>All test went wrong</p>

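So clearly you want to keep the bold tags intact in the output strings, but it is not clear what else you want. As far as I can tell, your requested output is only loosely related to your input. If both are correct, then you need to explain what should happen to the missing nodes.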
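
For what it is worth, if the rule turns out to be "keep <b> inside a run of text and treat every other element as a boundary", a minimal recursive sketch might look like the one below. This is only a guess at the requirements, not a confirmed solution. The INLINE_TAGS set and the names collectRuns, isInline and serializeInline are hypothetical; attributes on kept tags are dropped and text is not re-escaped. It reads only nodeType, nodeName, nodeValue and childNodes, so it should also accept real DOM nodes.

```javascript
// Assumption: only these tags stay inside a run of text; any other element
// ends the current run. The set is a guess, adjust it to the real rules.
const INLINE_TAGS = new Set(['b']);

const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// A node is "inline" if it is text, or a kept tag containing only inline nodes.
function isInline(node) {
  if (node.nodeType === TEXT_NODE) return true;
  return node.nodeType === ELEMENT_NODE &&
    INLINE_TAGS.has(node.nodeName.toLowerCase()) &&
    Array.from(node.childNodes).every(isInline);
}

// Turn an inline node back into markup (attributes are intentionally dropped).
function serializeInline(node) {
  if (node.nodeType === TEXT_NODE) return node.nodeValue;
  const tag = node.nodeName.toLowerCase();
  const inner = Array.from(node.childNodes).map(serializeInline).join('');
  return `<${tag}>${inner}</${tag}>`;
}

// Recursively walk the tree, gathering consecutive inline nodes into one string.
function collectRuns(start, out = []) {
  let buffer = '';
  const flush = () => {
    // Collapse source-code line breaks and indentation into single spaces.
    const text = buffer.replace(/\s+/g, ' ').trim();
    if (text) out.push(text);
    buffer = '';
  };
  for (const child of Array.from(start.childNodes)) {
    if (isInline(child)) {
      buffer += serializeInline(child);
    } else {
      flush(); // a non-inline node ends the current run
      if (child.nodeType === ELEMENT_NODE) collectRuns(child, out);
    }
  }
  flush();
  return out;
}
```

In the browser it would be called as collectRuns(document.getElementById('translate')). With this rule, <span> and <sub> split the surrounding text, so every paragraph still appears in the result, which is exactly where it differs from the requested output.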
