Running nested loops with Puppeteer's page.$$eval(...)



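I have an HTML document that contains two levels of repeated HTML elements. The first level is a variable number of test questions (1->n), and within each test question there is a variable number of possible answers (2->n).

Using Puppeteer's page.$$eval(...) function, I need to iterate over both levels and capture the data associated with each level.

I am able to capture the first-level data (the test questions), but I cannot figure out how to iterate over and capture the nested inner level (the possible answers).

Here is what I have so far...

Sample HTML: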

<html>
<body>
<div id="8888" class="course-wrapper">
<div class="question">
<div class="q-label">Question 1</div>
<div class="q-question">Is this question #2?</div>
<div class="q-choices">
<div class="choice">
<div class="num">1.</div>
<div class="answer">yes</div>
<div class="answer-check">incorrect</div>
</div>
<div class="choice">
<div class="num">2.</div>
<div class="answer">no</div>
<div class="answer-check">correct</div>
</div>
<div class="choice">
<div class="num">3.</div>
<div class="answer">perhaps</div>
<div class="answer-check">incorrect</div>
</div>
<div class="choice">
<div class="num">4.</div>
<div class="answer">all of the above</div>
<div class="answer-check">incorrect</div>
</div>
</div>
</div>
<div class="question">
<div class="q-label">Question 2</div>
<div class="q-question">How far is it to Tipperary?</div>
<div class="q-choices">
<div class="choice">
<div class="num">1.</div>
<div class="answer">a long way</div>
<div class="answer-check">correct</div>
</div>
<div class="choice">
<div class="num">2.</div>
<div class="answer">not so far</div>
<div class="answer-check">incorrect</div>
</div>
<div class="choice">
<div class="num">3.</div>
<div class="answer">a very long way</div>
<div class="answer-check">incorrect</div>
</div>
<div class="choice">
<div class="num">4.</div>
<div class="answer">right next door</div>
<div class="answer-check">incorrect</div>
</div>
<div class="choice">
<div class="num">5.</div>
<div class="answer">just over the hill</div>
<div class="answer-check">incorrect</div>
</div>
</div>
</div>
<div class="question">
<div class="q-label">Question 3</div>
<div class="q-question">Is this question #3?</div>
<div class="q-choices">
<div class="choice">
<div class="num">1.</div>
<div class="answer">yes</div>
<div class="answer-check">correct</div>
</div>
<div class="choice">
<div class="num">2.</div>
<div class="answer">no</div>
<div class="answer-check">incorrect</div>
</div>
<div class="choice">
<div class="num">3.</div>
<div class="answer">perhaps</div>
<div class="answer-check">incorrect</div>
</div>
<div class="choice">
<div class="num">4.</div>
<div class="answer">all of the above</div>
<div class="answer-check">incorrect</div>
</div>
<div class="choice">
<div class="num">5.</div>
<div class="answer">i don't know</div>
<div class="answer-check">incorrect</div>
</div>
<div class="choice">
<div class="num">6.</div>
<div class="answer">none of your business</div>
<div class="answer-check">incorrect</div>
</div>
</div>
</div>
</div>
</body>
</html>

Target JSON:

[
{
"courseUNID": "8888",
"qCount": 3,
"qArray": [
{
"testQLabel": "Question 1",
"testQ": "Is this question #2?",
"testQAnswerChoiceCount": 4,
"testQPossibleAnswers": [
{
"Num": "1.",
"Answer": "yes",
"AnswerCheck": "incorrect"
},
{
"Num": "2.",
"Answer": "no",
"AnswerCheck": "correct"
},
{
"Num": "3.",
"Answer": "perhaps",
"AnswerCheck": "incorrect"
},
{
"Num": "4.",
"Answer": "all of the above",
"AnswerCheck": "incorrect"
}
]
},
{
"testQLabel": "Question 2",
"testQ": "How far is it to Tipperary?",
"testQAnswerChoiceCount": 5,
"testQPossibleAnswers": [
{
"Num": "1.",
"Answer": "a long way",
"AnswerCheck": "correct"
},
{
"Num": "2.",
"Answer": "not so far",
"AnswerCheck": "incorrect"
},
{
"Num": "3.",
"Answer": "a very long way",
"AnswerCheck": "incorrect"
},
{
"Num": "4.",
"Answer": "right next door",
"AnswerCheck": "incorrect"
},
{
"Num": "5.",
"Answer": "just over the hill",
"AnswerCheck": "incorrect"
}
]
},
{
"testQLabel": "Question 3",
"testQ": "Is this question #3?",
"testQAnswerChoiceCount": 6,
"testQPossibleAnswers": [
{
"Num": "1.",
"Answer": "yes",
"AnswerCheck": "correct"
},
{
"Num": "2.",
"Answer": "no",
"AnswerCheck": "incorrect"
},
{
"Num": "3.",
"Answer": "perhaps",
"AnswerCheck": "incorrect"
},
{
"Num": "4.",
"Answer": "all of the above",
"AnswerCheck": "incorrect"
},
{
"Num": "5.",
"Answer": "i don't know",
"AnswerCheck": "incorrect"
},
{
"Num": "6.",
"Answer": "none of your business",
"AnswerCheck": "incorrect"
}
]
}
]
}
]

Code (I need help with the nested page.$$eval(...) call):

const scraped_post_test = async (page) => {
const courseUNID = await page.$eval("#8888", element => element.getAttribute("id"));
const qCount = await page.$$eval("#8888 > div.question > div.q-label", elements => {
return elements.length
});
const qLabel = await page.$$eval("#8888 > div.question > div.q-label", elements => {
return elements.map(element => element.textContent)
});
const qQuestion = await page.$$eval("#8888 > div.question > div.q-question", elements => {
return elements.map(element => element.textContent)
});
const qPossibleAnswersArray = await page.$$eval("#8888 > div.question > div.q-choices", elements => {
/*** nested iteration here? ***/
const answersArray = [
{
"Num": "1.",
"Answer": "a long way",
"AnswerCheck": "correct"
},
{
"Num": "2.",
"Answer": "something else",
"AnswerCheck": "incorrect"
},
...
]
return answersArray
});
let qArray = [];
for (let i = 0; i < qCount; i++) {
let testQObj = {};
testQObj.testQLabel = qLabel[i];
testQObj.testQ = qQuestion[i];
testQObj.testQAnswerChoiceCount = qPossibleAnswersArray.length;
testQObj.testQPossibleAnswers = qPossibleAnswersArray;
await qArray.push(testQObj)
}
return {
courseUNID,
qCount,
qArray
}
}
module.exports = { scraped_post_test }

I can get you started:

page.$$eval('.course-wrapper', divs => divs.map(div => {
let questions = [...div.querySelectorAll('.question')]
return {
courseUNID: div.id,
qCount: questions.length,
qArray: questions.map(q => {
let [label, question, choices] = [...q.querySelectorAll('.q-label, .q-question, .q-choices')]
return {
testQLabel: label.innerText,
...
}
})
}
}))
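
To round out the idea above, here is one way the nested iteration could be completed. It is a sketch, not the only approach: the names extractCourses, textOf and scrapePostTest are hypothetical, and it assumes the page is structured exactly like the sample HTML. The key point is that a single page.$$eval call hands you real DOM elements inside the browser, so the inner level can be reached with ordinary querySelectorAll calls and nested map calls, and only the final plain object is sent back to Node.

```javascript
// Callback meant to be passed to page.$$eval. Puppeteer serializes it and runs
// it in the browser, so it must not reference variables from the Node.js scope.
// (extractCourses and textOf are hypothetical names used for illustration.)
function extractCourses(wrappers) {
  // Trimmed text of the first match under root, or null if nothing matches.
  const textOf = (root, selector) => {
    const el = root.querySelector(selector);
    return el ? el.textContent.trim() : null;
  };

  // Outer level: one object per course wrapper.
  return wrappers.map((wrapper) => {
    const questions = Array.from(wrapper.querySelectorAll(".question"));
    return {
      courseUNID: wrapper.id,
      qCount: questions.length,
      // Middle level: one object per question.
      qArray: questions.map((q) => {
        const choices = Array.from(q.querySelectorAll(".q-choices .choice"));
        return {
          testQLabel: textOf(q, ".q-label"),
          testQ: textOf(q, ".q-question"),
          testQAnswerChoiceCount: choices.length,
          // Inner level: one object per possible answer.
          testQPossibleAnswers: choices.map((c) => ({
            Num: textOf(c, ".num"),
            Answer: textOf(c, ".answer"),
            AnswerCheck: textOf(c, ".answer-check"),
          })),
        };
      }),
    };
  });
}

// Assumes `page` is a Puppeteer Page that has already loaded the document.
const scrapePostTest = (page) => page.$$eval(".course-wrapper", extractCourses);
```

Calling `await scrapePostTest(page)` should resolve to an array with one entry per .course-wrapper element, in the shape of the target JSON. Selecting by class also sidesteps "#8888": an ID selector that starts with a digit is not valid CSS unless it is escaped, so `[id="8888"]` is the safer form if you need to target that element by ID.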
