Using Puppeteer to get an href from an element in a table



I am trying to use Puppeteer to read the href attribute of a link inside a table. The table is fairly standard and is laid out like this:

<table id="sessions">
  <tbody>
    <tr>
      <td>...</td>
      <td>abcd</td>
      <td>...</td>
      <td>
        <a href="www.example.com"> Example </a>
      </td>
    </tr>
    <tr>...</tr>
    <tr>...</tr>
    <tr>...</tr>
  </tbody>
</table>

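How can I search the table for the row whose column n has innerText == "abcd", and then get the link in column m of that same row? I will then pass that link to page.goto(). Any help would be greatly appreciated!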

Edit: So far I have tried the following, but it is overly complicated and does not work:

const text = await page.$$eval('#sessions tr', rows => {
  return Array.from(rows, row => {
    const columns = row.querySelectorAll('td');
    return cols = Array.from(columns, column => column.innerText)
  });
});
const links = await page.$$eval('#sessions tr', rows => {
  return Array.from(rows, row => {
    const columns = row.querySelectorAll('td');
    return cols = Array.from(columns, column => column.innerHTML)
  });
});
for (var i = 0; i <= result.length; i++){
  if (result[i][2] == "abcd"){
    usefulLink = links[i][5];
    break;
  }
}
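
Some likely reasons the attempt above fails, judging only from the code shown: the loop reads `result`, which is never defined (the rows were stored in `text`); `i <= result.length` steps one index past the last row; and the second pass collects `innerHTML`, so `links[i][5]` would hold markup rather than a URL. The following is a minimal sketch of a single-pass variant, not a definitive fix. The helper names `extractRows` and `findLink` are hypothetical, and the column indexes 1 and 3 are assumptions taken from the sample table, not from the real page.

```javascript
// Runs in the browser when passed to page.$$eval: a single pass that returns
// plain, serializable data (the text and resolved link URL of every cell).
function extractRows(rows) {
  return Array.from(rows, row =>
    Array.from(row.querySelectorAll('td'), td => {
      const a = td.querySelector('a');
      return { text: td.innerText.trim(), href: a ? a.href : null };
    })
  );
}

// Runs in Node: find the first row whose `textCol` cell equals `wanted`
// and return the href stored for its `linkCol` cell, or null if none.
function findLink(rows, textCol, linkCol, wanted) {
  const row = rows.find(cells => cells[textCol]?.text === wanted);
  return row?.[linkCol]?.href ?? null;
}

// Usage with Puppeteer (assumes `page` is an already opened Page):
// const rows = await page.$$eval('#sessions tr', extractRows);
// const usefulLink = findLink(rows, 1, 3, 'abcd');
// if (usefulLink) await page.goto(usefulLink);
```

Keeping the lookup in a plain function means it can be checked against hand-written data without launching a browser.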

Edit 2: Many thanks to vsemozhebuty for the help. This is where I have got to so far:

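const href = await page.evaluate(() => {
  // querySelectorAll already yields the rows, so search the array directly
  // (an array has no `.row` property, so `[...table.row]` would throw).
  const rows = Array.from(document.querySelectorAll('#sessions tr'));
  const tr = rows.find(({ cells }) => cells[0].innerText === "abcd");
  if (tr) return tr.cells[1].querySelector('a').href;
  return null;
});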

Something like this?

import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();
const html = `
  <!doctype html>
  <html>
    <head><meta charset='UTF-8'><title>Test</title></head>
    <body>
      <table id="sessions">
        <tbody>
          <tr>
            <td>abcd</td>
            <td><a href="https://www.example.com">Example</a></td>
          </tr>
          <tr>
            <td>efgh</td>
            <td><a href="https://www.example.org">Example</a></td>
          </tr>
        </tbody>
      </table>
    </body>
  </html>`;

try {
  const [page] = await browser.pages();
  await page.goto(`data:text/html,${html}`);
  const href = await page.evaluate(() => {
    const table = document.querySelector('table');
    const tr = [...table.rows].find(({ cells }) => cells[0].innerText === "abcd");
    if (tr) return tr.cells[1].querySelector('a').href;
    return null;
  });
  console.log(href); // https://www.example.com/
} catch (err) {
  console.error(err);
} finally {
  await browser.close();
}
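
To cover the "column n" and "column m" part of the question, the column positions can be passed into page.evaluate as extra arguments, because the callback runs in the browser and cannot see variables from the Node side. Below is a rough sketch under that assumption; `findHrefInTable` and its parameters are hypothetical names chosen for illustration, not part of Puppeteer.

```javascript
// Runs inside the page: returns the resolved href of the first link in column
// `linkCol` of the first row whose column `textCol` has the text `wanted`,
// or null when the table, row, cell, or link is missing.
function findHrefInTable(selector, textCol, linkCol, wanted) {
  const table = document.querySelector(selector);
  if (!table) return null;
  const tr = [...table.rows].find(
    ({ cells }) => cells[textCol]?.innerText.trim() === wanted
  );
  const a = tr?.cells[linkCol]?.querySelector('a');
  return a ? a.href : null;
}

// Usage with Puppeteer (assumes `page` is already open on the table's page):
// const href = await page.evaluate(findHrefInTable, '#sessions', 0, 1, 'abcd');
// if (href) await page.goto(href);
```

Note that the `href` property is the URL resolved against the current page, so a scheme-less attribute such as `www.example.com` in the question's markup would be treated as a relative path. If the real table contains links like that, the value may need a scheme added before it is passed to page.goto().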
