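I'm working on a project for my job that lets users parse a given HTML page containing information about potential customers (leads). The problem is that the page displays this lead information in a table that, as far as I can tell, is populated by a JavaScript function. So when Jsoup parses the document, it can't find the table or any of its contents. This is the HTML I'm specifically interested in: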
<table class="none" align="center" bgcolor="white" border="0" cellpadding="1" cellspacing="0" width="100%">
<tbody><tr class="tm_tt_ftr1">
<td class="typedata1"> </td>
<td class="typedata1" colspan="3">Name</td>
<td class="typedata1">Phone</td>
<td class="typedata1x" colspan="2">$$$ Summary </td>
</tr>
<tr class="tm_tt_body">
<td class="typedata1" title="Lookup this name historical"><center>
<a href="#" onclick="javascript:Pop_Up('X','Testerson',
'Testerson','Tes','Test');">
N</a></center></td>
<td class="typedata1" colspan="3"> Testerson, Test </td>
<td class="typedata1">
<b><a href="rtrpt.cgi?DATE_OPT=US_TERSE
&RT_SCRIPT=mkcnt/cnt_lookup_phone_cgi.rt&JDATE=TODAY
&DATE1=TODAY&DATE2=TODAY&QSRC=ALL&DETAIL=N
&QPAC=631&QPRE=384&QPNUM=6191" title="Search phone history this number" target="_new">P1:</a></b>
<a href="rtrpt.cgi?
DATE_OPT=X&RT_SCRIPT=mkcnt/lead_phn_cgi.rt
&LEAD=011876280" title="Additional phone numbers this lead" target="_new">
<b>222-222-2222</b></a>
</td>
<td width="10%">Charge </td>
<td width="10%"> 49.00</td>
</tr>
<tr class="tm_tt_body">
<td class="typedata1" title="Lookup this name historical" colspan="1"><center>
</center></td>
<td class="typedata1" colspan="3"> </td>
<td class="typedata1">
<b> </b>
</td>
<td class="fd_tt_body_neg">Paid </td>
<td class="fd_tt_body_neg" colspan="1"> 49.00</td> <!--This is what I am looking to extract -->
</tr>
<tr class="tm_tt_body">
<td> </td>
<td class="typedata1" colspan="3">9 Daniel Ln </td>
<td class="typedata1" colspan="1">Email
<a id="ld_email" href="mailto:testtesterson@gmail.com?subject='L11876280'">
testtesterson@gmail.com</a>
</td>
<td>Due </td>
<td> 0.00</td>
</tr>
<tr class="tm_tt_body">
<td> </td>
<td class="typedata1" colspan="3"> </td>
<td class="typedata1" colspan="1">CB @ -------</td>
<td class="typedata1" colspan="1"> </td>
<td class="typedata1" colspan="1">1B </td>
</tr>
<tr class="tm_tt_body">
<td class="typedata1"><center> 111</center></td>
<td class="typedata1" colspan="3">Springfield NY 11953</td>
<td class="typedata1" colspan="1">Comm: 1314379</td>
<td colspan="2"><center>DC: ., .</center></td>
</tr>
<tr class="tm_tt_body">
<td class="typedata1" colspan="5"> </td>
<td colspan="2">
</td>
</tr>
</tbody></table>
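As mentioned above, Jsoup can't find this table or any of its contents at all. The div that contains the table is filled by the following JavaScript function: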
<script language="Javascript">
function UpdateDiv(){
$.ajax({
url: "http://flag.60north.net/cgi-bin/rtrpt_tabpanel2G_New.cgi",
type: 'POST',
async: true,
dataType: 'html',
data: "RT_SCRIPT=telemkt/prime/leadcgiUpDate_New.rt&DATE_OPT=X&DETAIL=N&LNUM=" + $("input#LNUM").val(),
timeout: 90000,
success:
function(retData){
$(".Lead_Info").html(retData);
}
});
}
</script>
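The $.ajax call above is an ordinary form-encoded POST, so in principle the table HTML can also be requested directly, without running any JavaScript. Below is a minimal sketch using the JDK's java.net.http client (Java 11+). It assumes that the endpoint accepts requests without the browser's session cookies, and that lnum is the value of the page's input#LNUM field (which Jsoup could read from the original page). The class and method names are hypothetical. Unlike the jQuery call, it URL-encodes the field values.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that replays the POST made by UpdateDiv().
public class LeadInfoRequest {
    // Endpoint copied from the $.ajax call in UpdateDiv()
    static final String ENDPOINT = "http://flag.60north.net/cgi-bin/rtrpt_tabpanel2G_New.cgi";

    /** Builds the same form fields UpdateDiv() sends; lnum is assumed to be the value of input#LNUM. */
    static Map<String, String> formFields(String lnum) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps the original field order
        fields.put("RT_SCRIPT", "telemkt/prime/leadcgiUpDate_New.rt");
        fields.put("DATE_OPT", "X");
        fields.put("DETAIL", "N");
        fields.put("LNUM", lnum);
        return fields;
    }

    /** Encodes the fields as an application/x-www-form-urlencoded body. */
    static String encode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** Sends the POST and returns the HTML fragment UpdateDiv() would inject into .Lead_Info. */
    static String fetchLeadHtml(String lnum) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(ENDPOINT))
                .timeout(Duration.ofSeconds(90)) // mirrors the script's 90000 ms timeout
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encode(formFields(lnum))))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

If the server answers this request the same way it answers the browser, the returned string can be passed straight to Jsoup.parse. If it requires a logged-in session, the request would also need the session cookie, which this sketch does not handle.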
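From what I understand, this function is called to populate the table. What I'd like is a way to run the function so the page is filled with the lead's information, and then parse it with Jsoup. In my own research I found that the Selenium API can execute JavaScript functions in an HTML document, but I don't think that solves my problem. As far as I can tell, nothing Selenium runs has any effect on the HTML Jsoup parses, because Jsoup connects to the URL and retrieves the document itself. Obviously, if Jsoup could do this, I would have it run the function and then parse the result, but that isn't an available feature. What is the next best way to get this lead information to show up so I can parse it?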
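You can try this approach: have Selenium run the function in a real browser, then hand the resulting HTML to Jsoup, so Jsoup never has to fetch the URL itself: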
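WebDriver driver = new ChromeDriver();
driver.get(url); // url: the page that contains the .Lead_Info div
// UpdateDiv() starts an asynchronous $.ajax POST, so executeScript returns before the table arrives
((JavascriptExecutor) driver).executeScript("UpdateDiv();");
// Selenium 4: wait up to the script's own 90 s timeout for the table to be injected into .Lead_Info
new WebDriverWait(driver, Duration.ofSeconds(90))
        .until(ExpectedConditions.presenceOfElementLocated(By.cssSelector(".Lead_Info table")));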
Then extract the HTML from the WebDriver and pass it to Jsoup for parsing and any further processing:
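// getPageSource() is not guaranteed to reflect JavaScript changes, so serialize the live DOM instead
String html = (String) ((JavascriptExecutor) driver).executeScript("return document.documentElement.outerHTML;");
Document doc = Jsoup.parse(html);
driver.quit(); // the browser is no longer needed once the HTML has been captured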
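Once doc contains the injected table, the cell marked in the question (the amount next to "Paid") can be selected with a CSS query. Below is a minimal sketch. It assumes jsoup 1.11 or later (for selectFirst) and assumes the label cell keeps the fd_tt_body_neg class shown in the question's HTML. The class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper for pulling the "Paid" amount out of the lead table.
public class PaidAmount {
    /**
     * Returns the text of the cell that follows the "Paid" label, or null if the
     * table is not present in the document (e.g. the AJAX call has not completed).
     */
    static String extractPaid(Document doc) {
        // Both the label and the amount use class fd_tt_body_neg; "+ td" selects the
        // cell immediately after the one whose own text contains "Paid".
        Element amount = doc.selectFirst("td.fd_tt_body_neg:containsOwn(Paid) + td");
        return amount == null ? null : amount.text(); // text() trims surrounding whitespace
    }

    public static void main(String[] args) {
        // Trimmed-down copy of the row from the question
        String html = "<table><tbody><tr class=\"tm_tt_body\">"
                + "<td class=\"fd_tt_body_neg\">Paid </td>"
                + "<td class=\"fd_tt_body_neg\" colspan=\"1\"> 49.00</td>"
                + "</tr></tbody></table>";
        System.out.println(extractPaid(Jsoup.parse(html)));
    }
}
```

Because extractPaid returns null when nothing matches, it also shows whether the table actually made it into the parsed document, which is exactly the failure described in the question.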