How do I parse an HTML table with jsoup?



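I need some help parsing HTML with jsoup.

So here is the link: long link with search results

I need to extract the data from the table in the search results section. At the moment I have something like this: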

package com.company;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;

public class Main {
    public static void main(String[] args) throws IOException {
        PrintWriter pw = new PrintWriter(new File("MonitorUL.csv"), "windows-1251");
        final String colNames = "DepositName;Percentage;MinAmount;Duration";
        StringBuilder builder = new StringBuilder();
        builder.append(colNames + "\n");
        String url = "http://www.banki.ru/products/corporate/search/sankt-peterburg/?CURRENCY=982&AMOUNT=&PERIOD=985&show=all&curcount=all&bankid%5B0%5D=322&bankid%5B1%5D=76620&bankid%5B2%5D=327&bankid%5B3%5D=4389&bankid%5B4%5D=2764&bankid%5B5%5D=960&bankid%5B6%5D=325&bankid%5B7%5D=690&bankid%5B8%5D=5306&bankid%5B9%5D=4725&bankid%5B10%5D=193284&bankid%5B11%5D=68665&bankid%5B12%5D=5919&bankid%5B13%5D=191203&bankid%5B14%5D=68768&bankid%5B15%5D=4045#search-result";
        Document doc = Jsoup.parse(url);
        System.out.println(doc.toString());
        Element table = doc.getElementById("thead");
        Elements rows = table.select("tr");
        for (int i = 0; i < rows.size(); i++) {
            Element row = rows.get(i);
            Elements cols = row.select("td");
            for (int j = 0; j < cols.size(); j++) {
                builder.append(cols.get(j).text());
                builder.append(";");
            }
            builder.append("\n");
        }
        pw.write(builder.toString());
        pw.close();
    }
}

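But it doesn't work. Any ideas why jsoup doesn't want to parse it? (I also tried getting the element by id, e.g. "search-result".)

Thanks in advance.

The following code snippet may help you: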

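// Imports assume HtmlUnit 2.x (package com.gargoylesoftware.htmlunit) and jsoup on the classpath.
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Inside a method; `url` is the search URL from the question.
final WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setTimeout(10000);
try {
    // Let HtmlUnit load the page (running its scripts), then hand the resulting markup to jsoup.
    HtmlPage htmlPage = webClient.getPage(url);
    Document doc = Jsoup.parse(htmlPage.asXml());
    // Select the elements whose id matches "search-result", i.e. the section that holds the table.
    Elements table = doc.getElementsByAttributeValueMatching("id", "search-result");
    Elements rows = table.select("tr");
    System.out.println("No of rows in the table : " + rows.size());
    for (int i = 0; i < rows.size(); i++) {
        Element row = rows.get(i);
        Elements cols = row.select("td");
        for (int j = 0; j < cols.size(); j++) {
            // Printed to the console here; append to your CSV builder instead if needed.
            System.out.println(cols.get(j).text());
        }
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    webClient.close();
}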

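This happens because the URL is not a static page, and because Jsoup.parse(String) treats its argument as HTML source rather than as an address to download, so your Document only contains the URL as text. If you want the HTML of this page, first fetch its content with an HTTP client library, then parse what you got back.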
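To make the difference concrete, here is a minimal sketch that runs offline. The class name ParseVsFetch, the helper tableToCsv, the example.com URL and the SAMPLE_HTML markup are all made up for illustration; only the "search-result" id comes from the question. For a live page you would get the Document from something like Jsoup.connect(url).get() instead, and if the table is filled in by JavaScript that alone may not be enough.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class ParseVsFetch {
    // Hypothetical stand-in for the HTML you would get after actually downloading the page.
    static final String SAMPLE_HTML =
            "<div id=\"search-result\"><table>"
            + "<tr><th>Name</th><th>Rate</th></tr>"
            + "<tr><td>Deposit A</td><td>7.5</td></tr>"
            + "<tr><td>Deposit B</td><td>8.0</td></tr>"
            + "</table></div>";

    // Hypothetical helper: turns every row that has <td> cells, under the element
    // with the given id, into one ';'-separated line.
    static String tableToCsv(Document doc, String id) {
        Element section = doc.getElementById(id);
        if (section == null) {
            return ""; // the id is not present in the parsed document
        }
        StringBuilder sb = new StringBuilder();
        for (Element row : section.select("tr")) {
            Elements cols = row.select("td");
            if (cols.isEmpty()) {
                continue; // header rows contain only <th> cells
            }
            for (int j = 0; j < cols.size(); j++) {
                if (j > 0) {
                    sb.append(';');
                }
                sb.append(cols.get(j).text());
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Jsoup.parse(String) parses the string itself as markup: no request is made,
        // so the "document" is just the URL as body text and has no table rows.
        Document fromUrlString = Jsoup.parse("http://example.com/page#search-result");
        System.out.println("rows when parsing the URL string: " + fromUrlString.select("tr").size());

        // Parsing real HTML (here the inline sample) gives the rows you expect.
        Document fromHtml = Jsoup.parse(SAMPLE_HTML);
        System.out.print(tableToCsv(fromHtml, "search-result"));
    }
}
```

This also explains why looking up an id on the original Document fails: getElementById returns null when the id is missing, so the next call on it throws a NullPointerException. Checking for null, as the helper does, makes that failure easier to spot.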
