How to get the complete HTML code of a website using jsoup or any other tool



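I am trying to fetch the HTML code of a website. If the page is small, like this one (https://abdelftahzowail.github.io/WriteUpsideDown/), I get the complete code, but if the page is large, like this one (https://www.pixel4k.com/page/1?s=deadpool), I do not get the complete code.

I tried both Jsoup and HttpURLConnection, but neither gave me the complete code.

Here is my code: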

// Network access must not run on the Android main thread, hence the worker thread.
Thread thread = new Thread(() -> {
    try {
        Document doc = Jsoup.connect(editText.getText().toString())
                .header("Accept-Encoding", "gzip, deflate")
                .userAgent("Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.69 Safari/537.36")
                .maxBodySize(0)   // 0 = no limit on the response body size
                .timeout(0)       // 0 = no timeout
                .get();
        Log.i("IMPORTANT !!!!", "doc ( " + editText.getText().toString() + " )\n" + doc);
    } catch (Exception e) {
        Log.i("IMPORTANT !!!!", "error : " + e);
    }
});
thread.start();

This is the code I get from this website (https://www.pixel4k.com/page/1?s=deadpool):

<!doctype html>
<html class="no-js" lang="en-US" prefix="og: http://ogp.me/ns#"> 
<head> 
<meta charset="UTF-8"> 
<title>You searched for deadpool - 4k Wallpapers ,Hd Wallpapers,Desktop Wallpapers, Free Backgrounds Download, Widescreen Wallpapers</title> 
<link rel="icon" href="https://www.pixel4k.com/wp-content/uploads/2018/09/favicon.ico" type="image/x-icon"> 
<link rel="apple-touch-icon" href="apple-touch-icon.png"> 
<meta name="viewport" content="width=device-width, initial-scale=1.0"> 
<meta name="apple-mobile-web-app-capable" content="yes"> 
<meta name="apple-mobile-web-app-status-bar-style" content="black"> 
<link rel="stylesheet" type="text/css" media="all" href="https://www.pixel4k.com/wp-content/themes/pxxx/style.css"> 
<link rel="pingback" href="https://www.pixel4k.com/xmlrpc.php"> 
<meta name="google-site-verification" content="xHAo1q6wJG7bz-iw00VylrwaMabFjK_xSyU1jakgwaQ"> 
<meta name="wot-verification" content="317f71c46e1fb6060ce1"> 
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js" type="f8f50ad6803275492fa5ce1d-text/javascript"></script> 
<script type="f8f50ad6803275492fa5ce1d-text/javascript">(adsbygoogle=window.adsbygoogle||[]).push({google_ad_client:"ca-pub-2555268506534283",enable_page_level_ads:true});</script> <!--[if lt IE 9]>
<script src="https://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]--> 
<meta name="robots" content="noindex,follow"> 
<link rel="next" href="https://www.pixel4k.com/search/deadpool/page/2"> 
<meta property="og:locale" content="en_US"> 
<meta property="og:type" content="object"> 
<meta property="og:title" content="You searched for deadpool - 4k Wallpapers ,Hd Wallpapers,Desktop Wallpapers, Free Backgrounds Download, Widescreen Wallpapers"> 
<meta property="og:url" content="https://www.pixel4k.com/search/deadpool"> 
<meta property="og:site_name" content="4k Wallpapers ,Hd Wallpapers,Desktop Wallpapers, Free Backgrounds Download, Widescreen Wallpapers"> 
<meta name="twitter:card" content="summary_large_image"> 
<meta name="twitter:title" content="You searched for deadpool - 4k Wallpapers ,Hd Wallpapers,Desktop Wallpapers, Free Backgrounds Download, Widescreen Wallpapers"> 
<script type="application/ld+json">{"@context":"https://schema.org","@type":"Person","url":"https://www.pixel4k.com/","sameAs":[],"@id":"#person","name":"Mika"}</script> 
<link rel="dns-prefetch" href="//ajax.googleapis.com"> 
<link rel="dns-prefetch" href="//www.pixel4k.com"> 
<link rel="alternate" type="application/rss+xml" title="4k Wallpapers ,Hd Wallpapers,Desktop Wallpapers, Free Backgrounds Download, Widescreen Wallpapers » Feed" href="https://www.pixel4k.com/feed"> 
<link rel="alternate" type="application/rss+xml" title="4k Wallpapers ,Hd Wallpapers,Desktop Wallpapers, Free Backgrounds Download, Widescreen Wallpapers » Comments Feed" href="https://www.pixel4k.com/comments/feed"> 
<link rel="alternate" type="application/rss+xml" title="4k Wallpapers ,Hd Wallpapers,Desktop Wallpapers, Free Backgrounds Download, Widescreen Wallpapers » Search Results for “deadpool” Feed" href="https://www.pixel4k.com/search/deadpool/feed/rss2/"> 
<style type="text/css">img.wp-smiley,img.emoji{display:inline!important;border:none!important;box-shadow:none!important;height:1em!important;width:1em!important;margin:0 .07em!important;vertical-align:-.1em!important;background:none!important;padding:0!important}</style> 
<link rel="stylesheet" id="wp-block-library-css" href="https://www.pixel4k.com/wp-includes/css/dist/block-library/style.min.css?ver=5.3.8" type="text/css" media="all"> 
<style id="rocket-lazyload-inline-css" type="text/css">.rll-youtube-player{position:relative;padding-bottom:56.23%;height:0;overflow:hidden;max-width:100%;background:#000;margin:5px}.rll-youtube-player iframe{position:absolute;top:0;left:0;width:100%;height:100%;z-index:100;background:0 0}.rll-youtube-player img{bottom:0;display:block;left:

But this app (https://play.google.com/store/apps/details?id=com.teejay.trebedit&hl=en&gl=US) gets the complete code.

What should I do?

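You are getting all of the data (with your code, both of your URLs produce the complete HTML), but the Android logger does not output all of it when you call it.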

If you write the result to a file instead of passing it to a log statement, you will most likely find that all of your data is there.
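As a rough illustration of that check, the sketch below writes a large string to a file and reports how many bytes ended up on disk. It uses only the Java standard library; the class name `HtmlDump`, the method `writeHtml`, and the generated test string are made up for the example, and the string stands in for the result of `doc.outerHtml()` from the question's code.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class HtmlDump {
    // Writes the whole string to the given file as UTF-8
    // and returns the resulting file size in bytes.
    static long writeHtml(File target, String html) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(target), StandardCharsets.UTF_8)) {
            out.write(html);
        }
        return target.length();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the downloaded page (doc.outerHtml()).
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append("<p>row</p>\n");
        }
        String html = sb.toString();

        File target = File.createTempFile("page", ".html");
        long written = writeHtml(target, html);
        System.out.println("chars in memory: " + html.length()
                + ", bytes on disk: " + written);
    }
}
```

In an Android app the target would more likely be something like `new File(getFilesDir(), "page.html")`, and the file can then be inspected with the device file explorer. If the file holds the full page, the download was complete and only the log output was cut short.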

See: What is the size limit for Logcat and how to change its capacity?
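If the output still has to go through the logger, one common workaround is to split the text into pieces that fit into a single log entry. The sketch below shows only the splitting step in plain Java. The class `LogChunker` and the chunk size of 4000 characters are assumptions for the example: Logcat's per-entry limit is roughly 4 KB, the exact figure depends on the platform version, and text with many non-ASCII characters may need a smaller chunk.

```java
import java.util.ArrayList;
import java.util.List;

class LogChunker {
    // Assumed chunk size, chosen to stay below Logcat's per-entry limit.
    static final int CHUNK = 4000;

    // Splits text into consecutive pieces of at most chunkSize characters.
    static List<String> split(String text, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < text.length(); start += chunkSize) {
            int end = Math.min(text.length(), start + chunkSize);
            parts.add(text.substring(start, end));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical long text standing in for the page's HTML.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append('x');
        }
        List<String> parts = split(sb.toString(), CHUNK);
        for (int i = 0; i < parts.size(); i++) {
            // On Android this line would be a Log.i(tag, parts.get(i)) call.
            System.out.println("part " + (i + 1) + "/" + parts.size()
                    + " length=" + parts.get(i).length());
        }
    }
}
```

Each piece is logged as its own entry, so the full page appears in Logcat as a sequence of entries rather than one truncated entry.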

I searched for the maximum length of a String in Java. According to Takahiko Kawasaki's answer to that question, the maximum length is 65,536 characters.

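Since your approach stores the page's HTML in a String, this would mean that your code works as expected only if the page you are downloading is smaller than 65,536 bytes.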
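The length assumption above is easy to check directly. The following minimal sketch builds a string at runtime and prints its length; the class name `LongStringCheck` and the target length of one million characters are chosen only for the illustration.

```java
class LongStringCheck {
    // Builds a string of the requested length at runtime.
    static String build(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append('a');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = build(1_000_000);
        // Prints the number of characters actually held by the String.
        System.out.println(s.length());
    }
}
```

A String assembled at runtime can hold far more than 65,536 characters. The figure of about 64 KB is the limit for a string constant stored in a compiled class file, so it would not by itself explain a truncated page download.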

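I do not know what you need to do with the page's HTML once you have it, so the following suggestion may not be enough for your needs, but have you tried storing the HTML in a StringBuffer instead of a String?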
