How to fix HTTP error 416 (Requested Range Not Satisfiable) in Java when downloading a web page's content?



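I am trying to download the HTML content of a web page and I get a 416 status. I found a workaround that changes the status code to 200, but the downloaded content is still not correct. I am very close but missing something. Please help.

Code that returns status 416: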

    public static void main(String[] args) {
        String URL="http://www.xyzzzzzzz.com.sg/";
        HttpClient client = new org.apache.commons.httpclient.HttpClient();
        org.apache.commons.httpclient.methods.GetMethod method = new org.apache.commons.httpclient.methods.GetMethod(URL);
        client.getHttpConnectionManager().getParams().setConnectionTimeout(AppConfig.CONNECTION_TIMEOUT);
        client.getHttpConnectionManager().getParams().setSoTimeout(AppConfig.READ_DATA_TIMEOUT);
        String html = null; InputStream ios = null;
        try {
            int statusCode = client.executeMethod(method);
            ios = method.getResponseBodyAsStream();
            html = IOUtils.toString(ios, "utf-8");
            System.out.println(statusCode);
        }catch (Exception e) {
            e.printStackTrace();
        } finally {
            if(ios!=null) {
                try {ios.close();} 
                catch (IOException e) {e.printStackTrace();}
            }
            if(method!=null) method.releaseConnection();
        }
        System.out.println(html);
    }

Code that returns status 200 (but the HTML content is not correct):

    public static void main(String[] args) {
        String URL="http://www.xyzzzzzzz.com.sg/";
        HttpClient client = new org.apache.commons.httpclient.HttpClient();
        org.apache.commons.httpclient.methods.GetMethod method = new org.apache.commons.httpclient.methods.GetMethod(URL);
        client.getHttpConnectionManager().getParams().setConnectionTimeout(AppConfig.CONNECTION_TIMEOUT);
        client.getHttpConnectionManager().getParams().setSoTimeout(AppConfig.READ_DATA_TIMEOUT);
        String html = null; InputStream ios = null;
        try {
            int statusCode = client.executeMethod(method);
            if(statusCode == HttpStatus.SC_REQUESTED_RANGE_NOT_SATISFIABLE) {
                method.setRequestHeader("User-Agent", "Mozilla/5.0");
                method.setRequestHeader("Accept-Ranges", "bytes=100-1500");
                statusCode = client.executeMethod(method);
            }
            ios = method.getResponseBodyAsStream();
            html = IOUtils.toString(ios, "utf-8");
            System.out.println(statusCode);
        }catch (Exception e) {
            e.printStackTrace();
        } finally {
            if(ios!=null) {
                try {ios.close();} 
                catch (IOException e) {e.printStackTrace();}
            }
            if(method!=null) method.releaseConnection();
        }
        System.out.println(html);
    }

Your first code sample works for me, and the second one works as well if I remove the block that sets the headers:

    if(statusCode == HttpStatus.SC_REQUESTED_RANGE_NOT_SATISFIABLE) {
        method.setRequestHeader("User-Agent", "Mozilla/5.0");
        method.setRequestHeader("Accept-Ranges", "bytes=100-1500");
        statusCode = client.executeMethod(method);
    }
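
As a side note on the header block above: `Accept-Ranges` is a response header with which a server advertises range support, while the request header that asks for part of a resource is `Range`. A server answers 416 when a requested range starts beyond the end of the resource. The sketch below illustrates that rule for a single `bytes=first-last` range only. `RangeCheck` and `isSatisfiable` are hypothetical names chosen for illustration, and suffix ranges (`bytes=-500`) and multi-range requests are deliberately not handled.

```java
public class RangeCheck {
    /**
     * Returns true if a single "bytes=first-last" (or "bytes=first-") range
     * can be served from a resource of the given length.
     * Simplified sketch: suffix and multi-part ranges are not supported.
     */
    public static boolean isSatisfiable(String rangeHeader, long contentLength) {
        if (rangeHeader == null || !rangeHeader.startsWith("bytes=")) {
            // No Range header (or an unknown unit): the full body is sent.
            return true;
        }
        String spec = rangeHeader.substring("bytes=".length());
        int dash = spec.indexOf('-');
        if (dash <= 0) {
            throw new IllegalArgumentException("unsupported range: " + rangeHeader);
        }
        long first = Long.parseLong(spec.substring(0, dash).trim());
        // The range is satisfiable only if it starts inside the resource.
        return first < contentLength;
    }

    public static void main(String[] args) {
        System.out.println(isSatisfiable("bytes=100-1500", 5000)); // true
        System.out.println(isSatisfiable("bytes=100-1500", 80));   // false, would be a 416
        System.out.println(isSatisfiable(null, 80));               // true
    }
}
```

For a plain page download no `Range` header is needed at all, which is consistent with the request working once the block is removed.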

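It is a bit odd that you get a 416 at all; it may be a LAN configuration issue (firewall, proxy, and so on). In any case, HttpClient 3.1 is very old; use httpclient 4.x from Apache HttpComponents instead: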

    import org.apache.commons.io.IOUtils;
    import org.apache.http.HttpResponse;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.DefaultHttpClient;

    public class Snippet {
        public static void main(String[] args) {
            String url = "http://www.jobstreet.com.sg/";
            // DefaultHttpClient is deprecated since HttpClient 4.3 but still works in 4.x.
            HttpClient client = new DefaultHttpClient();
            HttpGet get = new HttpGet(url);
            try {
                HttpResponse res = client.execute(get);
                System.out.println(res.getStatusLine().getStatusCode());
                // Decode explicitly as UTF-8 instead of relying on the platform default charset.
                System.out.println(IOUtils.toString(res.getEntity().getContent(), "utf-8"));
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                // Release all connections held by the client.
                client.getConnectionManager().shutdown();
            }
        }
    }

This works as expected.

Try HttpClient 4; if you still get the same error, the problem is not in your code.
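
To check whether the problem really lies outside your code, one option is to repeat the request with JDK classes only, so that no HTTP client library is involved. The following is a minimal sketch of that idea, not part of the original answer: `PlainFetch`, `Result`, the 10-second timeout, and the `http://example.com/` default URL are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class PlainFetch {
    /** Status code and UTF-8 decoded body of one response. */
    public static final class Result {
        public final int status;
        public final String body;

        Result(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    /** Fetches the whole resource; no Range header is sent. */
    public static Result fetch(String url, int timeoutMillis) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        try {
            int status = conn.getResponseCode();
            // For 4xx/5xx responses the body is on the error stream, which may be null.
            InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            return new Result(status, in == null ? "" : readAll(in));
        } finally {
            conn.disconnect();
        }
    }

    /** Reads the stream to the end, closes it, and decodes the bytes as UTF-8. */
    private static String readAll(InputStream in) throws IOException {
        try (InputStream stream = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = stream.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String url = args.length > 0 ? args[0] : "http://example.com/"; // placeholder URL
        Result r = fetch(url, 10_000);
        System.out.println(r.status);
        System.out.println(r.body);
    }
}
```

If this also returns 416 for the same URL, that points toward the network path (proxy, firewall) or the server rather than the HttpClient code.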
