Response encoding with HttpURLConnection: getting the HTML text



I am trying to read a response (not only this one, but many responses from this site). Here is the code of my function:

    // HTTP POST request
    private void sendFirstPost() throws Exception {
        String url = "http://g1.botva.ru/login.php";
        URL obj = new URL(url);
        HttpURLConnection con = (HttpURLConnection) obj.openConnection();
        con.setInstanceFollowRedirects(false);
        // add request headers
        con.setRequestMethod("POST");
        con.setRequestProperty("Accept", "*/*");
        con.setRequestProperty("Accept-Encoding", "gzip, deflate");
        //con.setRequestProperty("Content-Length", "86");
        con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        con.setRequestProperty("User-Agent", "runscope/0.1");
        String urlParameters = "do_cmd=login&remember=1&password=avmalyutin1234&server=1&email=avmalyutin%40mail.ru";
        // Send post request
        con.setDoOutput(true);
        DataOutputStream wr = new DataOutputStream(con.getOutputStream());
        wr.writeBytes(urlParameters);
        wr.flush();
        wr.close();
        int responseCode = con.getResponseCode();
        System.out.println("\nSending 'POST' request to URL : " + url);
        System.out.println("Post parameters : " + urlParameters);
        System.out.println("Response Code : " + responseCode);
        System.out.println("Content Type : " + con.getContentType());
        BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream(), "cp1251"));
        String inputLine;
        StringBuffer response = new StringBuffer();
        while ((inputLine = in.readLine()) != null) {
            response.append(inputLine);
        }
        in.close();
        //print result
        System.out.println(response.toString());
        byte [] array = response.toString().getBytes("cp1251");
        String buffff = new String(array);
        System.out.println(buffff);
    }

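As the content type I get text/html; charset=cp1251. I tried the encodings cp1251 and windows-1251, with no good result. Once I did manage to get the HTML text, but after that, later runs with no changes to the source code printed only unreadable symbols. So how can I correctly get the HTML text from the response?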

Although the header says the encoding is Cp1251, that is not the case. The server is actually sending bytes that correspond to Cp1252.
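To illustrate why the two code pages are easy to confuse, here is a minimal sketch. The class and method names are hypothetical and not part of the original post. It encodes a Cyrillic word as windows-1251 and then decodes the very same bytes as windows-1252, which produces accented Latin characters instead of Cyrillic.

```java
import java.nio.charset.Charset;

public class CodePageDemo {
    // Hypothetical helper: encode text with one charset, then decode the bytes with another.
    static String reinterpret(String text, String encodeAs, String decodeAs) {
        byte[] bytes = text.getBytes(Charset.forName(encodeAs));
        return new String(bytes, Charset.forName(decodeAs));
    }

    public static void main(String[] args) {
        // A five-letter Cyrillic word, written with Unicode escapes so the
        // source file encoding does not affect the result.
        String cyrillic = "\u0411\u043e\u0442\u0432\u0430";
        String misread = reinterpret(cyrillic, "windows-1251", "windows-1252");

        // Print code points rather than glyphs to stay independent of the console encoding.
        for (char c : misread.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

Because every byte value in that range maps to some character in both code pages, neither decoding throws an error; only inspecting the bytes or the resulting text shows which interpretation is the intended one.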

One way to check is to first work out which bytes you expect to receive:

    String text = "Áîòâà Îíëàéí | Áèòâà çà ðåàëüíóþ êàïóñòó!";
    for (byte n : text.getBytes("Cp1251")) {
        System.out.printf("%d ", n);
    }
    System.out.println();
    for (byte n : text.getBytes("Cp1252")) {
        System.out.printf("%d ", n);
    }
    System.out.println();

Then look for them among the bytes you actually receive:

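    // inputStream is the raw response stream, e.g. con.getInputStream()
    // read() returns -1 at end of stream; 0 is a valid byte value
    for (int n; (n = inputStream.read()) != -1; ) {
        System.out.printf("%d ", (byte) n);
    }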
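Dumping the bytes this way consumes the stream, so the same response cannot be decoded afterwards. A minimal sketch, assuming the whole body fits in memory, buffers the bytes once so they can be both printed and decoded with each candidate charset. `ResponseBytes`, `readAll` and `dump` are illustrative names, and the `ByteArrayInputStream` only stands in for `con.getInputStream()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class ResponseBytes {
    // Read the stream to the end. read() returns -1 at end of stream,
    // so a zero byte in the body does not stop the loop.
    static byte[] readAll(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Format bytes as signed decimals, in the same "%d " style as the loops above.
    static String dump(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(b).append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the response stream: five sample bytes above 0x7F.
        InputStream in = new ByteArrayInputStream(
                new byte[] {(byte) 0xC1, (byte) 0xEE, (byte) 0xF2, (byte) 0xE2, (byte) 0xE0});
        byte[] body = readAll(in);

        System.out.println(dump(body));
        // Decode the same buffered bytes with both candidate charsets and compare.
        System.out.println(new String(body, Charset.forName("windows-1251")));
        System.out.println(new String(body, Charset.forName("windows-1252")));
    }
}
```

Keeping the raw bytes around also makes it possible to compare them against the expected byte sequences before committing to one charset.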
