通过代理java下载一个文件



我通过代理从类似www.example.com/example.pdf的url下载文件并将其保存在java文件系统上时遇到问题。有人知道这是怎么回事吗?如果我得到InputStream,我可以简单地将其保存到文件系统中,如下所示:

final ReadableByteChannel rbc = Channels.newChannel(httpUrlConnetion.getInputStream());    
final FileOutputStream fos = new FileOutputStream(file);
fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
fos.close();

但如何通过代理获取url的输入流呢?如果我这样做:

SocketAddress addr = new InetSocketAddress("my.proxy.com", 8080);
Proxy proxy = new Proxy(Proxy.Type.HTTP, addr);
URL url = new URL("http://my.real.url.com/");
URLConnection conn = url.openConnection(proxy);

我得到这个例外:

java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read1(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at app.model.mail.crawler.newimpl.FileLoader.getSourceOfSiteViaProxy(FileLoader.java:167)
    at app.model.mail.crawler.newimpl.FileLoader.process(FileLoader.java:220)
    at app.model.mail.crawler.newimpl.FileLoader.run(FileLoader.java:57)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

使用这个:

final HttpURLConnection httpUrlConnetion = (HttpURLConnection) website.openConnection(proxy);
httpUrlConnetion.setDoOutput(true);
httpUrlConnetion.setDoInput(true);
httpUrlConnetion.setRequestProperty("Content-type", "text/xml");
httpUrlConnetion.setRequestProperty("Accept", "text/xml, application/xml");
httpUrlConnetion.setRequestMethod("POST");
httpUrlConnetion.connect();

我可以下载html网站的源代码,但不能下载文件。也许有人可以帮助我设置下载文件的属性。

以编程方式设置代理:

SocketAddress addr = new InetSocketAddress("my.proxy.com", 8080);
Proxy proxy = new Proxy(Proxy.Type.HTTP, addr);
URL url = new URL("http://my.real.url.com/");
URLConnection conn = url.openConnection(proxy);

然后,您可以将上面的代码与最后一行返回的URLConnection一起使用。如果你愿意,你也可以使用SOCKS代理,或者强制不使用代理。

这是从本Oracle文档中截取的(并经过轻微编辑)。

以下与其他答案不同,对我有效:在连接之前设置这些属性:

            System.getProperties().put("http.proxySet", "true");
            System.getProperties().put("http.proxyHost", "my.proxy.com");
            System.getProperties().put("http.proxyPort", "8080"); //port is String, not int

然后,打开URLConnection并尝试下载该文件。

可以使用Apache httpclient库来解决代理的大部分问题。要编译下面的代码,可以使用以下maven:

Maven:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>stackoverflow.test</groupId>
  <artifactId>proxyhttp</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>proxy</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpclient</artifactId>
      <version>4.5.1</version>
    </dependency>
  </dependencies>
</project>

Java代码:

import org.apache.http.HttpHost;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
/**
 * How to send a request via proxy.
 *
 * @since 4.0
 */
public class ClientExecuteProxy {
    public static void main(String[] args)throws Exception {
        CloseableHttpClient httpclient = HttpClients.createDefault();
        try {
            HttpHost target = new HttpHost("www.google.com", 80, "http");
            HttpHost proxy = new HttpHost("127.0.0.1", 8889, "http");
            RequestConfig config = RequestConfig.custom()
                    .setProxy(proxy)
                    .build();
            HttpGet request = new HttpGet("/");
            request.setConfig(config);
            System.out.println("Executing request " + request.getRequestLine() + " to " + target + " via " + proxy);
            CloseableHttpResponse response = httpclient.execute(target, request);
            try {
                System.out.println("----------------------------------------");
                System.out.println(response.getStatusLine());
                System.out.println(EntityUtils.toString(response.getEntity()));
            } finally {
                response.close();
            }
        } finally {
            httpclient.close();
        }
    }
}

另一种方法是在httpUrlConnection的每个实例"内部"实现代理。即:

  1. 不要连接到您想要的真实URL。首先,连接到代理IP和端口,但使用指向所需URL的httpGET方法
  2. 使用setRequestProperty主机设置为您的URL和您可能需要的任何其他标头

如果有效,连接将透明地将文件发送给您。

我有一些使用Sockets的代码。

try {
    Socket sock = new Socket("10.0.241.1", 3128); //proxy IP and port
    InputStream is = sock.getInputStream();
    OutputStream os = sock.getOutputStream();
    String str = "GET http://www.uol.com.br HTTP/1.1rn"; //GET your site
    str += "Host: www.uol.com.brrn"; //again, Host of your site
    str += "Proxy-Authorization: Basic ZWR1YXJkby5wb2NvOmM1NmQyMw==rn"; //if password is needed
    str += "rn";
    os.write(str.getBytes());
    byte[] bb = new byte[1024];
    int L = 0;
    while ((L = is.read(bb)) != -1) {
        //write bytes to file stream...
    }
} catch (Exception ex) {
    //exception handling...
}

你说:"既然可以使用httpUrlConnection,为什么还要使用纯套接字?"。嗯,那时我还不知道httpUrlConnection。

相关内容

最新更新