Error: java.io.IOException: Server returned HTTP response code: 403 for



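I am new to development and I am using IntelliJ IDEA 2022.1 (Community Edition). I want to connect to a web page and retrieve it, for example www.carrefour.fr, but I get the following error: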

java.io.IOException: Server returned HTTP response code: 403 for URL: https://www.carrefour.fr/
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1919)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1515)
at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224)
at Main.main(Main.java:111)
Process finished with exit code 0

I searched online a lot and tried what I found, but none of it solved the problem. How can I fix this?

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class Main {
    public static void main(String[] args) {
        StringBuilder content = new StringBuilder();
        try {
            // Install the default cookie manager before opening any connection.
            CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));

            URL url = new URL("https://www.carrefour.fr"); // create the URL object
            URLConnection urlConnection = url.openConnection(); // open a connection to it
            urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0 Unique/100.7.9656.57");

            // Wrap the response stream in a BufferedReader; try-with-resources closes it.
            try (BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(urlConnection.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                // Read the response line by line.
                while ((line = bufferedReader.readLine()) != null) {
                    content.append(line).append("\n");
                }
            }
        } catch (IOException e) {
            // A 403 response surfaces here as an IOException thrown by getInputStream().
            e.printStackTrace();
        }
        System.out.println(content);
    }
}

There is nothing wrong with your code, but the website does not want people running crawlers on it.

See https://www.carrefour.fr/robots.txt
See the Robots exclusion standard
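
To make the robots.txt point more concrete, here is a minimal sketch of how a crawler could check a path against Disallow rules before requesting it. Everything in it is an assumption for illustration: the class name RobotsCheck, the method names, the crawler name "MyCrawler", and the sample rules, which are made up and are not the actual contents of carrefour.fr's robots.txt. It is also deliberately simplified: it only reads User-agent and Disallow lines, merges every group that matches, and ignores Allow rules and wildcards, so it is not a complete implementation of the standard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class RobotsCheck {

    // Collects the Disallow prefixes from every group whose User-agent line
    // is "*" or is contained in the given user agent name (simplified matching).
    static List<String> disallowedPrefixes(String robotsTxt, String userAgent) {
        List<String> prefixes = new ArrayList<>();
        String agent = userAgent.toLowerCase(Locale.ROOT);
        boolean groupApplies = false;
        boolean lastWasAgent = false;
        for (String raw : robotsTxt.split("\\R")) {
            // Drop comments and surrounding whitespace.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!lastWasAgent) {
                    groupApplies = false; // a new group starts here
                }
                String name = value.toLowerCase(Locale.ROOT);
                if (name.equals("*") || (!name.isEmpty() && agent.contains(name))) {
                    groupApplies = true;
                }
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                if (field.equals("disallow") && groupApplies && !value.isEmpty()) {
                    prefixes.add(value);
                }
            }
        }
        return prefixes;
    }

    // Returns false if the path starts with any applicable Disallow prefix.
    static boolean isAllowed(String robotsTxt, String userAgent, String path) {
        for (String prefix : disallowedPrefixes(robotsTxt, userAgent)) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical rules, not taken from any real site.
        String sample = "User-agent: *\nDisallow: /private/\n";
        System.out.println(isAllowed(sample, "MyCrawler", "/private/page")); // false
        System.out.println(isAllowed(sample, "MyCrawler", "/public/page"));  // true
    }
}
```

In real use the robots.txt text would be downloaded from the site first. Keep in mind that robots.txt only publishes the site's crawling rules; whether a given request is answered with 403 is decided by the server itself.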
