Fast download of HTML source code in C#



I am trying to download the HTML source code of a website (https://www.faa.gov/air_traffic/flight_info/aeronav/aero_data/NASR_Subscription/) in C#.

The problem is that downloading the source of a 30 KB HTML page takes 10 seconds. The internet connection is not the issue, because I can download a 10 MB file almost instantly in the same program.

Each of the following has been run both on a separate thread and on the main thread. The download still takes 10-12 seconds.


1)

using (var httpClient = new HttpClient())
{
    using (var request = new HttpRequestMessage(new HttpMethod("GET"), url))
    {
        var response = await httpClient.SendAsync(request);
    }
}

2)

using (var client = new System.Net.WebClient())
{
    client.Proxy = null;
    response = client.DownloadString(url);
}

3)

using (var client = new System.Net.WebClient())
{
    client.Proxy = GlobalProxySelection.GetEmptyWebProxy();
    response = client.DownloadString(url);
}

4)

WebRequest.DefaultWebProxy = null;
using (var client = new System.Net.WebClient())
{
    response = client.DownloadString(url);
}

5)

var client = new WebClient();
response = client.DownloadString(url);

6)

var client = new WebClient();
client.DownloadFile(url, filepath);

7)

System.Net.WebClient myWebClient = new System.Net.WebClient();
WebProxy myProxy = new WebProxy();
myProxy.IsBypassed(new Uri(url));
myWebClient.Proxy = myProxy;
response = myWebClient.DownloadString(url);

8)

using var client = new HttpClient();
var content = await client.GetStringAsync(url);

9)

HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(url);
myRequest.Method = "GET";
WebResponse myResponse = myRequest.GetResponse();
StreamReader sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8);
string result = sr.ReadToEnd();
sr.Close();
myResponse.Close();

I am looking for a faster way to do this in C#.

Any information or help would be greatly appreciated.

I know this is old, but I think I found the reason: I have run into the same thing on other websites. If you look at the response cookies, you will find one named ak_bmsc. That cookie shows the site is running Akamai Bot Manager. It provides bot protection and blocks requests that "look" suspicious.
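As a quick illustration (the helper name and sample header values are mine, not from the thread), spotting this protection only takes scanning the response's Set-Cookie headers for the ak_bmsc name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AkamaiDetector
{
    // Returns true if any Set-Cookie header value sets the "ak_bmsc" cookie,
    // which Akamai Bot Manager places on protected sites.
    public static bool HasAkamaiCookie(IEnumerable<string> setCookieHeaders) =>
        setCookieHeaders.Any(c =>
            c.TrimStart().StartsWith("ak_bmsc=", StringComparison.OrdinalIgnoreCase));
}
```

With HttpClient, the headers to scan come from `response.Headers.TryGetValues("Set-Cookie", out var values)`.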

To get a fast response from this host, you need the right request setup. In this case:

  • Headers:
    • Host: www.faa.gov (the site's host)
    • Accept: */* (or similar)
  • Cookie:
    • AkamaiEdge=true

Example:

using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    private static readonly HttpClient _client = new HttpClient();
    private static readonly string _url = "https://www.faa.gov/air_traffic/flight_info/aeronav/aero_data/NASR_Subscription/";

    static async Task Main(string[] args)
    {
        var sw = Stopwatch.StartNew();
        using (var request = new HttpRequestMessage(HttpMethod.Get, _url))
        {
            request.Headers.Add("Host", "www.faa.gov");
            request.Headers.Add("Accept", "*/*");
            request.Headers.Add("Cookie", "AkamaiEdge=true");
            Console.WriteLine(await _client.SendAsync(request));
        }
        Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);
    }
}

That takes 896 ms for me.

By the way, you should not put HttpClient in a using block. I know it is disposable, but it is not meant to be disposed after every request; it is designed to be reused.

This problem stumped everyone I asked. I have found a solution that I am going to stick with.

This solution does what I need in 0.5 seconds on average. As far as I can tell, it only works on Windows. If the user does not have curl, I fall back to the old way and take the 10 seconds to get what I need.

The solution creates a batch file in the temp directory that curls the website, then redirects curl's output to a .txt file in the temp directory.

private static void CreateBatchFile()
{
    string filePath = $"{tempPath}\\tempBat.bat";
    string writeMe = "cd \"%temp%\\ProgramTempDir\"\n" +
        "curl \"https://www.faa.gov/air_traffic/flight_info/aeronav/aero_data/NASR_Subscription/\">FAA_NASR.txt";
    File.WriteAllText(filePath, writeMe);
}
private static void ExecuteCommand()
{
    int ExitCode;
    ProcessStartInfo ProcessInfo;
    Process Process;

    ProcessInfo = new ProcessStartInfo("cmd.exe", "/c " + $"{tempPath}\\tempBat.bat");
    ProcessInfo.CreateNoWindow = true;
    ProcessInfo.UseShellExecute = false;

    Process = Process.Start(ProcessInfo);
    Process.WaitForExit();
    ExitCode = Process.ExitCode;
    Process.Close();
}

private static void GetResponse()
{
    string response;
    string url = "https://www.faa.gov/air_traffic/flight_info/aeronav/aero_data/NASR_Subscription/";

    CreateBatchFile();
    ExecuteCommand();

    if (File.Exists($"{tempPath}\\FAA_NASR.txt") && File.ReadAllText($"{tempPath}\\FAA_NASR.txt").Length > 10)
    {
        response = File.ReadAllText($"{tempPath}\\FAA_NASR.txt");
    }
    else
    {
        // If we get here, the user does not have curl, or curl returned a file
        // that is no longer than 10 characters.
        using (var client = new System.Net.WebClient())
        {
            client.Proxy = null;
            response = client.DownloadString(url);
        }
    }
}
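A variant of the approach above (my sketch, not the author's code, and it assumes curl is on PATH) is to launch curl directly through Process and capture its stdout, which avoids writing the temporary batch file and the .txt file entirely:

```csharp
using System;
using System.Diagnostics;

static class DirectCurl
{
    // Build the start info separately so the invocation is easy to inspect.
    public static ProcessStartInfo BuildCurlStartInfo(string url) =>
        new ProcessStartInfo("curl", $"-s \"{url}\"")
        {
            RedirectStandardOutput = true,
            UseShellExecute = false,   // required for stdout redirection
            CreateNoWindow = true
        };

    // Run curl and return whatever it wrote to stdout (the page source).
    public static string Run(string url)
    {
        using (var process = Process.Start(BuildCurlStartInfo(url)))
        {
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return output;
        }
    }
}
```

The same WebClient fallback would still apply when `Process.Start` throws because curl is missing.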
