How to download a (huge) text file using HttpWebRequest with credentials



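I am trying to automatically download a list of text files from a website. The manual process for downloading a text file is:

  1. Click the file name, which opens a pop-up window
  2. The content is shown in the pop-up window // it can be downloaded as a string, but it is too large to write out with StreamWriter because I get an out-of-memory exception
  3. Right-click -> Save As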

I want to download this file with HttpWebRequest.

My code is as follows:

string sTmpCookieString = GetGlobalCookies(webBrowser1.Url.AbsoluteUri);
HttpWebRequest fstRequest = (HttpWebRequest)WebRequest.Create(URL);
fstRequest.Method = "GET";
fstRequest.CookieContainer = new System.Net.CookieContainer();
fstRequest.CookieContainer.SetCookies(webBrowser1.Document.Url, sTmpCookieString);
HttpWebResponse fstResponse = (HttpWebResponse)fstRequest.GetResponse();
StreamReader sr = new StreamReader(fstResponse.GetResponseStream());
string sPageData = sr.ReadToEnd();
sr.Close();
string sViewState = ExtractInputHidden(sPageData, "__VIEWSTATE");
string sEventValidation = this.ExtractInputHidden(sPageData, "__EVENTVALIDATION");
string sUrl = URL;
HttpWebRequest hwrRequest = (HttpWebRequest)WebRequest.Create(sUrl);
hwrRequest.Method = "POST";
hwrRequest.CookieContainer = new System.Net.CookieContainer();
string sPostData = "__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=" + sViewState + "&__EVENTVALIDATION=" + sEventValidation + "&Name=test" + "&Button1=Button";
byte[] bByteArray = Encoding.UTF8.GetBytes(sPostData);
hwrRequest.ContentType = "text/plain";
hwrRequest.CookieContainer.SetCookies(webBrowser1.Document.Url, sTmpCookieString);
hwrRequest.ContentLength = bByteArray.Length;
Stream sDataStream = hwrRequest.GetRequestStream();
sDataStream.Write(bByteArray, 0, bByteArray.Length);
sDataStream.Close();
using (WebResponse response = hwrRequest.GetResponse())
{
    using (sDataStream = response.GetResponseStream())
    {
        StreamReader reader = new StreamReader(sDataStream);
        {
            string sResponseFromServer = reader.ReadToEnd();
            FileStream fs = File.Open(path, FileMode.OpenOrCreate, FileAccess.Write);
            Byte[] info = new System.Text.UTF8Encoding(true).GetBytes(sResponseFromServer);
            fs.Write(info, 0, info.Length);
            fs.Close();
        }
    }
}

I keep getting HTML that looks like this:

<!DOCTYPE html>
<html>
<head>    
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" />
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />
<meta content="Microsoft Visual Studio 7.0" name="GENERATOR" />
...

I suspect my question may be unclear to some readers. If anyone points that out, I will explain in more detail.

Thanks in advance for any help.

You are trying to read the entire response into memory at once:

string sResponseFromServer = reader.ReadToEnd();

Instead, consider using something like the following:

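using (sDataStream = response.GetResponseStream())
// FileMode.Create truncates an existing file; OpenOrCreate would leave stale trailing bytes.
using (FileStream fs = File.Open(path, FileMode.Create, FileAccess.Write))
{
    // Copy in 10000-byte chunks instead of buffering the whole response.
    sDataStream.CopyTo(fs, 10000);
}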

The second argument is the buffer size; you can set it to any reasonable value.
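To illustrate why the buffer size bounds memory use, here is a minimal sketch of roughly what a buffered stream copy does. The names ChunkedCopy and CopyInChunks are hypothetical and exist only for this illustration; in practice Stream.CopyTo already does this work.

```csharp
using System;
using System.IO;

static class ChunkedCopy
{
    // Hypothetical helper: copies source to destination one buffer at a time,
    // so memory use is bounded by bufferSize rather than by the response size.
    public static long CopyInChunks(Stream source, Stream destination, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // Stream.Read returns 0 only when the end of the stream is reached.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

With the streams from the question, this would be called as ChunkedCopy.CopyInChunks(sDataStream, fs, 10000).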

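Download using the asynchronous version of WebRequest: WebRequest.GetResponseAsync().

Replace your code starting from using (WebResponse response = hwrRequest.GetResponse()) { };
The rest of your code is mostly fine.

Adjust the buffer size used for downloading and storing the file as needed (here, 132072 bytes). Don't make it small without a reason.

The destination file is created with File.Create(), which by default creates a new file or overwrites an existing one, with FileShare.None.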

using (HttpWebResponse httpResponse = (HttpWebResponse)await httpRequest.GetResponseAsync())
using (var stream = httpResponse.GetResponseStream()) {
    if (httpResponse.StatusCode == HttpStatusCode.OK) {
        try {
            int buffersize = 132072;
            // ["YourFileName"] is a placeholder for the destination path.
            using (FileStream fileStream = File.Create(["YourFileName"], buffersize, FileOptions.Asynchronous))
            {
                int read;
                byte[] buffer = new byte[buffersize];
                // Copy the response to disk one buffer at a time.
                while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0) {
                    await fileStream.WriteAsync(buffer, 0, read);
                }
            }
        }
        catch (DirectoryNotFoundException dnfex) {
            throw;  // Log, store & notify. Your usual handling.
        }
        catch (PathTooLongException ptlex) {
            throw;  // Same
        }
        catch (IOException ioex) {
            throw;  // Same
        }
    }
}
return ["YourFileName"];
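As a rough, self-contained sketch of how the fragment above might be wrapped into reusable methods: the names AsyncDownload, SaveStreamAsync and DownloadToFileAsync, and the filePath parameter, are assumptions made for this illustration and are not part of the original answer. The status-code check is left out here because GetResponseAsync throws a WebException for error responses by default.

```csharp
using System.IO;
using System.Net;
using System.Threading.Tasks;

static class AsyncDownload
{
    // Hypothetical helper: streams any readable source into a file,
    // holding at most bufferSize bytes in memory at a time.
    public static async Task<long> SaveStreamAsync(Stream source, string filePath, int bufferSize = 132072)
    {
        long total = 0;
        // File.Create creates the file or overwrites an existing one.
        using (FileStream fileStream = File.Create(filePath, bufferSize, FileOptions.Asynchronous))
        {
            byte[] buffer = new byte[bufferSize];
            int read;
            while ((read = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                await fileStream.WriteAsync(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }

    // Hypothetical wrapper: sends the prepared request and saves the body to filePath.
    public static async Task<string> DownloadToFileAsync(HttpWebRequest httpRequest, string filePath)
    {
        using (var httpResponse = (HttpWebResponse)await httpRequest.GetResponseAsync())
        using (Stream stream = httpResponse.GetResponseStream())
        {
            await SaveStreamAsync(stream, filePath);
        }
        return filePath;
    }
}
```

Assuming hwrRequest has been prepared as in the question (method, cookies, and POST body already written), a call could look like: string saved = await AsyncDownload.DownloadToFileAsync(hwrRequest, path);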
