How to get the image from https://exampe.com/captcha.ashx



I am trying to fetch a captcha image from an external website, but I always receive an HTML response instead.

The returned HTML is:

<html> 
<head>
<title>TuEnvio</title>
<style> body { background-color: #dfe6e9; margin: 0; position: absolute; top: 50%; left: 50%; -ms-transform: translate(-50%, -50%); transform: translate(-50%, -50%); } .lds-grid { display: inline-block; position: relative; width: 80px; height: 80px; } .lds-grid div { position: absolute; width: 16px; height: 16px; border-radius: 50%; background: #d63031; animation: lds-grid 1.2s linear infinite; } .lds-grid div:nth-child(1) { top: 8px; left: 8px; animation-delay: 0s; } .lds-grid div:nth-child(2) { top: 8px; left: 32px; animation-delay: -0.4s; } .lds-grid div:nth-child(3) { top: 8px; left: 56px; animation-delay: -0.8s; } .lds-grid div:nth-child(4) { top: 32px; left: 8px; animation-delay: -0.4s; } .lds-grid div:nth-child(5) { top: 32px; left: 32px; animation-delay: -0.8s; } .lds-grid div:nth-child(6) { top: 32px; left: 56px; animation-delay: -1.2s; } .lds-grid div:nth-child(7) { top: 56px; left: 8px; animation-delay: -0.8s; } .lds-grid div:nth-child(8) { top: 56px; left: 32px; animation-delay: -1.2s; } .lds-grid div:nth-child(9) { top: 56px; left: 56px; animation-delay: -1.6s; } @keyframes lds-grid {  0%, 100% {opacity: 1; }  50% {opacity: 0.5; } }</style>
</head>
<body>
<div class="lds-grid"> 
<div></div> <div></div> <div></div> <div></div> <div></div> <div></div> <div></div> <div></div> <div></div></div>
<script type="text/javascript" src="/aes.min.js">
</script>
<script> 
function toNumbers(d) { 
var e = []; 
d.replace(/(..)/g, function (d) {   e.push(parseInt(d, 16)); });
return e; }
function toHex() {
for (   var d = [],     d = 1 == arguments.length && arguments[0].constructor == Array ? arguments[0] : arguments,e = "",f = 0;f < d.length;f++ )e += (16 > d[f] ? "0" : "") + d[f].toString(16);
return e.toLowerCase();
}
var a = toNumbers("d68d69a9a746d20032277ede658ba3ad"), b = toNumbers("58c9e810e2ebcc49ae9ee28af1c6dd53"), c = toNumbers("0102c6e95e39d07a5b4b5bb0b5dcd89c");
document.cookie = "ASP.KLR=" + toHex(slowAES.decrypt(c, 2, a, b)) + "; expires=Session; path=/";
location.href = "https://www.tuenvio.cu/matanzas/captcha.ashx?attempt=1";</script>
</body>
</html>

My request code is:

readonly HttpClient Client;
readonly CookieContainer CookieContainer;

// The statements below presumably run inside the class constructor,
// since they assign the readonly fields (declarations omitted).
ServicePointManager.SecurityProtocol |= SecurityProtocolType.Tls | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls12;
CookieContainer = new CookieContainer();
HttpClientHandler handler = new HttpClientHandler()
{
    CookieContainer = CookieContainer,
    UseCookies = true,
    SslProtocols = SslProtocols.Tls12 | SslProtocols.Tls11 | SslProtocols.Tls,
    // Accepts any server certificate; only acceptable for testing.
    ServerCertificateCustomValidationCallback = (sender, certificate, chain, sslPolicyErrors) => true
};
// Create an HttpClient object
Client = new HttpClient(handler);
Client.DefaultRequestHeaders.Add("Origin", BaseUrl);
Client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36");
Client.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("*/*"));
Client.DefaultRequestHeaders.Connection.Add("keep-alive");

public async Task<Image> getImagen(string Uri)
{
    try
    {
        var req = new HttpRequestMessage(HttpMethod.Get, Uri);
        req.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9");

        var resp = await Client.SendAsync(req);
        if (resp.IsSuccessStatusCode)
        {
            var bytes = await resp.Content.ReadAsByteArrayAsync();
            var ms = new MemoryStream(bytes);
            // Throws (and is caught below) when the body is HTML rather than image data.
            return Image.FromStream(ms);
        }
    }
    catch (Exception ex)
    {
        System.Diagnostics.Debug.WriteLine(ex.Message);
    }
    return null;
}
The image displays correctly in a web browser, but with HttpClient I only get the HTML response. How can I solve this?

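OK, here is my new approach. I implemented the toNumbers, toHex, and slowAES.decrypt functions in C# and made the request again with the updated URL and the cookie added, just as the JavaScript would do if it were run by a web browser.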

public async Task<Image> getRecatcha()
{
    try
    {
        string requestUri = BaseUrl + Settings.Tienda + "/captcha.ashx";
        var req = new HttpRequestMessage(HttpMethod.Get, requestUri);
        var resp = await Client.SendAsync(req);
        if (resp.IsSuccessStatusCode)
        {
            var respbody = await resp.Content.ReadAsStringAsync();
            Log(respbody, "_Recaptcha.html");
            var result = GetSecurityCookie(respbody);
            if (result.Success)
            {
                var req2 = new HttpRequestMessage(HttpMethod.Get, result.Url);
                req2.Headers.Add("cookie", result.Cookie);
                var resp2 = await Client.SendAsync(req2);
                if (resp2.IsSuccessStatusCode)
                {
                    var respbody2 = await resp2.Content.ReadAsStringAsync();
                    Log(respbody2, "_Recaptcha_2.html");
                    var bytes = await resp2.Content.ReadAsByteArrayAsync();
                    var ms = new MemoryStream(bytes);
                    return Image.FromStream(ms);
                }
            }
        }
    }
    catch (Exception ex)
    {
        System.Diagnostics.Debug.WriteLine(ex.Message);
    }
    return null;
}

public result GetSecurityCookie(string respbody)
{
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(respbody);
    // The challenge script is the only <script> element without attributes.
    var item = doc.DocumentNode.Descendants().FirstOrDefault(x => x.Name == "script" && !x.Attributes.Any());
    if (item == null)
        return new result() { Success = false };
    string data = item.InnerHtml;
    // Verbatim strings (@"...") keep the regex escapes such as \s and \w intact.
    var ma = System.Text.RegularExpressions.Regex.Match(data, @"a\s*=\s*toNumbers\s*\(\s*""(\w+)""\s*\)", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    var mb = System.Text.RegularExpressions.Regex.Match(data, @"b\s*=\s*toNumbers\s*\(\s*""(\w+)""\s*\)", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    var mc = System.Text.RegularExpressions.Regex.Match(data, @"c\s*=\s*toNumbers\s*\(\s*""(\w+)""\s*\)", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    var murl = System.Text.RegularExpressions.Regex.Match(data, @"location\.href\s*=\s*""(.+)""", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    // Bail out if the page layout changed and any value is missing.
    if (!(ma.Success && mb.Success && mc.Success && murl.Success))
        return new result() { Success = false };
    var a = ma.Groups[1].Value;
    var b = mb.Groups[1].Value;
    var c = mc.Groups[1].Value;
    var url = murl.Groups[1].Value;
    var des = Decript(c, a, b);
    //       var Uri = new Uri(BaseUrl);
    //       CookieContainer.Add(Uri, new Cookie("ASP.KLR", des) {Path="/"});
    //string cookie = "ASP.KLR=" + des + "; expires=Session; path=/";
    string cookie = "ASP.KLR=" + des;
    return new result() { Success = true, Url = url, Cookie = cookie };
}

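To keep the question simple, I have not included the ToNumbers, ToHex, and Decript functions.

However, the server still responds with the same page, only incrementing the attempt count in the URL.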

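I am not an expert in C#, JS, or HTML, but I noticed that if you make the request without the ASP.KLR cookie, the server sends a page with a script that sets the cookie and then redirects to https://www.tuenvio.cu/matanzas/captcha.ashx?attempt=1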

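So, to get the image, you need to send the cookie the server expects. You should either run that script somehow to compute it, or implement it in C# and parse the page for the required data.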
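For the second option, here is a minimal C# sketch of what the page's script appears to compute. It rests on assumptions I have not verified against the server: that mode 2 in slowAES means CBC, that `a` is the key and `b` is the IV (from the argument order of `slowAES.decrypt(c, 2, a, b)`), and that no padding is removed from the result. The names `ToNumbers`, `ToHex`, and `ComputeCookieValue` are mine, chosen to mirror the JavaScript.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

// Values copied from the challenge page quoted in the question.
string a = "d68d69a9a746d20032277ede658ba3ad"; // assumed to be the AES key
string b = "58c9e810e2ebcc49ae9ee28af1c6dd53"; // assumed to be the IV
string c = "0102c6e95e39d07a5b4b5bb0b5dcd89c"; // ciphertext

string cookieValue = ComputeCookieValue(a, b, c);
Console.WriteLine("ASP.KLR=" + cookieValue);

// Port of the page's toNumbers(): "0a1b" becomes { 0x0a, 0x1b }.
static byte[] ToNumbers(string hex) =>
    Enumerable.Range(0, hex.Length / 2)
              .Select(i => Convert.ToByte(hex.Substring(i * 2, 2), 16))
              .ToArray();

// Port of the page's toHex(): lowercase, two hex digits per byte.
static string ToHex(byte[] bytes) =>
    string.Concat(bytes.Select(x => x.ToString("x2")));

// Assumption: slowAES.decrypt(c, 2, a, b) is AES-128-CBC without unpadding.
static string ComputeCookieValue(string keyHex, string ivHex, string cipherHex)
{
    using var aes = Aes.Create();
    aes.Mode = CipherMode.CBC;
    aes.Padding = PaddingMode.None;
    aes.Key = ToNumbers(keyHex);
    aes.IV = ToNumbers(ivHex);

    byte[] cipher = ToNumbers(cipherHex);
    using var decryptor = aes.CreateDecryptor();
    return ToHex(decryptor.TransformFinalBlock(cipher, 0, cipher.Length));
}
```

A quick way to check the assumptions is to compare the printed value with the ASP.KLR cookie that the browser's developer tools show for the same page.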

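However, as far as I can see, the response is always the same; see the variables a, b, and c. So maybe you can run the script just once and use the computed value (the cookie) in all requests. In fact, you can use the browser to see the value. This should at least work for some testing. If the response does change the values used to compute the cookie, then to make it work every time without manual intervention you should do as I said in the previous paragraph.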
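If you reuse a fixed cookie value, it helps to notice when the server stops accepting it. One possible check, sketched below, is to look at the response's Content-Type before decoding the body. This assumes the challenge page is served as `text/html` and the captcha as some `image/*` type, which I have not confirmed for this site; `IsImage` is a hypothetical helper name, and the two responses are built by hand so the sketch runs without network access.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

// Simulated responses standing in for what the server might return.
using var challenge = new HttpResponseMessage
{
    Content = new StringContent("<html></html>", Encoding.UTF8, "text/html")
};
using var captcha = new HttpResponseMessage
{
    Content = new ByteArrayContent(new byte[] { 0x89, 0x50, 0x4E, 0x47 })
};
captcha.Content.Headers.ContentType = new MediaTypeHeaderValue("image/png");

Console.WriteLine(IsImage(challenge)); // False
Console.WriteLine(IsImage(captcha));   // True

// True when the response declares an image media type.
static bool IsImage(HttpResponseMessage response)
{
    var mediaType = response.Content?.Headers.ContentType?.MediaType;
    return mediaType != null
        && mediaType.StartsWith("image/", StringComparison.OrdinalIgnoreCase);
}
```

When the check returns false, the body is most likely the challenge page again, so the cookie has to be recomputed from the new a, b, and c values.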

Update:

Here is an example of how to get the image using curl:

curl "https://www.tuenvio.cu/carlos3/captcha.ashx" -H "user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36" -H "cookie: ASP.KLR=c27408541b70b97e5003d39a2300ffac" --compressed > captcha.png
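A rough C# equivalent of the curl command might look like the sketch below. It puts the cookie into the handler's `CookieContainer` rather than adding a `cookie` header by hand, because with `UseCookies = true` the handler builds the Cookie header from the container, and as far as I know a manually added header can be dropped or overridden depending on the runtime. The cookie value is the one from the curl command and may no longer be valid; `DownloadAsync` is a hypothetical helper name. The download only runs when an output file name is passed on the command line.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// URL and cookie value taken from the curl command; treat both as placeholders.
var captchaUri = new Uri("https://www.tuenvio.cu/carlos3/captcha.ashx");
var cookies = new CookieContainer();
cookies.Add(captchaUri, new Cookie("ASP.KLR", "c27408541b70b97e5003d39a2300ffac", "/"));

// Shows the Cookie header the handler would send for this URL.
Console.WriteLine(cookies.GetCookieHeader(captchaUri));

// Only touch the network when an output file name is given.
if (args.Length > 0)
{
    byte[] image = await DownloadAsync(captchaUri, cookies);
    File.WriteAllBytes(args[0], image);
}

static async Task<byte[]> DownloadAsync(Uri uri, CookieContainer container)
{
    using var handler = new HttpClientHandler
    {
        CookieContainer = container,
        UseCookies = true,
        // Counterpart of curl's --compressed flag.
        AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
    };
    using var client = new HttpClient(handler);
    client.DefaultRequestHeaders.TryAddWithoutValidation(
        "User-Agent",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36");

    using var response = await client.GetAsync(uri);
    response.EnsureSuccessStatusCode();
    return await response.Content.ReadAsByteArrayAsync();
}
```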
