I have made a proxy scraper in PHP, but I don't know how to check whether a proxy is alive



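The code below scrapes proxies from a website, but what I want is for the program to check the proxies one by one to see whether each is alive, and then save the live proxies to a file. Can anyone help me with this?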

<?php
header('Content-Type: text/plain'); // print_r() output is plain text, not JSON
$url = "https://www.my-proxy.com/free-proxy-list.html"; 

$ch = curl_init(); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); 
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/'.rand(111,999).'.36 (KHTML, like Gecko) Chrome/88.0.'.rand(1111,9999).'.104 Safari/'.rand(111,999).'.36');
curl_setopt($ch, CURLOPT_URL, $url); 
$proxies = array();
$firstcount = 1;
$endcount = 10;
for ($i = $firstcount; $i <= $endcount; $i++) {
    curl_setopt($ch, CURLOPT_URL, "https://www.my-proxy.com/free-proxy-list-$i.html");
    $result = curl_exec($ch);
    if ($result === false) {
        continue; // skip pages that failed to download
    }

    // Get proxies: extract IP:PORT pairs from markup such as
    // >102.64.122.214:8085#U
    preg_match_all('!\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{2,5}!', $result, $matches);
    $proxies = array_merge($proxies, $matches[0]);
}
curl_close($ch);
print_r($proxies);
?>

There are several ways to test a proxy. One of the simplest is to pass proxy options in the stream context of a 'file_get_contents' request:

// $prox must already hold the proxy as an "IP:PORT" string, e.g. 8.8.8.8:2222
$options = array(
    'http' => array(
        'proxy' => 'tcp://' . $prox,
        'timeout' => 2, // seconds to wait before giving up
        'request_fulluri' => true, // proxies expect the full URI in the request line
        'method' => "GET",
        'header' => "Accept-language: en\r\n" .
            "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.76 Safari/537.36\r\n"
    )
);
$context = stream_context_create($options);
$base_url = 'http://lotsofrandomstuff.com/1.php'; // URL that simply returns '1' each time
$web = @file_get_contents($base_url, false, $context); // false on failure or timeout
if ($web !== false && trim($web) === '1')
{
    echo "proxy is good";
}
else
{
    echo "proxy is dead";
}
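
To cover the rest of the question (checking the scraped proxies one by one and saving the live ones to a file), here is a minimal sketch using cURL instead of 'file_get_contents'. The function names isProxyAlive, filterLiveProxies and saveProxies, the test URL and the output file name are all assumptions made for illustration; they do not come from the original code. A proxy is treated as alive only if the test URL returns HTTP 200 through it within the timeout, which is one reasonable criterion among several.

```php
<?php
// Hypothetical helper: true if $testUrl can be fetched through $proxy ("IP:PORT").
// With no scheme prefix, cURL treats the proxy as an HTTP proxy.
function isProxyAlive(string $proxy, string $testUrl, int $timeoutSeconds = 5): bool
{
    $ch = curl_init($testUrl);
    curl_setopt_array($ch, [
        CURLOPT_PROXY          => $proxy,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // The handle is released automatically when $ch goes out of scope.
    return $body !== false && $status === 200;
}

// Hypothetical helper: keeps only the proxies for which $isAlive returns true.
// Duplicates are removed first so each proxy is tested once.
function filterLiveProxies(array $proxies, callable $isAlive): array
{
    $live = [];
    foreach (array_unique($proxies) as $proxy) {
        if ($isAlive($proxy)) {
            $live[] = $proxy;
        }
    }
    return $live;
}

// Hypothetical helper: writes one proxy per line to $path.
function saveProxies(array $proxies, string $path): void
{
    $content = $proxies ? implode(PHP_EOL, $proxies) . PHP_EOL : '';
    file_put_contents($path, $content);
}

// Example usage (assumes $proxies was filled by the scraper in the question;
// the test URL and file name are placeholders):
// $live = filterLiveProxies($proxies, fn($p) => isProxyAlive($p, 'http://example.com/'));
// saveProxies($live, 'live_proxies.txt');
```

The checks run sequentially, so with a 5-second timeout a long list of dead proxies can take a while. Passing the checker in as a callable keeps the filtering logic separate from the network call, which makes it easy to swap in a different liveness test later.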
