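The code is well commented.
I need to optimize it by combining the operations I perform on each value.
<?php
// set the page number the URL starts from
$pagenumber = 20;
// collected rows (initialized so the output loop still works if no page matches)
$stack = array();
// set the page number the URL ends with
while ($pagenumber <= 25) {
// create a new cURL resource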
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.bkesher.com/frum_detail.php?num=$pagenumber");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
// fetch the URL; with CURLOPT_RETURNTRANSFER set, curl_exec() returns the body as a string
$content = curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
// clean the response and keep only the table to be scraped from the page
$newlines = array("\t","\n","\r"," "," ","\x0B");
$newcontent = str_replace($newlines, '', $content);
$start = strpos($newcontent,'>Details<');
$end = strpos($newcontent,'</table>',$start);
$table1 = substr($newcontent,$start,$end-$start);
// $table1 = strip_tags($table1);
// check whether the table was populated
if (!empty($table1)) {
// get the first name
$start = strpos($table1,'<td');
$end = strpos($table1,'<br />',$start);
$fnames = substr($table1,$start,$end-$start);
$fnames = strip_tags($fnames);
$fnames = preg_replace('/\s\s+/', '', $fnames);
// get the family name
$start = strpos($table1,'<br />');
$end = strpos($table1,'</td>',$start);
$lnames = substr($table1,$start,$end-$start);
$lnames = strip_tags($lnames);
$lnames = preg_replace('/\s\s+/', '', $lnames);
// get the phone number
$start = strpos($table1,'Phone:');
$end = strpos($table1,'</td> </tr> <tr>',$start);
$phone = substr($table1,$start,$end-$start);
$phone = strip_tags($phone);
$phone = str_replace("Phone:", "" ,$phone);
$phone = preg_replace('/\s\s+/', '', $phone);
// get the address
$start = strpos($table1,'Address:');
$end = strpos($table1,'</td> </tr> <tr>',$start);
$ad = substr($table1,$start,$end-$start);
$ad = strip_tags($ad);
$ad = str_replace("Address:", "" ,$ad);
$ad = preg_replace('/\s\s+/', '', $ad);
$start = strpos($table1,'Apt:');
$end = strpos($table1,'</td> </tr> <tr>',$start);
$apt = substr($table1,$start,$end-$start);
$apt = strip_tags($apt);
$apt = str_replace("Apt:", "" ,$apt);
$apt = preg_replace('/\s\s+/', '', $apt);
// get the country
$start = strpos($table1,'Country:');
$end = strpos($table1,'</td> </tr> <tr>',$start);
$country = substr($table1,$start,$end-$start);
$country = strip_tags($country);
$country = str_replace("Country:", "" ,$country);
$country = preg_replace('/\s\s+/', '', $country);
// get the city
$start = strpos($table1,'City:<br /> State/Province:');
$end = strpos($table1,'</td> </tr> <tr>',$start);
$city = substr($table1,$start,$end-$start);
$city = strip_tags($city);
$city = str_replace("City: State/Province:", "" ,$city);
$city = preg_replace('/\s\s+/', '', $city);
// get the zip code
$start = strpos($table1,'Zip:');
$end = strpos($table1,'</td> </tr> <tr>',$start);
$zip = substr($table1,$start,$end-$start);
$zip = strip_tags($zip);
$zip = str_replace("Zip:", "" ,$zip);
$zip = preg_replace('/\s\s+/', '', $zip);
// get the email
$start = strpos($table1,'email:');
$end = strpos($table1,'</td> </tr>',$start);
$email = substr($table1,$start,$end-$start);
$email = strip_tags($email);
$email = str_replace("email:", "" ,$email);
$email = preg_replace('/\s\s+/', '', $email);
// put the individual results into a row variable
$cleancontent = array($pagenumber, $fnames, $lnames, $phone, $ad, $apt, $country, $city, $zip, $email);
// put the row into the main variable
$stack[] = $cleancontent;
// move to the next page
$pagenumber++;
}
// if the table is empty, skip to the next page
else {
$pagenumber++;
}
}
// gather all the results and output them
print "<table>
<tr>
<td>pagenumber</td>
<td>fnames</td>
<td>lnames</td>
<td>phone</td>
<td>ad</td>
<td>apt</td>
<td>country</td>
<td>city</td>
<td>zip</td>
<td>email</td>
</tr>\n";
foreach ($stack as $val) {
print "<tr>\n";
foreach ($val as $no) {
print " <td>$no</td>\n";}
print "</tr>\n";
}
print "</table>";
?>
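The per-field steps above (strpos, substr, strip_tags, str_replace, preg_replace) repeat for every value, so they can be folded into one function. The following is only a minimal sketch under assumptions: `extract_field` is a hypothetical helper name, the `$sample` fragment is made up, and, unlike the script above, runs of whitespace are collapsed to a single space and the result is trimmed.

```php
<?php
// Hypothetical helper (not part of the original script): returns the text
// between $label and the first $endMarker after it, with tags and label removed.
function extract_field($table, $label, $endMarker = '</td>')
{
    $start = strpos($table, $label);
    if ($start === false) {
        return '';                      // label not present on this page
    }
    $end = strpos($table, $endMarker, $start);
    if ($end === false) {
        $end = strlen($table);          // no end marker: take the rest
    }
    $value = substr($table, $start, $end - $start);
    $value = strip_tags($value);
    // The label may itself contain markup (e.g. a <br />), so strip it too.
    $value = str_replace(strip_tags($label), '', $value);
    return trim(preg_replace('/\s\s+/', ' ', $value));
}

// Made-up fragment standing in for $table1.
$sample = '<tr><th>Phone:</th> <td>02-651-8139</td> </tr> '
        . '<tr><th>Zip:</th> <td></td> </tr>';

$row = array();
foreach (array('Phone:', 'Zip:') as $label) {
    $row[$label] = extract_field($sample, $label);
}
print_r($row);
```

With a helper like this, the body of the `if (!empty($table1))` branch could shrink to a loop over the list of labels.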
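There are 2 regular expressions in the sample code below.
- 1) $rxtable
gets the table in question into $table1.
- 2) $rxdata
extracts the data (in any order) into (?<named>.*?) capture buffers.
If you don't want named buffers, just replace them with (.*?); the capture order is preserved.
Both regexes (1 & 2) should have the /s modifier (so that the dot also matches newlines).
In addition, regex (2) should have the /x (expanded) modifier
(see the sample code).
Test (in Perl):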
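Since the question is in PHP, here is a small sketch of how named capture buffers and the two modifiers might look with `preg_match`. The two-line `$html` fragment is made up for illustration, and `~` is just one possible choice of delimiter.

```php
<?php
// Made-up fragment; the real input would be the scraped table.
$html = "<th>Phone:</th>\n<td> 02-651-8139 </td>";

// x: whitespace inside the pattern is ignored, so it can be laid out freely.
// s: the dot also matches newlines.
$rx = '~ <th.*?> \s* Phone: \s* </th\s*> \s*
         <td.*?> \s* (?<phone>.*?) \s* </td\s*> ~xs';

if (preg_match($rx, $html, $m)) {
    echo $m['phone'], "\n";   // access by name
    echo $m[1], "\n";         // the same capture by position
}
```

A named group is also stored under its numeric index, so switching between `(?<phone>.*?)` and `(.*?)` does not change the positional results.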
use strict;
use warnings;
my $rxtable = '>Details<(.*?)</table\s*>';
my $rxdata = '
(?=.* <th.*?>\s*Name:\s*</th\s*>\s*
<td.*?>\s*(?<fname>.*?)\s*<br.*?>\s*(?<lname>.*?)\s*</td\s*>
)
(?=.* <th.*?>\s*Phone:\s*</th\s*>\s*
<td.*?>\s*(?<phone>.*?)\s*</td\s*>
)
(?=.* <th.*?>\s*Address:\s*</th\s*>\s*
<td.*?>\s*(?<ad>.*?)\s*</td\s*>
)
(?=.* <th.*?>\s*Apt:\s*</th\s*>\s*
<td.*?>\s*(?<apt>.*?)\s*</td\s*>
)
(?=.* <th.*?>\s*Country:\s*</th\s*>\s*
<td.*?>\s*(?<country>.*?)\s*</td\s*>
)
(?=.* <th.*?>\s*City:\s*<br.*?>\s*State/Province:\s*</th\s*>\s*
<td.*?>\s*(?<city>.*?)\s*</td\s*>
)
(?=.* <th.*?>\s*Zip:\s*</th\s*>\s*
<td.*?>\s*(?<zip>.*?)\s*</td\s*>
)
(?=.* <th.*?>\s*email:\s*</th\s*>\s*
<td.*?>\s*(?<email>.*?)\s*</td\s*>
)
';
$/ = undef;
my $html = <DATA>;
my $table;
if ($html =~ /$rxtable/xs) {
$table = $1;
}
if ($table =~ /$rxdata/xs) {
print "first name '$+{fname}'\n";
print "last name '$+{lname}'\n";
print "phone '$+{phone}'\n";
print "address '$+{ad}'\n";
print "apt '$+{apt}'\n";
print "country '$+{country}'\n";
print "city '$+{city}'\n";
print "zip '$+{zip}'\n";
print "email '$+{email}'\n";
print "-----------------------\n";
print "first name '$1'\n";
print "last name '$2'\n";
print "phone '$3'\n";
print "address '$4'\n";
print "apt '$5'\n";
print "country '$6'\n";
print "city '$7'\n";
print "zip '$8'\n";
print "email '$9'\n";
}
__DATA__
<div>
<div align="center"><a href="frum_search.php">New Search</a></div>
</div>
</div>
<div id="deailspage">
<div id="det_table">
<div align="center" class="WADAHeaderText">Details</div>
<table width="260" border="0" align="center" cellpadding="0" cellspacing="0" bgcolor="#EEF1F7" class="WADADataTable">
<tr>
<th class="WADADataTableHeader"> Name:</th>
<td class="WADADataTableCell">Menachem<br />
Atlman</td>
</tr>
<tr>
<th class="WADADataTableHeader">Phone:</th>
<td class="WADADataTableCell">02-651-8139</td>
</tr>
<tr>
<th class="WADADataTableHeader">Address:</th>
<td class="WADADataTableCell">8 Mishkalov</td>
</tr>
<tr>
<th class="WADADataTableHeader">Apt:</th>
<td class="WADADataTableCell"></td>
</tr>
<tr>
<th class="WADADataTableHeader">City:<br />
State/Province:</th>
<td class="WADADataTableCell">Har Nof Jerusalem</td>
</tr>
<tr>
<th class="WADADataTableHeader">Country:</th>
<td class="WADADataTableCell">Israel</td>
</tr>
<tr>
<th class="WADADataTableHeader">Zip:</th>
<td class="WADADataTableCell"></td>
</tr>
<tr>
<th class="WADADataTableHeader">email:</th>
<td class="WADADataTableCell"></td>
</tr>
</table>
Output:
first name 'Menachem'
last name 'Atlman'
phone '02-651-8139'
address '8 Mishkalov'
apt ''
country 'Israel'
city 'Har Nof Jerusalem'
zip ''
email ''
-----------------------
first name 'Menachem'
last name 'Atlman'
phone '02-651-8139'
address '8 Mishkalov'
apt ''
country 'Israel'
city 'Har Nof Jerusalem'
zip ''
email ''
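For the PHP side, the same header/cell pattern could be built once per label in a loop rather than written out field by field. This is only a sketch under assumptions: `parse_details` is a hypothetical function name, only three of the fields are listed, and the `$table` markup is a simplified stand-in for the real page. It uses one small regex per label instead of the single lookahead-based `$rxdata` above.

```php
<?php
// Hypothetical helper: maps a result key to the label text found in the <th> cell.
function parse_details($table)
{
    $labels = array(
        'phone'   => 'Phone:',
        'ad'      => 'Address:',
        'country' => 'Country:',
    );
    $row = array();
    foreach ($labels as $field => $label) {
        // Same shape as the Perl pattern: header cell, then the data cell.
        $rx = '~<th.*?>\s*' . preg_quote($label, '~')
            . '\s*</th\s*>\s*<td.*?>\s*(.*?)\s*</td\s*>~s';
        // A missing field yields an empty string instead of failing the whole row.
        $row[$field] = preg_match($rx, $table, $m) ? $m[1] : '';
    }
    return $row;
}

// Simplified stand-in for the table on the page.
$table = '<tr><th class="h">Phone:</th>
<td class="c">02-651-8139</td></tr>
<tr><th class="h">Address:</th><td class="c">8 Mishkalov</td></tr>
<tr><th class="h">Country:</th><td class="c">Israel</td></tr>';

print_r(parse_details($table));
```

Adding a field then means adding one entry to `$labels`; the two-part name cell would still need its own pattern because it holds two values separated by `<br />`.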