Optimizing the speed of regex processing on cURL requests in PHP when storing the results in a multidimensional array



The code is well commented.

I need to optimize it by merging the operations I perform on each value.
<?php

//set url to start from

$pagenumber = 20;

//collect one row per scraped page

$stack = array();

//set url to end at

while ($pagenumber <= 25) {

//create a new cURL resource

$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.bkesher.com/frum_detail.php?num=$pagenumber");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
// grab URL and return its content as a string
$content = curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);

//clean the returned content and keep only the table to scrape from the page

$newlines = array("\t","\n","\r","&nbsp;","\0","\x0B");
$newcontent = str_replace($newlines, '', $content);
$start = strpos($newcontent,'>Details<');
$end = strpos($newcontent,'</table>',$start);
$table1 = substr($newcontent,$start,$end-$start);
// $table1 = strip_tags($table1);

//check if the table is populated

if (!empty($table1)) {

//get name

$start = strpos($table1,'<td');
$end = strpos($table1,'<br />',$start);
$fnames = substr($table1,$start,$end-$start);
$fnames = strip_tags($fnames);
$fnames = preg_replace('/\s\s+/', '', $fnames);

//get family name

$start = strpos($table1,'<br />');
$end = strpos($table1,'</td>',$start);
$lnames = substr($table1,$start,$end-$start);
$lnames = strip_tags($lnames);
$lnames = preg_replace('/\s\s+/', '', $lnames);

//get phone

$start = strpos($table1,'Phone:');
$end = strpos($table1,'</td>              </tr>              <tr>',$start);
$phone = substr($table1,$start,$end-$start);
$phone = strip_tags($phone);
$phone = str_replace("Phone:", "" ,$phone);
$phone = preg_replace('/\s\s+/', '', $phone);

//get address

$start = strpos($table1,'Address:');
$end = strpos($table1,'</td>              </tr>              <tr>',$start);
$ad = substr($table1,$start,$end-$start);
$ad = strip_tags($ad);
$ad = str_replace("Address:", "" ,$ad);
$ad = preg_replace('/\s\s+/', '', $ad);

$start = strpos($table1,'Apt:');
$end = strpos($table1,'</td>              </tr>              <tr>',$start);
$apt = substr($table1,$start,$end-$start);
$apt = strip_tags($apt);
$apt = str_replace("Apt:", "" ,$apt);
$apt = preg_replace('/\s\s+/', '', $apt);

//get country

$start = strpos($table1,'Country:');
$end = strpos($table1,'</td>              </tr>              <tr>',$start);
$country = substr($table1,$start,$end-$start);
$country = strip_tags($country);
$country = str_replace("Country:", "" ,$country);
$country = preg_replace('/\s\s+/', '', $country);

//get city

$start = strpos($table1,'City:<br />                 State/Province:');
$end = strpos($table1,'</td>              </tr>              <tr>',$start);
$city = substr($table1,$start,$end-$start);
$city = strip_tags($city);
$city = str_replace("City:                 State/Province:", "" ,$city);
$city = preg_replace('/\s\s+/', '', $city);

//get zip

$start = strpos($table1,'Zip:');
$end = strpos($table1,'</td>              </tr>              <tr>',$start);
$zip = substr($table1,$start,$end-$start);
$zip = strip_tags($zip);
$zip = str_replace("Zip:", "" ,$zip);
$zip = preg_replace('/\s\s+/', '', $zip);

//get email

$start = strpos($table1,'email:');
$end = strpos($table1,'</td>              </tr>',$start);
$email = substr($table1,$start,$end-$start);
$email = strip_tags($email);
$email = str_replace("email:", "" ,$email);
$email = preg_replace('/\s\s+/', '', $email);

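//put the single result into a row variable

$cleancontent = array($pagenumber, $fnames, $lnames, $phone, $ad, $apt, $country, $city, $zip, $email);

//add the row to the main variable

$stack[] = $cleancontent;

//move to the next page

$pagenumber++;
}

//if the table is empty, skip to the next page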

else {
$pagenumber++;
}
}

//get all results and output them

print "<table>
<tr>
<td>pagenumber</td>
<td>fnames</td>
<td>lnames</td>
<td>phone</td>
<td>ad</td>
<td>apt</td>
<td>country</td>
<td>city</td>
<td>zip</td>
<td>email</td>
</tr>\n";
foreach ($stack as $val) {
    print "<tr>\n";
    foreach ($val as $no) {
        print " <td>$no</td>\n";
    }
    print "</tr>\n";
}
print "</table>";
?>
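
One way the repeated strpos/substr/strip_tags/str_replace/preg_replace steps above could be merged is a small helper function. This is only a sketch: the name extractField, its parameters, and the single-line table fragment are made up for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper: returns the cleaned text found between $startMarker
// and the next $endMarker, with $label removed; '' if either marker is missing.
function extractField($table, $startMarker, $endMarker, $label = '')
{
    $start = strpos($table, $startMarker);
    if ($start === false) {
        return '';
    }
    $end = strpos($table, $endMarker, $start);
    if ($end === false) {
        return '';
    }
    $value = strip_tags(substr($table, $start, $end - $start));
    if ($label !== '') {
        $value = str_replace($label, '', $value);
    }
    // Remove runs of two or more whitespace characters, as the original code does.
    return preg_replace('/\s\s+/', '', $value);
}

// Made-up table fragment, for illustration only.
$table1 = '<tr><th>Phone:</th><td>02-651-8139</td></tr><tr><th>Zip:</th><td></td></tr>';
$phone = extractField($table1, 'Phone:', '</td>', 'Phone:');
$zip   = extractField($table1, 'Zip:', '</td>', 'Zip:');
var_dump($phone, $zip);
```

With a helper like this, each field in the loop becomes a single call, and a missing marker yields an empty string instead of a substr() on a false offset.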

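There are 2 regexes in the sample code below.
- 1) $rxtable captures the table in question into $table1.
- 2) $rxdata extracts the data (in any order) into (?<named>.*?) capture buffers.
If you don't want named buffers, just replace them with (.*?); the capture order is retained.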

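Both regexes (1 & 2) should have the /s modifier (so that the dot also matches newlines).
In addition, regex (2) should have the /x (extended) modifier
(see the sample code).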
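
For readers working in PHP, the effect of the two modifiers can be sketched with preg_match. The two-line input string is a made-up stand-in for a fetched page.

```php
<?php
// Hypothetical two-line input standing in for a fetched page.
$html = "<td>Har Nof\nJerusalem</td>";

// Without /s the dot stops at the newline, so the match fails.
$withoutS = preg_match('/<td>(.*?)<\/td>/', $html, $m1);

// With /s the dot also matches "\n".
$withS = preg_match('/<td>(.*?)<\/td>/s', $html, $m2);

// With /x, unescaped whitespace and "# ..." comments in the pattern are ignored,
// so a long pattern can be laid out over several lines.
$withX = preg_match('/
    <td> \s*      # opening cell tag
    (.*?)         # cell contents
    \s* <\/td>    # closing cell tag
/xs', $html, $m3);

var_dump($withoutS, $withS, $withX);
```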

Let me know if you have any problems.

Test (in Perl):

use strict;
use warnings;
my $rxtable = '>Details<(.*?)</table\s*>';
my $rxdata = '
(?=.* <th.*?>\s*Name:\s*</th\s*>\s*
      <td.*?>\s*(?<fname>.*?)\s*<br.*?>\s*(?<lname>.*?)\s*</td\s*>
)
(?=.* <th.*?>\s*Phone:\s*</th\s*>\s*
      <td.*?>\s*(?<phone>.*?)\s*</td\s*>
)
(?=.* <th.*?>\s*Address:\s*</th\s*>\s*
      <td.*?>\s*(?<ad>.*?)\s*</td\s*>
)
(?=.* <th.*?>\s*Apt:\s*</th\s*>\s*
      <td.*?>\s*(?<apt>.*?)\s*</td\s*>
)
(?=.* <th.*?>\s*Country:\s*</th\s*>\s*
      <td.*?>\s*(?<country>.*?)\s*</td\s*>
)
(?=.* <th.*?>\s*City:\s*<br.*?>\s*State/Province:\s*</th\s*>\s*
      <td.*?>\s*(?<city>.*?)\s*</td\s*>
)
(?=.* <th.*?>\s*Zip:\s*</th\s*>\s*
      <td.*?>\s*(?<zip>.*?)\s*</td\s*>
)
(?=.* <th.*?>\s*email:\s*</th\s*>\s*
      <td.*?>\s*(?<email>.*?)\s*</td\s*>
)
';

$/ = undef;
my $html = <DATA>;
my $table;
if ($html =~ /$rxtable/xs) {
    $table = $1;
}
if ($table =~ /$rxdata/xs) {
    print "first name  '$+{fname}'\n";
    print "last name   '$+{lname}'\n";
    print "phone       '$+{phone}'\n";
    print "address     '$+{ad}'\n";
    print "apt         '$+{apt}'\n";
    print "country     '$+{country}'\n";
    print "city        '$+{city}'\n";
    print "zip         '$+{zip}'\n";
    print "email       '$+{email}'\n";
    print "-----------------------\n";
    print "first name  '$1'\n";
    print "last name   '$2'\n";
    print "phone       '$3'\n";
    print "address     '$4'\n";
    print "apt         '$5'\n";
    print "country     '$6'\n";
    print "city        '$7'\n";
    print "zip         '$8'\n";
    print "email       '$9'\n";
}

__DATA__
        <div>
          <div align="center"><a href="frum_search.php">New Search</a></div>
        </div>
    </div>
          <div id="deailspage">
        <div id="det_table">
          <div align="center" class="WADAHeaderText">Details</div>
          <table width="260" border="0" align="center" cellpadding="0" cellspacing="0" bgcolor="#EEF1F7" class="WADADataTable">
      <tr>
                <th class="WADADataTableHeader"> Name:</th>
                <td class="WADADataTableCell">Menachem<br />
                  Atlman</td>
            </tr>
              <tr>
                <th class="WADADataTableHeader">Phone:</th>
                <td class="WADADataTableCell">02-651-8139</td>
              </tr>
              <tr>
                <th class="WADADataTableHeader">Address:</th>
                <td class="WADADataTableCell">8 Mishkalov</td>
              </tr>
              <tr>
                <th class="WADADataTableHeader">Apt:</th>
                <td class="WADADataTableCell"></td>
              </tr>
              <tr>
                <th class="WADADataTableHeader">City:<br /> 
                State/Province:</th>
                <td class="WADADataTableCell">Har Nof Jerusalem</td>
              </tr>
              <tr>
                <th class="WADADataTableHeader">Country:</th>
                <td class="WADADataTableCell">Israel</td>
              </tr>
              <tr>
                <th class="WADADataTableHeader">Zip:</th>
                <td class="WADADataTableCell"></td>
              </tr>
              <tr>
                <th class="WADADataTableHeader">email:</th>
                <td class="WADADataTableCell"></td>
              </tr>
          </table>

Output:

first name  'Menachem'
last name   'Atlman'
phone       '02-651-8139'
address     '8 Mishkalov'
apt         ''
country     'Israel'
city        'Har Nof Jerusalem'
zip         ''
email       ''
-----------------------
first name  'Menachem'
last name   'Atlman'
phone       '02-651-8139'
address     '8 Mishkalov'
apt         ''
country     'Israel'
city        'Har Nof Jerusalem'
zip         ''
email       ''
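
Since the question is in PHP, the same lookahead approach can be sketched with preg_match, which supports the (?<name>...) syntax. This is an untimed sketch, not a drop-in replacement: the HTML is a shortened, hypothetical excerpt of the test data, only four fields are extracted, and ~ is used as the delimiter so that the slashes in the closing tags need no escaping.

```php
<?php
// Hypothetical, shortened sample of the details table from the test data.
$html = <<<'HTML'
<div align="center" class="WADAHeaderText">Details</div>
<table class="WADADataTable">
  <tr>
    <th class="WADADataTableHeader"> Name:</th>
    <td class="WADADataTableCell">Menachem<br />
      Atlman</td>
  </tr>
  <tr>
    <th class="WADADataTableHeader">Phone:</th>
    <td class="WADADataTableCell">02-651-8139</td>
  </tr>
  <tr>
    <th class="WADADataTableHeader">Zip:</th>
    <td class="WADADataTableCell"></td>
  </tr>
</table>
HTML;

// Regex (1): isolate the table that follows the "Details" header.
$rxtable = '~>Details<(.*?)</table\s*>~s';

// Regex (2): one lookahead per field, so the rows may appear in any order.
$rxdata = '~
(?=.* <th.*?>\s*Name:\s*</th\s*>\s*
      <td.*?>\s*(?<fname>.*?)\s*<br.*?>\s*(?<lname>.*?)\s*</td\s*>
)
(?=.* <th.*?>\s*Phone:\s*</th\s*>\s*
      <td.*?>\s*(?<phone>.*?)\s*</td\s*>
)
(?=.* <th.*?>\s*Zip:\s*</th\s*>\s*
      <td.*?>\s*(?<zip>.*?)\s*</td\s*>
)
~xs';

$row = array();
if (preg_match($rxtable, $html, $t) && preg_match($rxdata, $t[1], $m)) {
    // Copy only the named captures into the row.
    foreach (array('fname', 'lname', 'phone', 'zip') as $field) {
        $row[$field] = $m[$field];
    }
}
print_r($row);
```

preg_match also stores each capture under its numeric index, so $m[1] holds the same value as $m['fname'] if named buffers are not wanted.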
