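I want to find a preg_match result and retrieve information from it. After that match, I need to find how many HTML "a" tags occur before the next match. I store the number of those HTML "a" tags, together with the preg_match result, in an array. All of this is in one file, and I cannot modify that file.
The goal is to identify each HTML "a" tag and assign it a category (the match). The only problem is that I check the file line by line, whereas I would like to search in a single string. Here are my code and the file.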
Function
function retrieveCatAndNumber($filepath){
$nbTag = array();
$lines = file($filepath);
//print_r($lines);
$i = 0;
$countA = 0;
$lineNumber = 1;
$nbLines = count($lines);
// Loop through our array, show HTML source as HTML source; and line numbers too.
foreach ($lines as $line_num => $line) {
// echo htmlspecialchars($line)."<br>";
if(preg_match("/<a(.*?)<\/a>/s", $line, $matcheA)){
$countA++;
}//if A
// echo "/<td class=pp bgcolor='#0051AB'>(.*?)</td>";
$result = preg_match_all("/<td class=pp bgcolor='#0051AB'>(.*?)<\/td>/s", trim($line), $matcheTD);
if($result == 1){
//echo $result;
$prev = $i - 1;
//echo htmlspecialchars($line)."<BR>";
$nbTag[$i][0] = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $matcheTD[0]);
print_r($matcheTD);
echo "<br>";
if($i != 0){
$nbTag[$prev][1] = $countA;
$countA = 0;
}
$i++;
}//if TD
if($lineNumber == $nbLines){
$nbTag[$i - 1][1] = $countA;
$countA = 0;
}
$lineNumber++;
}//Foreach line of file
echo "\t\t The category and number of links : ok<BR>";
return $nbTag;
}// end of retrieveCatAndNumber
File content
<is_links><!-- !!!!!! Project General Info !!!!!! --> <tr> <td class=pp bgcolor='#0051AB'>��Project Info</td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif">� <?php $filename = './'.$mnem.'_proj_def.htm'; if ( file_exists ($filename) ){ ?> <a href="http://caevgl03.caecorp.cae.com/projects/<?= $mnem ?>/<?= $filename ?>">Project Definition</a> <?php } else { ?> <font color="#A0A0A0">Project Definition</font> <?php } ?> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif">� <?php $filename = './'.$mnem.'_kickoff.pdf'; if ( file_exists ($filename) ) { ?> <a href="http://caevgl03.caecorp.cae.com/projects/<?= $mnem ?>/<?= $filename ?>">Project Kickoff</a> <?php } else { ?> <font color="#A0A0A0">Project Kickoff</font> <?php } ?> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com/projects/pages_v1/logbook/logbook.is?mnem=<?= $mnem ?>">Logbook</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com/projects/pages_v1/schedule/schedule.is?mnem=<?= $mnem ?>">Site Schedule</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com/projects/pages_v1/assignee/assignee.is?mnem=<?=$mnem?>">Assignee</a> </td> </tr> <tr> <td class=pp> <img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> <a HREF=http://wss.cae.ca/montreal/departments/PMO/PM/2TLM/default.aspx> Sharepoint Site</A></TD></TR> <!-- !!!!!! IS Links !!!!!! 
--> <tr> <td class=pp bgcolor='#0051AB'> I/S Links</td> </tr> <TR><TD class='pp'><A HREF=../../usr2/html/proj_docs/2tlm/ARJ21_FFS_update_S0W.pdf>Project ARJ21_FFS_update_S0W</A></TD></TR><TR><TD class='pp'> <TR><TD class='pp'><A HREF=../../usr2/html/proj_docs/2tlm/ARJ21_Update_Schedule_rev4_2Oct2012.pdf>ARJ21_Update_Schedule_rev4_2Oct2012</A></TD></TR> <TR><TD class='pp'><A HREF=../../usr2/html/proj_docs/2tlm/2T3F_COMAC_ILC_MOC8_PedestalUpdate_PM_KO.pdf>2T3F COMAC ILC MOC8 PedestalUpdate PM KO</A></TD></TR> <TR><TD class='pp'><A HREF=../../usr2/html/proj_docs/2tlm/FAA_evaluation_S0W.pdf>FAA evaluation S0W</A></TD></TR> <!-- !!!!!! Technical Info !!!!!! --> <tr> <td class=pp bgcolor='#0051AB'>��Technical Info</td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="doors://caedrs01.cae.ca:36691/?version=2&prodID=0&urn=urn:telelogic::1-49ac26073662317a-F-000078c5">Requirements </a> </td> <td class=pp> <TR><TD class='pp'><A HREF=/proj_docs/2tlm/BP1122_R2.pdf>Tech Spec</A></TD></TR> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif">� <?php $filename = './'.$mnem.'_tpd.pdf'; if ( file_exists ($filename) ) { ?> <TR><TD class='pp'><A HREF=/proj_docs/2tlm/BP1122_R2.pdf>Tech Spec</A></TD></TR> <font color="#A0A0A0">Tech Spec</font> <?php } ?> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif">� <?php $filename = './'.$mnem.'_sbdr.pdf'; if ( file_exists ($filename) ) { ?> <a href="http://caevgl03.caecorp.cae.com/projects/<?= $mnem ?>/<?= $filename ?>">SBDR</a> <?php } else { ?> <font color="#A0A0A0">SBDR</font> <?php } ?> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif">� <?php $filename = './'.$mnem.'_pdr.pdf'; if ( file_exists ($filename) ) { ?> <a href="http://caevgl03.caecorp.cae.com/projects/<?= $mnem ?>/<?= $filename?>">PDR</a> <?php } else { ?> <font color="#A0A0A0">PDR</font> <?php } 
?> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif">� <?php $filename = './'.$mnem.'_cdr.pdf'; if ( file_exists ($filename) ) { ?> <a href="http://caevgl03.caecorp.cae.com/projects/<?= $mnem?>/<?= $filename ?>">CDR</a> <?php } else { ?> <font color="#A0A0A0">CDR</font> <?php } ?> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://wss.cae.ca/sites/cpm/System%20Engineering/Public/Forms/AllItems.aspx?RootFolder=%2fsites%2fcpm%2fSystem%20Engineering%2fPublic%2fArchitectures&FolderCTID=&View=%7b5136364F%2d69EF%2d4EC4%2dB754%2d9C4D8C60D8C8%7d/Projects/<?=$mnem?>">Architecture Dwg</a> </td> </tr> <!-- !!!!!! Test And Eval !!!!!! --> <tr> <td class=pp bgcolor='#0051AB'>��Test & Eval</td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com:8080/elog/elog.asp?PROJECT=<?=$mnem?>">eLOG</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://testeval.cae.ca/d75/eqtg2/eqtg2_main.asp?PROJ=<?=$mnem?>">eQTG</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com:8080/edocs/edocs.asp?PROJECT=<?=$mnem?>&DOC_TYPE=ATM">eATM</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com:8080/edocs/edocs.asp?PROJECT=<?=$mnem?>&DOC_TYPE=ITM">eITM</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com:8080/edocs/edocs.asp?PROJECT=<?=$mnem?>&DOC_TYPE=OTM">eOTM</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com:8080/edocs/edocs.asp?PROJECT=<?=$mnem?>&DOC_TYPE=MCD">eMCD</a> </td> </tr> <tr> <td 
class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://testeval.cae.ca/d75/edocs/etp_main.asp?PROJECT=<?=$mnem?>&TEST_PHASE=IHA">eTP</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com:8080/edocs/edocs.asp?PROJECT=<?=$mnem?>&DOC_TYPE=CHKL">eCHKL</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://testeval.cae.ca/d75/First_Flight/prerequ.asp?mnem=<?= $mnem ?>">FF Pre-requiFites</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com:8080/esnag/">eSnag</a> </td> </tr> <!-- !!!!!! Quick Links !!!!!! --> <tr> <td class=pp bgcolor='#0051AB'>��Quick Links</td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://web.cae.ca/dept-sites/dept48/Life_Cycle/Version%201.6/CAELC.htm">ISO</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com/templates/Site_Etiquette.ppt">Site Etiquette</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://wss.cae.ca/montreal/departments/ProdAndTech/SIM%20XX1%20Support/Forms/AllItems.aspx?RootFolder=%2fmontreal%2fdepartments%2fProdAndTech%2fSIM%20XX1%20Support%2fCAELIB%20User%20Guides&FolderCTID=&View=%7bBFB10722%2d49AE%2d41DA%2dB20B%2d8A5899366423%7d">CAELib</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://d21app01.cae.ca/webfiles/scm/StarTeamWebPage/index.htm">Starteam Info</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://step1.cae.ca/kb_asp/step1_edt/step1_kb.htm">STEP 1</a> </td> </tr> <tr> <td class=pp> �<img 
src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://d21app01.cae.ca/webfiles/matrixxweb/mtrx_home.html">MATRIXx</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com/templates/Site_code.doc">Field Site Code of ethics</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com/templates/Cultures.doc">Customer Etiquette</a> </td> </TR> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com">I/S Home</a> </td> </TR> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com/dept62/pm/html/index.html">PM Home</a> </td> </TR> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://web.cae.ca">CAE Web</a> </td> </TR> </is_links>
The result I get (the $nbTag array): I only get one entry!
Array ( [0] => ��Project Info [1] => ��Project Info )
Thanks for any advice, and feel free to ask me any questions.
Don't try to parse an HTML file with regular expressions! Use a library to parse the HTML.
You can use http://www.php.net/manual/en/domdocument.loadhtml.php or another library...
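As a rough illustration of that suggestion, here is a minimal sketch that loads the whole file as one string with DOMDocument and walks the category cells and links in document order. The function name countLinksPerCategory is made up for this example, and it assumes that a category header is always a td whose bgcolor attribute is '#0051AB', as in the sample file. It also assumes that the parser tolerates the embedded <?php ... ?> fragments, which I have not checked against the real file.

```php
<?php
// Hypothetical helper (not from the question): returns a list of
// [category name, number of <a> tags before the next category].
function countLinksPerCategory(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // messy markup: keep parser warnings quiet
    $doc->loadHTML($html);              // the HTML parser lowercases tag and attribute names
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $result = [];
    $current = -1;

    // A single path expression, so nodes come back in document order.
    // Assumption: header cells are <td> elements with bgcolor='#0051AB'.
    $nodes = $xpath->query("//*[self::a or (self::td and @bgcolor='#0051AB')]");

    foreach ($nodes as $node) {
        if ($node->nodeName === 'td') {
            // New category: trim ordinary whitespace and non-breaking spaces.
            $name = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $node->textContent);
            $result[++$current] = [$name, 0];
        } elseif ($current >= 0) {
            // A link that follows the current category header.
            $result[$current][1]++;
        }
    }
    return $result;
}
```

It could be called as countLinksPerCategory(file_get_contents($filepath)), which replaces the line-by-line loop with a single pass over the parsed document. Links that appear before the first header cell are ignored in this sketch.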