PHP: extracting text from a string that contains HTML tags


<document page-count="3">
<page number="1">
<table id="p1" data-page="1" data-table="1">>>>(2016-05-20)<<< >>>(name)<<< >>>(cat)<<< </table>
</page>
<page number="2">
<table id="p2" data-page="1" data-table="1">>>>(2016-05-20)<<< >>>(name)<<< >>>(cat)<<< </table>
</page>
<page number="3">
<table id="p3" data-page="1" data-table="1">>>>(2016-05-20)<<< >>>(name)<<< >>>(cat)<<< </table>
</page>

I want the code to echo the text between >>> and <<<.

I'd like the output to look more like this:

<row>
    <date>2016-05-20</date>
    <name>name</name>
    <animal>cat</animal>
</row>

The reason I have the "(" and ")" is that some "rows" will contain multiple values.

I'm looking for a simple way to do this!
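The question doesn't show what a multi-valued cell looks like. Assuming (hypothetically) that extra values appear as consecutive parenthesized groups inside one `>>>…<<<` cell, such as `>>>(cat)(dog)<<<`, a minimal sketch that returns every value of every cell might look like this. The function name `parseCells` and the multi-value format are illustrative assumptions, not part of the original data.

```php
<?php
// Hypothetical helper: returns one array of values per >>>...<<< cell.
// ASSUMPTION: a cell with several values looks like >>>(cat)(dog)<<<.
function parseCells(string $line): array
{
    $cells = [];
    // Capture each cell, including its parentheses. Requiring "(" right
    // after >>> keeps the match from starting at the ">" that closes <table ...>.
    preg_match_all('/>>>(\(.*?\))<<</', $line, $m);
    foreach ($m[1] as $cell) {
        // Split one cell into its parenthesized values.
        preg_match_all('/\(([^()]*)\)/', $cell, $v);
        $cells[] = $v[1];
    }
    return $cells;
}

$line = '<table id="p1" data-page="1" data-table="1">>>>(2016-05-20)<<< >>>(name)<<< >>>(cat)(dog)<<< </table>';
print_r(parseCells($line));
```

Each single-valued cell comes back as a one-element array, so the caller can treat single and multiple values the same way.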

$str = '<document page-count="3">
<page number="1">
<table id="p1" data-page="1" data-table="1">>>>(2016-05-20)<<< >>>(name)<<< >>>(cat)<<< </table>
</page>
<page number="2">
<table id="p2" data-page="1" data-table="1">>>>(2016-05-20)<<< >>>(name)<<< >>>(cat)<<< </table>
</page>
<page number="3">
<table id="p3" data-page="1" data-table="1">>>>(2016-05-20)<<< >>>(name)<<< >>>(cat)<<< </table>
</page>';
// If your values are NOT wrapped in (), use this pattern instead: /(\d{4}-\d{2}-\d{2})\W+(\w+)\W+(\w+)/
// (your question text seems to suggest that they might not be)
// Without the /s modifier, "." does not match newlines, so each match stays on one <table> line.
preg_match_all('/\((\d{4}-\d{2}-\d{2})\).*\((\w+)\).*\((\w+)\)/', $str, $match);
// preg_match_all('/\((\d{4}-\d{2}-\d{2})\).*\((\w+)\).*\((\w+)\)/', strip_tags($str), $match);
var_dump($match);

Result:

array(4) {
   [0]=>
      array(3) {
        [0]=> string(37) "(2016-05-20)<<< >>>(name)<<< >>>(cat)"
        [1]=> string(37) "(2016-05-20)<<< >>>(name)<<< >>>(cat)"
        [2]=> string(37) "(2016-05-20)<<< >>>(name)<<< >>>(cat)"
      }
   [1]=>
      array(3) {
        [0]=> string(10) "2016-05-20"
        [1]=> string(10) "2016-05-20"
        [2]=> string(10) "2016-05-20"
      }
   [2]=>
      array(3) {
        [0]=> string(4) "name"
        [1]=> string(4) "name"
        [2]=> string(4) "name"
      }
   [3]=>
      array(3) {
        [0]=> string(3) "cat"
        [1]=> string(3) "cat"
        [2]=> string(3) "cat"
      }
}

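But that is a regular expression run over HTML, which can cause problems.
Running strip_tags() first would be safer, but on my test string it doesn't work well: all the text ends up on a single line, which shouldn't happen.
I suggest you try strip_tags() first anyway; the problem may just be a line-break issue in my test string.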
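One way to keep the regex away from the surrounding markup without relying on strip_tags() is to capture the body of each `<table>` element first and only then pull out the parenthesized values. This sketch assumes, as in the sample, that `<table>` elements are not nested, that no attribute value contains `>`, and that each table holds exactly one row. It yields one array per table, so a row with more or fewer values stays together.

```php
<?php
$str = '<document page-count="3">
<page number="1">
<table id="p1" data-page="1" data-table="1">>>>(2016-05-20)<<< >>>(name)<<< >>>(cat)<<< </table>
</page>
<page number="2">
<table id="p2" data-page="1" data-table="1">>>>(2016-05-21)<<< >>>(name1)<<< >>>(cat2)<<< </table>
</page>';

$rows = [];
// Step 1: capture everything between <table ...> and </table>
// (non-greedy; /s lets a table body span several lines).
preg_match_all('#<table\b[^>]*>(.*?)</table>#s', $str, $tables);
foreach ($tables[1] as $body) {
    // Step 2: inside one table body, collect each (...) value.
    preg_match_all('/\(([^()]*)\)/', $body, $values);
    $rows[] = $values[1];
}
print_r($rows);
```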

http://sandbox.onlinephpfunctions.com/code/387951bce623573259b81745942c86dd828306db

Edit: Sorry, I forgot the output format you wanted:

http://sandbox.onlinephpfunctions.com/code/06ed1f8075b17af08daf0843f4ba8b1066405da6

$str = '<document page-count="3">
<page number="1">
<table id="p1" data-page="1" data-table="1">>>>(2016-05-20)<<< >>>(name)<<< >>>(cat)<<< </table>
</page>
<page number="2">
<table id="p2" data-page="1" data-table="1">>>>(2016-05-20)<<< >>>(name)<<< >>>(cat)<<< </table>
</page>
<page number="3">
<table id="p3" data-page="1" data-table="1">>>>(2016-05-20)<<< >>>(name)<<< >>>(cat)<<< </table>
</page>';
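preg_match_all('/\((\d{4}-\d{2}-\d{2})\).*\((\w+)\).*\((\w+)\)/', $str, $match);
// preg_match_all('/\((\d{4}-\d{2}-\d{2})\).*\((\w+)\).*\((\w+)\)/', strip_tags($str), $match);
$newstr = "";
// $match[1], $match[2] and $match[3] hold the date, name and animal of each row
for ($i = 0; $i < count($match[0]); $i++) {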
    $newstr .= "<row>
    <date>" . $match[1][$i] ."</date>
    <name>" . $match[2][$i] ."</name>
    <animal>".$match[3][$i] ."</animal>
    </row>
    ";
}
echo $newstr;
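The loop above pastes the captured text straight into the markup, so a value containing `&` or `<` would make the result malformed. A minimal sketch that escapes each value with `htmlspecialchars()` before building the row; the helper name `buildRow` is hypothetical.

```php
<?php
// Hypothetical helper: builds one <row> element, escaping each value for XML.
function buildRow(string $date, string $name, string $animal): string
{
    $esc = static function (string $s): string {
        // ENT_XML1 | ENT_QUOTES converts & < > " ' into XML entities.
        return htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    };
    return "<row>\n"
        . '    <date>' . $esc($date) . "</date>\n"
        . '    <name>' . $esc($name) . "</name>\n"
        . '    <animal>' . $esc($animal) . "</animal>\n"
        . "</row>\n";
}

echo buildRow('2016-05-20', 'Tom & Jerry', 'cat');
```

Inside the original loop, `$newstr .= buildRow($match[1][$i], $match[2][$i], $match[3][$i]);` would replace the hand-built string.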

Try this:

$str = '<document page-count="3">
<page number="1">
<table id="p1" data-page="1" data-table="1">>>>(2016-05-20)<<< >>>(name)<<< >>>(cat1)<<< </table>
</page>
<page number="2">
<table id="p2" data-page="1" data-table="1">>>>(2016-05-21)<<< >>>(name1)<<< >>>(cat2)<<< </table>
</page>
<page number="3">
<table id="p3" data-page="1" data-table="1">>>>(2016-05-22)<<< >>>(name2)<<< >>>(cat3)<<< </table>
</page>';

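// Capture the text inside the parentheses of every >>>(...)<<< cell
preg_match_all('/>>>\((.*?)\)<<</', $str, $matches);
// Group the flat list of values into rows of three (date, name, animal)
$res = array_chunk($matches[1], 3);
echo '<pre>'; print_r($res);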

Output:

Array
(
    [0] => Array
        (
            [0] => 2016-05-20
            [1] => name
            [2] => cat1
        )
    [1] => Array
        (
            [0] => 2016-05-21
            [1] => name1
            [2] => cat2
        )
    [2] => Array
        (
            [0] => 2016-05-22
            [1] => name2
            [2] => cat3
        )
)
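`array_chunk($matches[1], 3)` assumes every table yields exactly three values; if one row has more, every later row shifts. Where the three-per-row assumption does hold, the chunks can be given field names with `array_combine()`, so the output code no longer depends on numeric positions. In this sketch the key names come from the desired output, and skipping a chunk of the wrong size is an assumption of the sketch (in PHP 8, `array_combine()` throws a `ValueError` on mismatched lengths).

```php
<?php
// Sample chunks, as produced by array_chunk() in the answer above.
$res = [
    ['2016-05-20', 'name', 'cat1'],
    ['2016-05-21', 'name1', 'cat2'],
    ['2016-05-22', 'name2', 'cat3'],
];

$keys = ['date', 'name', 'animal'];
$rows = [];
foreach ($res as $chunk) {
    // array_combine() needs both arrays to have the same length;
    // skip chunks that don't fit instead of failing.
    if (count($chunk) !== count($keys)) {
        continue;
    }
    $rows[] = array_combine($keys, $chunk);
}
print_r($rows);
```

The output loop can then use `$v['date']`, `$v['name']` and `$v['animal']` instead of `$v[0]`, `$v[1]` and `$v[2]`.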

To get the format you asked for, add the following code:

$str = '';
foreach ($res as $v) {
  // "\n" puts each element on its own line, matching the output below
  $str .= "<row>\n";
  $str .= "<date>{$v[0]}</date>\n";
  $str .= "<name>{$v[1]}</name>\n";
  $str .= "<animal>{$v[2]}</animal>\n";
  $str .= "</row>\n";
}
echo $str;

Output:

<row>
<date>2016-05-20</date>
<name>name</name>
<animal>cat1</animal>
</row>
<row>
<date>2016-05-21</date>
<name>name1</name>
<animal>cat2</animal>
</row>
<row>
<date>2016-05-22</date>
<name>name2</name>
<animal>cat3</animal>
</row>
