Regex or another way to get a correctly formatted string



Please help me. I have the following string:

<p>this is text before first image</p>
<p><a href=""><img class="size-full wp-image-2178636" src="image1.jpg" alt="first" /></a> this is first caption</p>
<p>this is text before second image.</p>
<p><a href=""><img src="image2.jpg" alt="second" class="size-full wp-image-2178838" /></a> this is second caption</p>
<p>there may be many more images</p>

I need the string above reformatted as follows:

<p>this is text before first image</p>
<a href="">
<figure>
    <img class="size-full wp-image-2178636" src="image1.jpg" alt="first" />
    <figcaption class="newcaption">
        <h1>this is first caption</h1>
    </figcaption>
</figure>
</a>
<p>this is text before second image.</p>
<a href="">
<figure>
    <img class="size-full wp-image-2178838" src="image2.jpg" alt="second" />
    <figcaption class="newcaption">
        <h1>this is second caption</h1>
    </figcaption>
</figure>
</a>
<p>there may be many more images</p>

Please help me. How can this be done with a regex or in some other way? I am using PHP.

Regards, Sachin.

Although SO is not supposed to be a code-writing service, here is a quick-and-dirty solution using DOMDocument:

$html = '...'; // your input data
$input = new DOMDocument();
$input->loadHTML($html);
$output = new DOMDocument();
foreach ($input->getElementsByTagName('p') as $p) {
    // decide by content, not by position: only paragraphs with an <img> are rebuilt
    $img = $p->getElementsByTagName('img')->item(0);
    if ($img === null) {
        // plain text paragraph: copy it over unchanged
        $output->appendChild($output->importNode($p, true));
        continue;
    }
    $a_input = $p->getElementsByTagName('a')->item(0);
    if ($a_input === null) {
        // Document malformed: image without a surrounding link
        continue;
    }
    // image output routine: <a><figure><img><figcaption><h1>
    $a_output = $output->createElement('a');
    $a_output->setAttribute('href', $a_input->getAttribute('href'));
    $figure = $output->createElement('figure');
    // deep import so the img arrives with all of its attributes
    $figure->appendChild($output->importNode($img, true));
    $figcaption = $output->createElement('figcaption');
    $figcaption->setAttribute('class', 'newcaption');
    $h1 = $output->createElement('h1');
    // the caption is the paragraph's text, without the leading space after </a>
    $h1->appendChild($output->createTextNode(trim($p->textContent)));
    $figcaption->appendChild($h1);
    $figure->appendChild($figcaption);
    $a_output->appendChild($figure);
    $output->appendChild($a_output);
}
print $output->saveHTML();

Note that saveHTML() emits plain old HTML, so the imgs will not come out as self-closing tags. If that matters to you, you may want to look into saveXML().
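
To make the saveHTML() / saveXML() difference concrete, here is a minimal sketch on a single hand-built img node. It is only an illustration, not part of the conversion itself: the variable names ($doc, $img, $asHtml, $asXml) and the attribute values are made up for the example.

```php
<?php
// Hypothetical single-node example; names and attribute values are illustrative only.
$doc = new DOMDocument();
$img = $doc->createElement('img');
$img->setAttribute('src', 'image1.jpg');
$img->setAttribute('alt', 'first');
$doc->appendChild($img);

// HTML serializer: void elements such as img are written without a closing slash.
$asHtml = trim($doc->saveHTML());

// XML serializer, called on the node itself so no XML declaration is emitted.
$asXml = $doc->saveXML($img);

echo $asHtml, "\n";
echo $asXml, "\n";
```

The second line printed should end in "/>", the first should not. To serialize a whole result this way, one option is to loop over the document's childNodes and call saveXML($node) on each, which also avoids the XML declaration that a bare saveXML() puts at the top.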
