How to wrap multiple groups of LI in a UL within a string in PHP

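I receive data (strings containing some HTML) from various sources that I cannot influence. The strings contain, among other things, LI elements that are visually grouped, but the parent UL element is missing. I need to wrap each group of LI tags in a UL tag.

This works fine if the string contains only one group of LI elements. I can easily use DOMDocument, search for the LI tags and wrap them in a newly created UL tag. Unfortunately, there can be multiple groups, and there is no defined separator between them, although there is always some kind of text or HTML tag. The groups are easy to identify as a human :)

So logically, I need to find an opening <li> as the start of a group, and a closing </li> that is not followed by another opening <li> as its end, ignoring all whitespace.

An example source string might be (it does not always contain newlines, and it is not always this pretty):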

Some text
<strong>Some other text</strong>
<li>Element A1</li><li>Element A2</li>
<li>Element A3</li>
Text that separates group A from group B
<li>Element B1</li>
<li>Element B2</li> <li>Element B3</li>
<li>Element B4</li>
<strong>Element that separates group B from group C</strong>
<li>Element C1</li>
<li>Element C2</li>
Text can follow.

The desired result would be:

Some text
<strong>Some other text</strong>
<ul>
<li>Element A1</li><li>Element A2</li>
<li>Element A3</li>
</ul>
Text that separates group A from group B
<ul>
<li>Element B1</li>
<li>Element B2</li> <li>Element B3</li>
<li>Element B4</li>
</ul>
<strong>Element that separates group B from group C</strong>
<ul>
<li>Element C1</li>
<li>Element C2</li>
</ul>
Text can follow.

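I am thinking about using a regex (I know, usually not the best idea for HTML). But here I do not know how to identify a closing </li> (or a variant of it) that is followed by anything other than whitespace or another opening <li> (or a variant of it).

I could also remove all whitespace between > and <; maybe that would make the world a bit easier. But even then I do not know how to "include" an opening LI as a valid following element within a group and exclude everything else.

Edit:

My current bad (almost naughty) regex solution looks like this: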

// remove whitespace between tags, then add </ul> and <ul> at the group borders
$txt = preg_replace('/(>)\s*(<)/m', '$1$2', $source_text);
$txt = preg_replace('/<\/li>(?!<li>)/', '</li></ul>', $txt);
$txt = preg_replace('/(?<!<\/li>)<li>/', '<ul><li>', $txt);

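It works fine until it doesn't, for example when some LI blocks already have a wrapping UL :)

All my DOMDocument approaches failed because the plain text was not seen as a child node. That means I can find the LIs and check whether their siblings are LIs as well, and if so wrap them all in one UL. But if the LI groups are separated only by text without any HTML tag, all the LIs are treated as direct siblings with nothing in between.

I wouldn't use a regex to parse HTML (we have all seen the SO answer :-P)

So here is a solution that breaks the text up line by line: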

<?php
// Returns true if the line contains an opening <li tag.
function isLi($line) {
    return strpos($line, '<li') !== false;
}

$text = 'Some text
<strong>Some other text</strong>
<li>Element A1</li><li>Element A2</li>
<li>Element A3</li>
Text that separates group A from group B
<li>Element B1</li>
<li>Element B2</li> <li>Element B3</li>
<li>Element B4</li>
<strong>Element that separates group B from group C</strong>
<li>Element C1</li>
<li>Element C2</li>
Text can follow.
<li>Hello, nothing follows this</li>';

$array = explode("\n", $text);
$html = '';
$previousWasLi = false;

foreach ($array as $line) {
    if (empty($line)) {
        continue;
    }
    if (isLi($line) && $previousWasLi == false) {
        // first li of a group: open the ul
        $html .= "<ul>\n";
        $html .= $line . "\n";
        $previousWasLi = true;
    } elseif (isLi($line) && $previousWasLi == true) {
        $html .= $line . "\n";
        $previousWasLi = true;
    } elseif (!isLi($line) && $previousWasLi == true) {
        // first non-li line after a group: close the ul
        $html .= "</ul>\n";
        $html .= $line . "\n";
        $previousWasLi = false;
    } elseif (!isLi($line) && $previousWasLi == false) {
        $html .= $line . "\n";
    }
}

// if the last line was an li, we need to close the ul
if ($previousWasLi) {
    $html .= '</ul>';
}

echo $html;

This outputs:

Some text
<strong>Some other text</strong>
<ul>
<li>Element A1</li><li>Element A2</li>
<li>Element A3</li>
</ul>
Text that separates group A from group B
<ul>
<li>Element B1</li>
<li>Element B2</li> <li>Element B3</li>
<li>Element B4</li>
</ul>
<strong>Element that separates group B from group C</strong>
<ul>
<li>Element C1</li>
<li>Element C2</li>
</ul>
Text can follow.
<ul>
<li>Hello, nothing follows this</li>
</ul>

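You can see it working here: https://3v4l.org/kmfee

The simplest solution I can think of is:

Wrap every <li>...</li> in <ul>...</ul> tags by replacing each <li> with <ul><li> and each </li> with </li></ul>.
Remove every </ul> that is followed by a <ul>, ignoring all whitespace and newlines in between.

The code should be as simple as this: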

// first step: give every li its own ul
$txt = str_replace('<li>', '<ul><li>', $source_txt);
$txt = str_replace('</li>', '</li></ul>', $txt);
// second step: merge neighbouring uls
$txt = preg_replace('/<\/ul>\s*<ul>/', '', $txt);

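If, as @Pilan mentioned in the comments, some <li> are already wrapped in a <ul>, you can add a third step that removes a <ul> followed by another <ul> and a </ul> followed by another </ul>: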

// third step: collapse doubled opening and closing tags
$txt = preg_replace('/<ul>\s*<ul>/', '<ul>', $txt);
$txt = preg_replace('/<\/ul>\s*<\/ul>/', '</ul>', $txt);
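
Put together, the three steps could look like the sketch below. The function name wrapLiGroups is made up for illustration and is not part of the original answer. It assumes plain <li> tags without attributes and no intentionally nested lists, since the third step would flatten those as well.

```php
<?php
// Hypothetical helper combining the three steps described above.
// Assumption: plain <li> tags without attributes, no intentionally nested lists.
function wrapLiGroups(string $source): string
{
    // Step 1: give every <li>...</li> its own <ul>...</ul>.
    $txt = str_replace('<li>', '<ul><li>', $source);
    $txt = str_replace('</li>', '</li></ul>', $txt);

    // Step 2: merge lists that are separated only by whitespace.
    $txt = preg_replace('/<\/ul>\s*<ul>/', '', $txt);

    // Step 3: collapse doubled tags left over from items that were already wrapped.
    $txt = preg_replace('/<ul>\s*<ul>/', '<ul>', $txt);
    $txt = preg_replace('/<\/ul>\s*<\/ul>/', '</ul>', $txt);

    return $txt;
}

echo wrapLiGroups("a<li>1</li> <li>2</li>b<li>3</li>");
// prints: a<ul><li>1</li><li>2</li></ul>b<ul><li>3</li></ul>
```

Note that the second step also drops the whitespace between neighbouring items, and it would merge two existing lists that directly follow each other.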

You can use the following code to "almost pretty-format" the input one step before applying @delboy1978uk's solution:

<?php
// $code_to_split is your input string.
// Put a newline after every </li> and before every <li.
$text = implode("\n<li", explode('<li', implode("</li>\n", explode('</li>', $code_to_split))));

// Returns true if the line contains a closing </li>.
function fnIsComplete($totest) {
    return strpos($totest, '</li>') !== false;
}

// Use @delboy1978uk's solution on $text, with these changes:
// - add a variable $iscomplete = false; as the second line
//   (a flag that tells whether a <li ...> line also contains its </li>)
// - add a test to the } elseif (!isLi($line) && $previousWasLi == true) { block:
} elseif (!isLi($line) && $previousWasLi == true) {
    if ($iscomplete) {
        $html .= "</ul>\n";
        $html .= $line . "\n";
        $previousWasLi = false;
    } elseif (fnIsComplete($line)) {
        $html .= $line . "\n";
        $html .= "</ul>\n";
        $previousWasLi = false;
    } else {
        $html .= $line . "\n";
    }
}
// - wherever you set $previousWasLi = true; also set $iscomplete:
$previousWasLi = true; $iscomplete = fnIsComplete($line);

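It is best to split the process into smaller steps.

Find all li tags
Group them according to the text between them
Inject the ul tags

This gives you more flexibility, for example to repair missing closing tags.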

class LiFormatter {
    public $html;
    private $lis;
    private $groups;

    public function __construct($html) {
        $this->html = $html;
        $this->lis = [];
        $this->groups = [];
        $this->findNextLi(0);
        if (count($this->lis) == 0) {
            return;
        }
        $this->determineGroups();
        $this->wrap();
    }

    // Record the start and end offset of every <li>...</li>, beginning at $offset.
    private function findNextLi($offset) {
        $start_index = strpos($this->html, '<li>', $offset);
        if ($start_index === false) {
            return;
        }
        $end_index = strpos($this->html, '</li>', $start_index + 4);
        $next_index = strpos($this->html, '<li>', $start_index + 4);
        if ($next_index !== false && ($end_index === false || $next_index < $end_index)) {
            // handle missing closing tag: close the item right before the next <li>
            $this->insertAt('</li>', $next_index);
            $end_index = $next_index;
        } elseif ($end_index === false) {
            // last item is never closed: close it at the end of the string
            $end_index = strlen($this->html);
            $this->insertAt('</li>', $end_index);
        }
        $this->lis[] = ['start' => $start_index, 'end' => $end_index + 5];
        $this->findNextLi($end_index);
    }

    // Collect consecutive li elements into groups until separating text is found.
    private function determineGroups() {
        while (count($this->lis) > 0) {
            $last_li = array_shift($this->lis);
            $group = [$last_li];
            while (count($this->lis) > 0) {
                $current_li = $this->lis[0];
                $str_between = substr($this->html, $last_li['end'], $current_li['start'] - $last_li['end']);
                if ($this->isSeparating($str_between)) {
                    break;
                } else {
                    $group[] = $current_li;
                    array_shift($this->lis);
                    $last_li = $current_li;
                }
            }
            $this->groups[] = $group;
        }
    }

    // Insert <ul> and </ul> around each group; $offset tracks the characters already added.
    private function wrap() {
        $offset = 0;
        foreach ($this->groups as $group) {
            $first_li = reset($group);
            $last_li = end($group);
            $group_start = $first_li['start'];
            $group_end = $last_li['end'];
            $this->insertAt('<ul>', $group_start + $offset);
            $offset += 4;
            $this->insertAt('</ul>', $group_end + $offset);
            $offset += 5;
        }
    }

    private function insertAt($str, $index) {
        $this->html = substr($this->html, 0, $index) . $str . substr($this->html, $index);
    }

    // True if the text between two li elements contains a word character.
    private function isSeparating($str) {
        return preg_match('/\w/', $str) === 1;
    }
}

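Change the isSeparating() function as needed. It is passed the text between two consecutive li tags and returns true if that text separates them into different groups. Currently it checks whether the text contains any word character, so whitespace alone (newlines, tabs etc.) does not separate groups.

Use it like this: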

$output = (new LiFormatter($input))->html;

Regex? Yes, please!

You can port it to PHP if you like. It is written in JS for demonstration purposes only.

var response = "Some text <strong>Some other text</strong><li>Element A1</li><li>Element A2</li><li>Element A3</li>Text that separates group A from group B<li>Element B1</li><li>Element B2</li> <li>Element B3</li><li>Element B4</li><strong>Element that separates group B from group C</strong><li>Element C1</li><li>Element C2</li>Text can follow.";
var r = response.replace(/(?<!<\/li>\s*)<li>/g, '<ul><li>'); // add <ul> before a <li> that does not follow a </li>
r = r.replace(/<\/li>(?!\s*<li>)/g, '</li></ul>'); // add </ul> after a </li> that is not followed by a <li>
$('#result').html(r);

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id='result'></div>
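
As far as I know, PHP's PCRE engine rejects a lookbehind of unbounded length such as (?<!<\/li>\s*), so the first pattern cannot be copied over as it is. A possible PHP sketch of the same idea matches each whole run of li elements and wraps it in one go. The function name wrapLiRuns is made up for illustration, and the sketch assumes plain <li> tags without attributes, no nested lists and no groups that are already wrapped.

```php
<?php
// Hypothetical PHP take on the idea above: wrap every run of <li>...</li>
// elements that are separated only by whitespace in a single <ul>...</ul>.
// Assumption: plain <li> tags, no nesting, no existing <ul> wrappers.
function wrapLiRuns(string $html): string
{
    // One <li>...</li>, followed by any number of further ones
    // with nothing but whitespace in between.
    $pattern = '~<li>.*?</li>(?:\s*<li>.*?</li>)*~s';

    // $0 is the complete run that was matched.
    return preg_replace($pattern, '<ul>$0</ul>', $html);
}

echo wrapLiRuns("x<li>1</li>\n<li>2</li>y<li>3</li>");
```

Unlike the whitespace-stripping variants, this keeps the original whitespace between the items of a group.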
