Parse strictly formatted text containing multiple entries with no delimiting characters

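I have a string containing multiple product orders that have been joined together without a delimiter.

I need to parse the input string and convert each set of three substrings into a separate row of data.

I tried splitting the string with split() and strstr(), but could not generate the desired result.

How can I convert this statement into different columns?

RM is the Malaysian Ringgit.

From this statement: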

"2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.6"

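Into separate rows:

2 x Brew Coffeee Panas: RM7.4
2 x Tongkat Ali Ais: RM8.6

And these 2 rows should go into this table in the DB:

Table: Products

Product Name	Quantity	Total Amount (RM)
Brew Coffeee Panas	2	7.4
Tongkat Ali Ais	2	8.6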

If the string format is consistent, you can use a regex. Here is an expression that does this:

(\d) x (.+?): RM(\d+\.\d)

Basic usage:

$re = '/(\d) x (.+?): RM(\d+\.\d)/';
$str = '2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.6';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_export($matches);

This produces:

array (
0 => 
array (
0 => '2 x Brew Coffeee Panas: RM7.4',
1 => '2',
2 => 'Brew Coffeee Panas',
3 => '7.4',
),
1 => 
array (
0 => '2 x Tongkat Ali Ais: RM8.6',
1 => '2',
2 => 'Tongkat Ali Ais',
3 => '8.6',
),
)

Group 0 is always the full match; the groups after it are the quantity, the product, and the price.
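To turn those match sets into rows shaped like the Products table from the question, one option is to map each set to an associative array. The following is a minimal sketch: the helper name parseOrders() and the keys product_name, quantity, and total_amount are assumptions made for illustration and do not come from the question.

```php
<?php
// Hypothetical helper: parseOrders() and the array keys are illustrative names.
function parseOrders(string $input): array
{
    $rows = [];
    // Same pattern as above: single-digit quantity, lazy product name, one decimal place.
    if (preg_match_all('/(\d) x (.+?): RM(\d+\.\d)/', $input, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $rows[] = [
                'product_name' => $match[2],
                'quantity'     => (int) $match[1],
                'total_amount' => (float) $match[3],
            ];
        }
    }
    return $rows;
}

$rows = parseOrders('2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.6');
print_r($rows);
```

Each element of $rows can then be bound to a prepared INSERT statement, one row per order.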


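Pattern breakdown:

(\d+)      capture one or more digits
 x         match space, x, space
([^:]+)    capture one or more non-colon characters, up to the first colon
: RM       match colon, space, then RM
(\d+\.\d)  capture a float value with exactly one digit after the decimal point
(The OP said in a comment on the question: "it only take one decimal place for the amount")

There are no "lazy quantifiers" in my pattern, so the regex engine can move as quickly as possible.

This regex pattern is as Accurate as the sample data and the requirement explanation allow, as Efficient as it can be because it contains only greedy quantifiers, as Concise as it can be thanks to the negated character class, and as Readable as a pattern can be because it has no superfluous characters.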

Code: (Demo)

$string = '2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.6';
var_export(
    preg_match_all('~(\d+) x ([^:]+): RM(\d+\.\d)~', $string, $m)
    ? array_slice($m, 1)  // omit the fullstring matches
    : []                  // if there are no matches
);

Output:

array (
0 => 
array (
0 => '2',
1 => '2',
),
1 => 
array (
0 => 'Brew Coffeee Panas',
1 => 'Tongkat Ali Ais',
),
2 => 
array (
0 => '7.4',
1 => '8.6',
),
)

You can add the PREG_SET_ORDER flag to the preg_match_all() call to make it easier to iterate over the matches as rows.

preg_match_all('~(\d+) x ([^:]+): RM(\d+\.\d)~', $string, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    echo '<tr><td>' . implode('</td><td>', array_slice($match, 1)) . '</td></tr>';
}

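You can use a regular expression like this:

/(\d+)\sx\s([^:]+):\sRM(\d+\.?\d?)(?=\d|$)/

Explanation:

(\d+) captures one or more digits
\s matches a whitespace character
([^:]+): captures one or more non-: characters before a : character (if you know exactly which characters can appear before the :, you can use something like [a-zA-Z0-9\s]+): instead, which in this case allows lowercase and uppercase letters, the digits 0 to 9, and whitespace characters)
(\d+\.?\d?) captures one or more digits, followed by a . and one more digit if they are present
(?=\d|$) is a positive lookahead that requires the main expression to be followed either by a digit, which is not included in the result, or by the end of the string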
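To see why the optional decimal part and the lookahead matter, the following sketch runs the pattern on a shortened version of this answer's sample string, in which a whole-number price (RM31) is immediately followed by the next quantity. The variable names are illustrative.

```php
<?php
// The whole-number price "RM31" is directly followed by the next quantity "2".
$input = '2 x B026 Kopi Hainan Kecil: RM312 x B006 Kopi Hainan Besar: RM19.5';
$pattern = '/(\d+)\sx\s([^:]+):\sRM(\d+\.?\d?)(?=\d|$)/';

preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

foreach ($matches as [$full, $qty, $product, $price]) {
    // The lookahead forces the price to stop before a digit (or at the end),
    // so the engine backtracks from "312" to "31" and leaves "2" as the next quantity.
    echo "$qty | $product | $price\n";
}
```

Note that input such as RM312 x is inherently ambiguous (price 31 and quantity 2, or price 3 and quantity 12). Because \d+ is greedy, this pattern settles on the longest price that still leaves at least one digit for the following quantity.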

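You can also add the PREG_SET_ORDER flag to preg_match_all() to group the results:

PREG_SET_ORDER
Orders results so that $matches[0] is an array of the first set of matches, $matches[1] is an array of the second set of matches, and so on.

Code example: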

<?php
$txt = "2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.62 x B026 Kopi Hainan Kecil: RM312 x B006 Kopi Hainan Besar: RM19.5";
$pattern = '/(\d+)\sx\s([^:]+):\sRM(\d+\.?\d?)(?=\d|$)/';
if (preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER)) {
    print_r($matches);
}
?>

Output:

Array
(
[0] => Array
(
[0] => 2 x Brew Coffeee Panas: RM7.4
[1] => 2
[2] => Brew Coffeee Panas
[3] => 7.4
)
[1] => Array
(
[0] => 2 x Tongkat Ali Ais: RM8.6
[1] => 2
[2] => Tongkat Ali Ais
[3] => 8.6
)
[2] => Array
(
[0] => 2 x B026 Kopi Hainan Kecil: RM31
[1] => 2
[2] => B026 Kopi Hainan Kecil
[3] => 31
)
[3] => Array
(
[0] => 2 x B006 Kopi Hainan Besar: RM19.5
[1] => 2
[2] => B006 Kopi Hainan Besar
[3] => 19.5
)
)


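The first thing I would do is perform a simple replacement with preg_replace that, with the help of a back-reference to the captured item, inserts a delimiter after each price, based on the known format of a single decimal place. Anything beyond that decimal place is part of the next item, which in this case is the quantity.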

$str = "2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.625 x Koala Kebabs: RM15.23 x Fried Squirrel Fritters: RM32.4";
#   qty price
#   2   7.4
#   2   8.6
#   25  15.2
#   3   32.4

/*
    Our RegEx patterns to find the decimal precision,
    to split the string apart, and to capture the quantity, item & price
*/
$pttns = (object)array(
    'repchar'  => '@(RM\d{1,}\.\d{1})@',
    'splitter' => '@(\|)@',
    'combo'    => '@^((\d{1,}) x)(.*): RM(\d{1,}\.\d{1})$@'
);
# create a new version of the string with our specified delimiter - the PIPE
$str = preg_replace($pttns->repchar, '$1|', $str);
# split the string into pieces - discard empty items
$a = array_filter(preg_split($pttns->splitter, $str));
# iterate through the pieces - find the quantity, item & price
foreach ($a as $str) {
    preg_match($pttns->combo, $str, $matches);
    $qty = $matches[2];
    $item = $matches[3];
    $price = $matches[4];

    # %.1f keeps the single decimal place of the price (%d would truncate it)
    printf('%s %d %.1f<br />', $item, $qty, $price);
}

Which yields:

Brew Coffeee Panas 2 7.4
Tongkat Ali Ais 2 8.6
Koala Kebabs 25 15.2
Fried Squirrel Fritters 3 32.4
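
As a variation on the same idea, the delimiter-insertion step can be skipped by splitting directly at the boundary between a one-decimal price and the next quantity. This is a sketch that uses a zero-width preg_split() pattern rather than the method shown above, and it assumes every price has exactly one decimal place, as in the sample string.

```php
<?php
$str = '2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.625 x Koala Kebabs: RM15.23 x Fried Squirrel Fritters: RM32.4';

// Split at the zero-width position that follows ".<digit>" and precedes "<digits> x ".
$entries = preg_split('~(?<=\.\d)(?=\d+ x )~', $str);

foreach ($entries as $entry) {
    // Each entry now has the shape "<qty> x <item>: RM<price>".
    if (preg_match('~^(\d+) x (.*): RM(\d+\.\d)$~', $entry, $m)) {
        printf("%s %d %.1f\n", $m[2], $m[1], $m[3]);
    }
}
```

Because the split pattern consumes no characters, nothing has to be inserted into or stripped from the string, and no empty trailing item is produced.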
