PHP: A better way to split a string into an associative array



I have a string like this:

"ALARM_ID/I4=1010001 ALARM_STATE/U4=eventcode ALARM_TEXT/A=WMR_MAP_EXPORT LOTS/A[1]=[ STEFANO ] ALARM_STATE/U1=1 WAFER/U4=1 VI_KLARF_MAP/A=/test/klarf.map KLARF_STEPID/A=StepID KLARF_DEVICEID/A=DeviceID KLARF_EQUIPMENTID/A=EquipmentID KLARF_SETUP_ID/A=SetupID RULE_ID/U4=1234 RULE_FORMULA_EXPRESSION/A=a < b && c > d RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 RULE_FORMULA_RESULT/A=FAIL TIMESTAMP/A=10-Nov-2020 09:10:11 99999999"

My goal is to split it into an associative array:

Array
(
[ALARM_ID/I4] => 1010001
[ALARM_STATE/U4] => eventcode
[ALARM_TEXT/A] => WMR_MAP_EXPORT
[LOTS/A[1]] => [ STEFANO ]
[ALARM_STATE/U1] => 1
[WAFER/U4] => 1
[VI_KLARF_MAP/A] => /test/klarf.map
[KLARF_STEPID/A] => StepID
[KLARF_DEVICEID/A] => DeviceID
[KLARF_EQUIPMENTID/A] => EquipmentID
[KLARF_SETUP_ID/A] => SetupID
[RULE_ID/U4] => 1234
[RULE_FORMULA_EXPRESSION/A] => a < b && c > d
[RULE_FORMULA_TEXT/A] => 1 < 0 && 2 > 3
[RULE_FORMULA_RESULT/A] => FAIL
[TIMESTAMP/A] => 10-Nov-2020 09:10:11 99999999
)

The only (but probably dirtier) way I have found is this script:

<?php
$msg = "ALARM_ID/I4=1010001 ALARM_STATE/U4=eventcode ALARM_TEXT/A=WMR_MAP_EXPORT LOTS/A[1]=[ STEFANO ] ALARM_STATE/U1=1 WAFER/U4=1 VI_KLARF_MAP/A=/test/klarf.map KLARF_STEPID/A=StepID KLARF_DEVICEID/A=DeviceID KLARF_EQUIPMENTID/A=EquipmentID KLARF_SETUP_ID/A=SetupID RULE_ID/U4=1234 RULE_FORMULA_EXPRESSION/A=a < b && c > d RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 RULE_FORMULA_RESULT/A=FAIL TIMESTAMP/A=10-Nov-2020 09:10:11 99999999";
$split = explode("=", $msg);
foreach($split as $k => $s) {
$s = explode(" ", $s);
$keys[] = array_pop($s);
if ($s) $values[] = implode(" ", $s);
}
/*
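* The last exploded piece has no key after it, so its final word was popped as a key by mistake.
* If the last value (TIMESTAMP) has no spaces, that word is the whole value; otherwise it is appended to the value.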
*/
if (count($values) + 2 == count($keys)) array_push($values, array_pop($keys));
else                                    $values[ count($values) - 1 ] .= " " . array_pop($keys);
$params = array_combine($keys, $values);
print_r($params);
?>

Do you think there is a better way to split it? Perhaps with a regular expression or a different (more elegant?) approach?

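The important thing for accuracy is to make sure the "keys" are matched correctly.

Key strings will never contain spaces or equals signs. Value strings may contain either. A value string either runs to the end of the string or is followed by a space and then the next key (which cannot contain any spaces or equals signs).

The key strings can be matched "greedily" up to the first = that is encountered.

The value strings must not be matched greedily. This ensures that a value does not extend too far into the next key-value pair.

The lookahead after the value string ensures that a potential following key is not damaged/consumed.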
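
To make the greedy/lazy distinction concrete, here is a minimal sketch on a shortened, made-up message. The sample string and the variable names are illustrative assumptions, and the key class [^ =]+ simply encodes the rule that keys contain neither spaces nor equals signs.

```php
<?php
// Shortened sample message (hypothetical, same format as the question).
$msg = 'A/U4=1 B/A=x < y C/A=10-Nov-2020 09:10:11';

// Lazy value: .+? stops as soon as the lookahead sees " key=" or the end of the string.
preg_match_all('~([^ =]+)=(.+?)(?=$| [^ =]+=)~', $msg, $lazy);
$lazyResult = array_combine($lazy[1], $lazy[2]);

// Greedy value: .+ runs to the end of the string first and swallows the later pairs.
preg_match_all('~([^ =]+)=(.+)(?=$| [^ =]+=)~', $msg, $greedy);
$greedyResult = array_combine($greedy[1], $greedy[2]);

var_export($lazyResult);   // three separate pairs
var_export($greedyResult); // a single pair whose value contains the rest of the message
```

With the lazy quantifier each pair is captured separately; with the greedy one, everything after the first = ends up in one value.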

Pattern breakdown:

([^ =]+)     #capture one or more characters that are neither a space nor an equals sign (greedily) and store as capture group #1
=            #match but do not capture an equals sign
(.+?)        #capture one or more of any non-newline character (giving back when possible / non-greedy) and store as capture group #2
(?=          #start lookahead
  $          #match the end of the string
  |          #OR operator
   [^ =]+=   #match a space, then one or more non-space and non-equals characters, then match an equals sign
)            #end lookahead

Code:

$msg = "ALARM_ID/I4=1010001 ALARM_STATE/U4=eventcode ALARM_TEXT/A=WMR_MAP_EXPORT LOTS/A[1]=[ STEFANO ] ALARM_STATE/U1=1 WAFER/U4=1 VI_KLARF_MAP/A=/test/klarf.map KLARF_STEPID/A=StepID KLARF_DEVICEID/A=DeviceID KLARF_EQUIPMENTID/A=EquipmentID KLARF_SETUP_ID/A=SetupID RULE_ID/U4=1234 RULE_FORMULA_EXPRESSION/A=a < b && c > d RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 RULE_FORMULA_RESULT/A=FAIL TIMESTAMP/A=10-Nov-2020 09:10:11 99999999";
// The key class excludes spaces so the space separating two pairs is never captured as part of a key.
preg_match_all('~([^ =]+)=(.+?)(?=$| [^ =]+=)~', $msg, $out);
var_export(array_combine($out[1], $out[2]));

Output:

array (
'ALARM_ID/I4' => '1010001',
'ALARM_STATE/U4' => 'eventcode',
'ALARM_TEXT/A' => 'WMR_MAP_EXPORT',
'LOTS/A[1]' => '[ STEFANO ]',
'ALARM_STATE/U1' => '1',
'WAFER/U4' => '1',
'VI_KLARF_MAP/A' => '/test/klarf.map',
'KLARF_STEPID/A' => 'StepID',
'KLARF_DEVICEID/A' => 'DeviceID',
'KLARF_EQUIPMENTID/A' => 'EquipmentID',
'KLARF_SETUP_ID/A' => 'SetupID',
'RULE_ID/U4' => '1234',
'RULE_FORMULA_EXPRESSION/A' => 'a < b && c > d',
'RULE_FORMULA_TEXT/A' => '1 < 0 && 2 > 3',
'RULE_FORMULA_RESULT/A' => 'FAIL',
'TIMESTAMP/A' => '10-Nov-2020 09:10:11 99999999',
)
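
As a side note, the same lookahead idea can also drive preg_split instead of preg_match_all. This is a different technique from the answer above, sketched here on a shortened, made-up message; the variable names are illustrative assumptions. It splits only on a space that is directly followed by something shaped like key=, and then separates each piece at its first equals sign.

```php
<?php
// Shortened sample (hypothetical) in the same format as the question's message.
$msg = 'ALARM_ID/I4=1010001 LOTS/A[1]=[ STEFANO ] RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 TIMESTAMP/A=10-Nov-2020 09:10:11 99999999';

$params = [];
// Split only on a space that is directly followed by "key=".
foreach (preg_split('~ (?=[^ =]+=)~', $msg) as $pair) {
    // Limit 2 so that only the first "=" separates the key from the value.
    list($key, $value) = explode('=', $pair, 2);
    $params[$key] = $value;
}
var_export($params);
```

Like the pattern above, this relies on values never containing a space followed by a run of non-space characters and an equals sign (a value such as "a <= b" would be split in the wrong place).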

You can make use of the / that is present in every key.

([^\s=/]+/[^\s=]+)=(.*?)(?=\h+[^\s=/]+/|$)

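Explanation

  • ( Capture group 1
    • [^\s=/]+ Match 1+ times any character except a whitespace character, = or /
    • /[^\s=]+ Then match / followed by the rest of the key
  • ) Close group 1
  • = Match literally
  • (.*?) Capture group 2, match any character except a newline as few times as possible
  • (?=\h+[^\s=/]+/|$) Positive lookahead, assert either a key-like format containing / (as used in group 1) or the end of the string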

Example code

$re = '`([^\s=/]+/[^\s=]+)=(.*?)(?=\h+[^\s=/]+/|$)`';
$str = 'ALARM_ID/I4=1010001 ALARM_STATE/U4=eventcode ALARM_TEXT/A=WMR_MAP_EXPORT LOTS/A[1]=[ STEFANO ] ALARM_STATE/U1=1 WAFER/U4=1 VI_KLARF_MAP/A=/test/klarf.map KLARF_STEPID/A=StepID KLARF_DEVICEID/A=DeviceID KLARF_EQUIPMENTID/A=EquipmentID KLARF_SETUP_ID/A=SetupID RULE_ID/U4=1234 RULE_FORMULA_EXPRESSION/A=a < b && c > d RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 RULE_FORMULA_RESULT/A=FAIL TIMESTAMP/A=10-Nov-2020 09:10:11 99999999
';
preg_match_all($re, $str, $matches);
$result = array_combine($matches[1], $matches[2]);
print_r($result);

Output

Array
(
[ALARM_ID/I4] => 1010001
[ALARM_STATE/U4] => eventcode
[ALARM_TEXT/A] => WMR_MAP_EXPORT
[LOTS/A[1]] => [ STEFANO ]
[ALARM_STATE/U1] => 1
[WAFER/U4] => 1
[VI_KLARF_MAP/A] => /test/klarf.map
[KLARF_STEPID/A] => StepID
[KLARF_DEVICEID/A] => DeviceID
[KLARF_EQUIPMENTID/A] => EquipmentID
[KLARF_SETUP_ID/A] => SetupID
[RULE_ID/U4] => 1234
[RULE_FORMULA_EXPRESSION/A] => a < b && c > d
[RULE_FORMULA_TEXT/A] => 1 < 0 && 2 > 3
[RULE_FORMULA_RESULT/A] => FAIL
[TIMESTAMP/A] => 10-Nov-2020 09:10:11 99999999
)

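If the keys all start with word characters separated by underscores, you can start the pattern with the repeating part [^\W_]+(?:_[^\W_]+)*

It matches word characters except _, then repeats matching _ followed by word characters except _, until it reaches the /

([^\W_]+(?:_[^\W_]+)*/[^\s=]*)=(.*?)(?=\h+[^\s=/]+/|$)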
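
A minimal sketch of how this stricter pattern could be used, on a shortened, made-up message (the sample string and variable names are assumptions for illustration):

```php
<?php
// Shortened sample (hypothetical) in the same format as the question's message.
$str = 'ALARM_ID/I4=1010001 LOTS/A[1]=[ STEFANO ] VI_KLARF_MAP/A=/test/klarf.map TIMESTAMP/A=10-Nov-2020 09:10:11 99999999';

// Key: underscore-separated runs of letters/digits, then "/", then the rest of the key.
$re = '`([^\W_]+(?:_[^\W_]+)*/[^\s=]*)=(.*?)(?=\h+[^\s=/]+/|$)`';

preg_match_all($re, $str, $matches);
$result = array_combine($matches[1], $matches[2]);
print_r($result);
```

The value /test/klarf.map stays intact because the lookahead requires horizontal whitespace before the next key-like token.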


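I managed this with code that uses only basic PHP functions. I think regular expressions make code harder to read. Most of the time it is better to avoid them, even at the cost of more verbose code. This can also have an impact on performance.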

$message = "ALARM_ID/I4=1010001 ALARM_STATE/U4=eventcode ALARM_TEXT/A=WMR_MAP_EXPORT LOTS/A[1]=[ STEFANO ] ALARM_STATE/U1=1 WAFER/U4=1 VI_KLARF_MAP/A=/test/klarf.map KLARF_STEPID/A=StepID KLARF_DEVICEID/A=DeviceID KLARF_EQUIPMENTID/A=EquipmentID KLARF_SETUP_ID/A=SetupID RULE_ID/U4=1234 RULE_FORMULA_EXPRESSION/A=a < b && c > d RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 RULE_FORMULA_RESULT/A=FAIL TIMESTAMP/A=10-Nov-2020 09:10:11 99999999";
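$parameters = [];
$key = null;
$value = '';
foreach (explode(' ', $message) as $word) {
    // A word with "=" after its first character starts a new key=value pair.
    if (strpos($word, '=') > 0) {
        // Store the previous pair before starting the next one.
        if ($key !== null) $parameters[$key] = $value;
        // Limit 2 keeps any further "=" inside the value.
        list($key, $value) = explode('=', $word, 2);
    }
    // Any other word continues the current value.
    else $value .= " $word";
}
// Store the final pair.
if ($key !== null) $parameters[$key] = $value;
echo '<pre>';
print_r($parameters);
echo '</pre>';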

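I chose to split on spaces and then look for the = character to find the words that contain a key.

Of course there are other ways to do this, but because of the odd format of the message, all of them need some extra work.

This routine currently does not tolerate errors in the message string, but it could easily be extended to tolerate various kinds of input errors.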
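
As one possible direction, here is a hedged sketch of a more tolerant variant. The function name parseMessage and the specific tolerances (trimming surrounding whitespace, ignoring stray words before the first key, returning an empty array for empty input) are assumptions chosen for illustration, not part of the original routine.

```php
<?php
/**
 * Hypothetical helper: a more tolerant variant of the space-splitting loop.
 * Assumptions (not from the original): surrounding whitespace is trimmed,
 * and words that appear before the first key are silently ignored.
 */
function parseMessage(string $message): array
{
    $parameters = [];
    $key = null;
    foreach (explode(' ', trim($message)) as $word) {
        if (strpos($word, '=') > 0) {
            // New pair; limit 2 keeps any further "=" inside the value.
            list($key, $value) = explode('=', $word, 2);
            $parameters[$key] = $value;
        } elseif ($key !== null) {
            // Continuation of the current value.
            $parameters[$key] .= " $word";
        }
        // Otherwise: a stray word before any key, which is skipped.
    }
    return $parameters;
}

print_r(parseMessage('  junk ALARM_ID/I4=1010001 RULE/A=a < b TIMESTAMP/A=10-Nov-2020 09:10:11 '));
```

It shares the limitation of the loop it is based on: a word inside a value that itself looks like key=value is treated as the start of a new pair.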
