Text file to conditional PHP array



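I have a file with thousands of entries that I am trying to convert into a PHP array, but I have hit a stumbling block because what needs to go into the array is conditional. The good news is that the data is predictable. There are two types of entries: 1) revoked and 2) revoked with a reason.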

#1 Sample of a revoked entry

Serial Number: 0E76BE532946EFE890376F0339329A62
Revocation Date: Jun 27 14:46:26 2018 GMT

#2 Sample of an entry revoked with a reason

Serial Number: 0E17C9648FF25C0FC537D97958E4D449
Revocation Date: Jun 27 14:48:07 2018 GMT
CRL entry extensions:
X509v3 CRL Reason Code: 
Key Compromise

An entry revoked with a reason has 5 lines in total; otherwise it has only 2.

Sample data file (.txt)

Below is a sample of the data from the list of thousands of entries, which we can use as a sample data file.

Serial Number: 0E76BE532946EFE890376F0339329A62
Revocation Date: Jun 27 14:46:26 2018 GMT
Serial Number: 0E17C9648FF25C0FC537D97958E4D449
Revocation Date: Jun 27 14:48:07 2018 GMT
CRL entry extensions:
X509v3 CRL Reason Code: 
Key Compromise
Serial Number: 06BB119BAA2ABC21F92B06ED8E14B113
Revocation Date: Jun 27 14:49:12 2018 GMT
CRL entry extensions:
X509v3 CRL Reason Code: 
Key Compromise
Serial Number: 088925C97AC5991CDF5416D07FC5DB00
Revocation Date: Jun 27 15:50:51 2018 GMT
Serial Number: 091E2B2090C7F5DBBCC97EA958B110BC
Revocation Date: Jun 27 15:52:31 2018 GMT
Serial Number: 0E6E9D1E9818221538EA6AF16A279C89
Revocation Date: Jun 27 15:53:12 2018 GMT
CRL entry extensions:
X509v3 CRL Reason Code: 
Key Compromise
Serial Number: 07852DF7D7DD35080DE3604836408ADE
Revocation Date: Jun 27 15:53:38 2018 GMT
Serial Number: 0DEA14237257A6A3049F934840DC2B47
Revocation Date: Jun 27 15:53:40 2018 GMT
CRL entry extensions:
X509v3 CRL Reason Code: 
Key Compromise

Expected output

I would like to build an array with the following output:

Array
(
[0] => Array
(
[serial] => 0E76BE532946EFE890376F0339329A62
[date] => Jun 27 14:46:26 2018 GMT
)
[1] => Array
(
[serial] => 0E17C9648FF25C0FC537D97958E4D449
[date] => Jun 27 14:48:07 2018 GMT
[reason] => Key Compromise
)
...
...
)

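Failed attempt

Here is my attempt, which only accounts for the first condition (#1). For #2 there are extra lines, and I can't figure out how to account for them.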

$arr = array();
$lines = file('data.txt', FILE_IGNORE_NEW_LINES);
$x = 0;
foreach ($lines as $line) {
    if (strpos($line, 'Serial Number: ') !== false) {
        $arr[$x]['serial'] = str_replace('Serial Number: ', '', trim($line));
    }
    if (strpos($line, 'Revocation Date: ') !== false) {
        $arr[$x]['date'] = str_replace('Revocation Date: ', '', trim($line));
        $x++;
    }
}

Here is a simple solution based on string manipulation:

Input:

Serial Number: 0E76BE532946EFE890376F0339329A62
Revocation Date: Jun 27 14:46:26 2018 GMT
Serial Number: 0E17C9648FF25C0FC537D97958E4D449
Revocation Date: Jun 27 14:48:07 2018 GMT
CRL entry extensions:
X509v3 CRL Reason Code: 
Key Compromise
Serial Number: 06BB119BAA2ABC21F92B06ED8E14B113
Revocation Date: Jun 27 14:49:12 2018 GMT
CRL entry extensions:
X509v3 CRL Reason Code: 
Key Compromise
Serial Number: 088925C97AC5991CDF5416D07FC5DB00
Revocation Date: Jun 27 15:50:51 2018 GMT
Serial Number: 091E2B2090C7F5DBBCC97EA958B110BC
Revocation Date: Jun 27 15:52:31 2018 GMT
Serial Number: 0E6E9D1E9818221538EA6AF16A279C89
Revocation Date: Jun 27 15:53:12 2018 GMT
CRL entry extensions:
X509v3 CRL Reason Code: 
Key Compromise
Serial Number: 07852DF7D7DD35080DE3604836408ADE
Revocation Date: Jun 27 15:53:38 2018 GMT
Serial Number: 0DEA14237257A6A3049F934840DC2B47
Revocation Date: Jun 27 15:53:40 2018 GMT
CRL entry extensions:
X509v3 CRL Reason Code: 
Key Compromise

PHP code:

<?php
// File to parse; data.txt is the name used in the question's attempt.
$filename = 'data.txt';
// Extract the lines, without trailing newlines and skipping empty lines.
$file = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$output = array();
foreach ($file as $row) {
    $row = trim($row);
    if (strpos($row, 'Serial Number') === false) {
        // Any other line belongs to the most recently started entry.
        $n = count($output) - 1;
        if ($n < 0) {
            continue; // Ignore anything before the first serial number.
        }
        if (strpos($row, 'Revocation Date') !== false) {
            $output[$n]['date'] = str_replace('Revocation Date: ', '', $row);
        } elseif (strpos($row, 'CRL entry extensions') !== false
            || strpos($row, 'X509v3 CRL Reason Code') !== false) {
            // Header lines of the reason block carry no data; skip them.
        } else {
            // The remaining line is the reason itself, e.g. "Key Compromise".
            $output[$n]['reason'] = $row;
        }
    } else {
        // A serial number starts a new entry.
        $output[] = array('serial' => str_replace('Serial Number: ', '', $row));
    }
}
print_r($output);
?>

Output:

Array ( 
[0] => Array ( 
[serial] => 0E76BE532946EFE890376F0339329A62 
[date] => Jun 27 14:46:26 2018 GMT 
) 
[1] => Array ( 
[serial] => 0E17C9648FF25C0FC537D97958E4D449 
[date] => Jun 27 14:48:07 2018 GMT 
[reason] => Key Compromise 
) 
[2] => Array ( 
[serial] => 06BB119BAA2ABC21F92B06ED8E14B113 
[date] => Jun 27 14:49:12 2018 GMT 
[reason] => Key Compromise 
) 
[3] => Array ( 
[serial] => 088925C97AC5991CDF5416D07FC5DB00 
[date] => Jun 27 15:50:51 2018 GMT 
) 
[4] => Array ( 
[serial] => 091E2B2090C7F5DBBCC97EA958B110BC 
[date] => Jun 27 15:52:31 2018 GMT
) 
[5] => Array (
[serial] => 0E6E9D1E9818221538EA6AF16A279C89 
[date] => Jun 27 15:53:12 2018 GMT 
[reason] => Key Compromise
) 
[6] => Array ( 
[serial] => 07852DF7D7DD35080DE3604836408ADE 
[date] => Jun 27 15:53:38 2018 GMT
) 
[7] => Array (
[serial] => 0DEA14237257A6A3049F934840DC2B47 
[date] => Jun 27 15:53:40 2018 GMT 
[reason] => Key Compromise
)
)

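Depending on the size of the text file you are working with and how comfortable you are with regular expressions, you could use a pattern to extract the different bits of information you are looking for.

I put together a short proof of concept that works with the sample you provided: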

// Pattern with named groups; the trailing group (the reason block) is optional.
$re = '/\W*Serial Number: (?<serial>.*?)$\n\W*Revocation Date: (?<date>.*?)$((?:(?!\W*Serial Number)[\n]*.)+Code:\s*(?<reason>.*?$))?/m';
$str = '    Serial Number: 0E76BE532946EFE890376F0339329A62
Revocation Date: Jun 27 14:46:26 2018 GMT
Serial Number: 0E17C9648FF25C0FC537D97958E4D449
Revocation Date: Jun 27 14:48:07 2018 GMT
CRL entry extensions:
X509v3 CRL Reason Code: 
Key Compromise
Serial Number: 06BB119BAA2ABC21F92B06ED8E14B113
Revocation Date: Jun 27 14:49:12 2018 GMT
CRL entry extensions:
X509v3 CRL Reason Code: 
Key Compromise
Serial Number: 088925C97AC5991CDF5416D07FC5DB00
Revocation Date: Jun 27 15:50:51 2018 GMT
Serial Number: 091E2B2090C7F5DBBCC97EA958B110BC
Revocation Date: Jun 27 15:52:31 2018 GMT
Serial Number: 0E6E9D1E9818221538EA6AF16A279C89
Revocation Date: Jun 27 15:53:12 2018 GMT
CRL entry extensions:
X509v3 CRL Reason Code: 
Key Compromise
Serial Number: 07852DF7D7DD35080DE3604836408ADE
Revocation Date: Jun 27 15:53:38 2018 GMT
Serial Number: 0DEA14237257A6A3049F934840DC2B47
Revocation Date: Jun 27 15:53:40 2018 GMT
CRL entry extensions:
X509v3 CRL Reason Code: 
Key Compromise';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);

You can see this example in action here: https://regex101.com/r/7iSBrx/1.

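This example uses named groups to help pull the desired targets out of the matches, and to help illustrate where in the pattern each capture happens. If it helps, I'm happy to break down why this pattern works.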
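
As a rough sketch of how the named groups could be turned into the array shape asked for in the question: the helper name `parse_crl_entries` and the stricter, line-by-line pattern below are assumptions made for illustration, not the pattern from the regex101 link. It assumes each entry is laid out exactly as in the sample and tolerates optional leading or trailing spaces on each line.

```php
<?php
// Hypothetical helper: parse CRL text into entries with 'serial', 'date'
// and, when present, 'reason'.
function parse_crl_entries(string $text): array
{
    // Normalise Windows line endings so the pattern only has to handle "\n".
    $text = str_replace("\r\n", "\n", $text);

    // Assumed layout: two mandatory lines, then an optional three-line reason block.
    $re = '/^[ \t]*Serial Number: (?<serial>\S+)[ \t]*\n'
        . '[ \t]*Revocation Date: (?<date>.+?)[ \t]*$'
        . '(?:\n[ \t]*CRL entry extensions:[ \t]*'
        . '\n[ \t]*X509v3 CRL Reason Code:[ \t]*'
        . '\n[ \t]*(?<reason>.+?)[ \t]*$)?/m';

    preg_match_all($re, $text, $matches, PREG_SET_ORDER);

    $entries = array();
    foreach ($matches as $m) {
        $entry = array('serial' => $m['serial'], 'date' => $m['date']);
        // The optional group is missing (or empty) when no reason was recorded.
        if (isset($m['reason']) && $m['reason'] !== '') {
            $entry['reason'] = $m['reason'];
        }
        $entries[] = $entry;
    }
    return $entries;
}

// Two entries taken from the question's sample data.
$sample = "Serial Number: 0E76BE532946EFE890376F0339329A62\n"
    . "Revocation Date: Jun 27 14:46:26 2018 GMT\n"
    . "Serial Number: 0E17C9648FF25C0FC537D97958E4D449\n"
    . "Revocation Date: Jun 27 14:48:07 2018 GMT\n"
    . "CRL entry extensions:\n"
    . "X509v3 CRL Reason Code: \n"
    . "Key Compromise\n";

$entries = parse_crl_entries($sample);
print_r($entries);
```

With `PREG_SET_ORDER`, each element of `$matches` holds one entry, so the named keys can be copied straight into the result and the numeric keys ignored.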

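Note that this requires loading the entire file into a single string, which can consume a lot of memory if the file is large. An iteration-based approach is best for very large files.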
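
A minimal sketch of what such an iteration-based approach might look like, assuming the layout shown in the question: a `Serial Number:` line starts each entry, and the line after `X509v3 CRL Reason Code:` holds the reason. The function name `read_crl_entries` is hypothetical, and the demo reads from an in-memory stream; for a real file one would pass the handle from `fopen('data.txt', 'rb')` instead.

```php
<?php
// Hypothetical generator: yields one entry at a time, so only the current
// entry is held in memory regardless of the file size.
function read_crl_entries($handle): Generator
{
    $entry = null;
    $expectReason = false;
    while (($line = fgets($handle)) !== false) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        if (strpos($line, 'Serial Number: ') === 0) {
            if ($entry !== null) {
                yield $entry; // The previous entry is complete.
            }
            $entry = array('serial' => substr($line, strlen('Serial Number: ')));
            $expectReason = false;
        } elseif ($entry === null) {
            continue; // Skip anything before the first serial number.
        } elseif (strpos($line, 'Revocation Date: ') === 0) {
            $entry['date'] = substr($line, strlen('Revocation Date: '));
        } elseif (strpos($line, 'X509v3 CRL Reason Code:') === 0) {
            $expectReason = true; // The reason is on the following line.
        } elseif ($expectReason) {
            $entry['reason'] = $line;
            $expectReason = false;
        }
    }
    if ($entry !== null) {
        yield $entry; // Flush the last entry.
    }
}

// Demo on an in-memory stream standing in for the data file.
$handle = fopen('php://memory', 'w+b');
fwrite($handle, "Serial Number: 088925C97AC5991CDF5416D07FC5DB00\n"
    . "Revocation Date: Jun 27 15:50:51 2018 GMT\n"
    . "Serial Number: 0E6E9D1E9818221538EA6AF16A279C89\n"
    . "Revocation Date: Jun 27 15:53:12 2018 GMT\n"
    . "CRL entry extensions:\n"
    . "X509v3 CRL Reason Code: \n"
    . "Key Compromise\n");
rewind($handle);

$entries = iterator_to_array(read_crl_entries($handle), false);
fclose($handle);
print_r($entries);
```

For a very large file, looping over the generator with `foreach` and handling each entry as it arrives avoids building the full array at all.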

Try this code:

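$file_handle = fopen("data.txt", "rb");
// fgets() returns false at end of file, so test its result instead of feof().
while (($line_of_text = fgets($file_handle)) !== false) {
    // Split "Label: value" lines into label and value; other lines stay whole.
    $parts = explode(': ', trim($line_of_text), 2);
    print_r($parts);
}
fclose($file_handle);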
