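A certificate PEM file contains a header and a footer, for example:
-----BEGIN CERTIFICATE----- [Base64 of certificate] -----END CERTIFICATE-----
I only need to extract the [Base64 of certificate] part, i.e. strip the "-----BEGIN CERTIFICATE----- " and " -----END CERTIFICATE-----" strings. I would like to know whether this can be done with a regular expression and, if so, what that expression would look like.
I tried: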
-----BEGIN CERTIFICATE----- (.*) -----END CERTIFICATE-----
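However, it did not give me the certificate's Base64; it returned everything.
Thanks, Jim
Since I don't know which language you are using, here is a relatively portable pattern (the engine must support lookahead and lookbehind):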
(?<=-----BEGIN CERTIFICATE----- )(?:\S+|\s(?!-----END CERTIFICATE-----))+(?=\s-----END CERTIFICATE-----)
The result is the whole match, because lookarounds only check their context and are not part of the matched text.
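As a minimal Java sketch of that point: the class name, method name and sample string below are made up for illustration, and the input is assumed to sit on a single line with a space on each side of the Base64, as in the question. Calling `group()` with no argument already yields just the Base64 text, since the lookbehind and lookahead are not part of the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaroundDemo {
    // The pattern from above as a Java string literal (backslashes doubled).
    private static final Pattern CERT_BODY = Pattern.compile(
        "(?<=-----BEGIN CERTIFICATE----- )"
        + "(?:\\S+|\\s(?!-----END CERTIFICATE-----))+"
        + "(?=\\s-----END CERTIFICATE-----)");

    // Returns the text between the markers, or null when nothing matches.
    public static String extract(String input) {
        Matcher m = CERT_BODY.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical single-line input; "QUJD" is simply Base64 for "ABC".
        String sample = "-----BEGIN CERTIFICATE----- QUJD -----END CERTIFICATE-----";
        System.out.println(extract(sample)); // prints QUJD
    }
}
```

Note that the lookbehind expects a literal space after the header, so a real multi-line PEM file (header followed by a line break) would need `\s` there instead.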
In Perl:
my $base64_cert_data;
if ($certbuf =~ /(-+BEGIN CERTIFICATE-+)(.*?)(-+END CERTIFICATE-+)/s) {
    $base64_cert_data = $2;
}
Regex explanation:
/(-+BEGIN CERTIFICATE-+)(.*?)(-+END CERTIFICATE-+)/s
1st Capturing group (-+BEGIN CERTIFICATE-+)
-+ matches the character - literally
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
BEGIN CERTIFICATE matches the characters BEGIN CERTIFICATE literally (case sensitive)
-+ matches the character - literally
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
2nd Capturing group (.*?)
.*? matches any character
Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
3rd Capturing group (-+END CERTIFICATE-+)
-+ matches the character - literally
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
END CERTIFICATE matches the characters END CERTIFICATE literally (case sensitive)
-+ matches the character - literally
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
s modifier: single line. Dot matches newline characters
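The same capture-group approach can be sketched in Java. This is only an illustration: the class and method names are hypothetical, and `Pattern.DOTALL` plays the role of Perl's `/s` modifier. Group 2 holds the text between the markers; because the line breaks are part of that text, the sketch strips all whitespace afterwards.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CaptureGroupDemo {
    // DOTALL lets '.' match line breaks, like /s in the Perl version.
    private static final Pattern CERT = Pattern.compile(
        "(-+BEGIN CERTIFICATE-+)(.*?)(-+END CERTIFICATE-+)", Pattern.DOTALL);

    // Returns the Base64 body with whitespace removed, or null if no certificate is found.
    public static String extractBase64(String pem) {
        Matcher m = CERT.matcher(pem);
        if (!m.find()) {
            return null;
        }
        // Group 2 is the lazy (.*?) between header and footer.
        return m.group(2).replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        // Hypothetical two-line body; not a real certificate.
        String pem = "-----BEGIN CERTIFICATE-----\nQUJD\nREVG\n-----END CERTIFICATE-----\n";
        System.out.println(extractBase64(pem)); // prints QUJDREVG
    }
}
```

The full match (`m.group()` or `m.group(0)`) still includes both markers, which is why a pattern with a capture group appears to "return everything" unless the numbered group is read instead.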
Hi, here is a sample Perl script that does what you need.
my $Str = "-----BEGIN CERTIFICATE-----
MIIBuTCCASKgAwIBAgIQNdNhtuV5GbNHYZsf+LvM0zANBgkqhkiG9w0BAQUFADAb
MRkwFwYDVQQDExBFZGlkZXYgU21va2VUZXN0MB4XDTA4MTExMjE5NTEzNVoXDTM5
MTIzMTIzNTk1OVowGzEZMBcGA1UEAxMQRWRpZGV2IFNtb2tlVGVzdDCBnzANBgkq
hkiG9w0BAQEFAAOBjQAwgYkCgYEAm6zGzqxejwswWTNLcSsa7P8xqODspX9VQBuq
5W1RoTgQ0LNR64+7ywLjH8+wrb/lB6QV7s2SFUiWDeduVesvMJkWtZ5zzQyl3iUa
CBpT4S5AaO3/wkYQSKdI108pXH7Aue0e/ZOwgEEX1N6OaPQn7AmAB4uq1h+ffw+r
RKNHqnsCAwEAATANBgkqhkiG9w0BAQUFAAOBgQCZmj+pgRsN6HpoICawK3XXNAmi
cgfQkailX9akIjD3xSCwEQx4nG6tZjTz30u4NoSffW7pch58SxuZQDqW5NsJcQNq
Ngo/dMoqqpXdi2/0BYEcJ8pjsngrFm+fM2BnyGpXH7aWuKsWjVFGlWlF+yi8I35Q
8wFJt2Z/XGA7WWDjvw==
-----END CERTIFICATE-----";
if ($Str =~ /^\W+\w+\s+\w+\W+\s(.*)\s+\W+.*$/s) {
    print "$1\n\n";
} else {
    print "No\n\n";
}
Output:
MIIBuTCCASKgAwIBAgIQNdNhtuV5GbNHYZsf+LvM0zANBgkqhkiG9w0BAQUFADAb
MRkwFwYDVQQDExBFZGlkZXYgU21va2VUZXN0MB4XDTA4MTExMjE5NTEzNVoXDTM5
MTIzMTIzNTk1OVowGzEZMBcGA1UEAxMQRWRpZGV2IFNtb2tlVGVzdDCBnzANBgkq
hkiG9w0BAQEFAAOBjQAwgYkCgYEAm6zGzqxejwswWTNLcSsa7P8xqODspX9VQBuq
5W1RoTgQ0LNR64+7ywLjH8+wrb/lB6QV7s2SFUiWDeduVesvMJkWtZ5zzQyl3iUa
CBpT4S5AaO3/wkYQSKdI108pXH7Aue0e/ZOwgEEX1N6OaPQn7AmAB4uq1h+ffw+r
RKNHqnsCAwEAATANBgkqhkiG9w0BAQUFAAOBgQCZmj+pgRsN6HpoICawK3XXNAmi
cgfQkailX9akIjD3xSCwEQx4nG6tZjTz30u4NoSffW7pch58SxuZQDqW5NsJcQNq
Ngo/dMoqqpXdi2/0BYEcJ8pjsngrFm+fM2BnyGpXH7aWuKsWjVFGlWlF+yi8I35Q
8wFJt2Z/XGA7WWDjvw==
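The Base64 character set is:
[A-Za-z0-9+/\r\n]+={0,2}
This is an accurate description of what PEM files (Base64 certificates) typically contain: = is used for padding (at the end), and \r\n are the line-break characters.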
Putting it all together, we get:
"-+BEGIN\\s+.*CERTIFICATE[^-]*-+(?:\\s|\\r|\\n)+" // Header
+ "([A-Za-z0-9+/\r\n]+={0,2})" // Base64 text
+ "\\s*-+END\\s+.*CERTIFICATE[^-]*-+" // Footer (\\s* tolerates a line break after the = padding)
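If you want to be language-agnostic, you can expect the header/footer to be one or more - characters, followed by uppercase letters (and spaces), followed by one or more - characters.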
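A rough Java sketch of the character-class approach might look like this. The class name, method name and sample input are hypothetical. The header and footer use the generic shape just described (dashes, uppercase letters and spaces, dashes), and the `\s*` before the footer is an assumption added here so that a line break after the `=` padding is tolerated.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CharClassDemo {
    private static final Pattern PEM_BLOCK = Pattern.compile(
        "-+BEGIN [A-Z ]+-+\\s+"             // header: dashes, label, dashes
        + "([A-Za-z0-9+/\\r\\n]+={0,2})"    // Base64 text with optional padding
        + "\\s*-+END [A-Z ]+-+");           // footer (assumed \s* before it)

    // Returns the Base64 text without CR/LF, or null if the input does not match.
    public static String extractBody(String pem) {
        Matcher m = PEM_BLOCK.matcher(pem);
        if (!m.find()) {
            return null;
        }
        return m.group(1).replaceAll("[\\r\\n]", "");
    }

    public static void main(String[] args) {
        // Hypothetical input; "QUJDRA==" is Base64 for "ABCD", split over two lines.
        String pem = "-----BEGIN CERTIFICATE-----\nQUJD\nRA==\n-----END CERTIFICATE-----";
        System.out.println(extractBody(pem)); // prints QUJDRA==
    }
}
```

Unlike `.*?`, the explicit character class refuses bodies that contain anything other than Base64 characters and line breaks, so malformed input yields no match rather than a garbage capture.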
Here is the regular expression:
(?<=-----BEGIN CERTIFICATE-----)[\s\S]*?(?=-----END CERTIFICATE-----)
Java example:
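import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final String text = Files.readString(Path.of("file.txt")); // file with certs
final List<String> results = new ArrayList<>();
final Matcher matcher = Pattern.compile("(?<=-----BEGIN CERTIFICATE-----)[\\s\\S]*?(?=-----END CERTIFICATE-----)")
        .matcher(text);
while (matcher.find()) {
    // Drop the line breaks so each result is one continuous Base64 string.
    final String m = matcher.group()
            .replaceAll("\n", "")
            .replaceAll("\r", "");
    results.add(m);
}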