Using RegExp to extract information from Whois output



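How can I extract multiple segments from the results of a Whois lookup?

I have an array that holds the results of several Whois lookups (collected in a foreach loop).

For example, how would I get everything from the "domain...." line through the ">>> Last update of WHOIS database:" line?

Whois is run with the exec command: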

foreach ($query as $domain) {
    $scanUrl = 'whois '.$domain->url;
    exec($scanUrl, $output);
}

Whois itself works fine, and I can get the creation date, expiry date and registrar with preg_grep:

$domainCreated  = preg_grep('/created/', $output);
$domainExpires  = preg_grep('/expires/', $output);
$domainRegistrar  = preg_grep('/registrar..........:/', $output);

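But I need to get multiple segments out of the array, each running from a "domain...." line to the ">>> Last update of WHOIS database:" line.

All of the Whois results are in a single array. It looks like this: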

Array
(
[0] =>
[1] => domain.............: iltalehti.fi
[2] => status.............: Registered
[3] => created............: 1.1.1991 00:00:00
[4] => expires............: 31.8.2022 00:00:00
[5] => available..........: 30.9.2022 00:00:00
[6] => modified...........: 6.9.2017
[7] => holder transfer....: 13.7.2013
[8] => RegistryLock.......: no
[9] =>
[10] => Nameservers
[11] =>
[12] => nserver............: a.ns-sec.com [Technical Error]
[13] => nserver............: d.ns-sec.org [OK]
[14] => nserver............: c.ns-sec.fi [178.217.128.53] [2001:67c:224:53::53:1] [OK]
[15] => nserver............: b.ns-sec.net [OK]
[16] =>
[17] => DNSSEC
[18] =>
[19] => dnssec.............: no
[20] =>
[21] => Holder
[22] =>
[23] => name...............: Alma Media Oyj
[24] => register number....: 1944757-4
[25] => address............: PL 140
[26] => address............: 00101
[27] => address............: Helsinki
[28] => country............: Finland
[29] => phone..............: +358 10 665 000
[30] => holder email.......:
[31] =>
[32] => Registrar
[33] =>
[34] => registrar..........: Cybercom Finland Oy
[35] => www................: www.cybercom.com
[36] =>
[37] => >>> Last update of WHOIS database: 24.3.2020 12:45:05 (EET) <<<
[38] =>
[39] =>
[40] => Copyright (c) Finnish Transport and Communications Agency Traficom
[41] =>
[42] =>
[43] => domain.............: yle.fi
[44] => status.............: Registered
[45] => created............: 1.1.1991 00:00:00
[46] => expires............: 31.8.2020 00:00:00
[47] => available..........: 30.9.2020 00:00:00
[48] => modified...........: 16.1.2018
[49] => RegistryLock.......: no
[50] =>
[51] => Nameservers
[52] =>
[53] => nserver............: ns-997.awsdns-60.net [OK]
[54] => nserver............: ns-1394.awsdns-46.org [OK]
[55] => nserver............: ns-1882.awsdns-43.co.uk [OK]
[56] => nserver............: ns-76.awsdns-09.com [OK]
[57] =>
[58] => DNSSEC
[59] =>
[60] => dnssec.............: no
[61] =>
[62] => Holder
[63] =>
[64] => name...............: Yleisradio Oy
[65] => register number....: 0215438-8
[66] => address............: Radiokatu 5
[67] => address............: 00024
[68] => address............: Yleisradio
[69] => country............: Finland
[70] => phone..............: +358914801
[71] => holder email.......:
[72] =>
[73] => Registrar
[74] =>
[75] => registrar..........: Yleisradio Oy
[76] =>
[77] => >>> Last update of WHOIS database: 24.3.2020 12:45:12 (EET) <<<
[78] =>
[79] =>
[80] => Copyright (c) Finnish Transport and Communications Agency Traficom
[81] =>
[82] =>
[83] => domain.............: is.fi
[84] => status.............: Registered
[85] => created............: 12.9.2016 10:01:17
[86] => expires............: 12.9.2020 10:01:17
[87] => available..........: 12.10.2020 10:01:17
[88] => modified...........: 17.9.2017
[89] => holder transfer....: 3.2.2017
[90] => RegistryLock.......: no
[91] =>
[92] => Nameservers
[93] =>
[94] => nserver............: ns-2017.awsdns-60.co.uk [OK]
[95] => nserver............: ns-824.awsdns-39.net [OK]
[96] => nserver............: ns-111.awsdns-13.com [OK]
[97] => nserver............: ns-1159.awsdns-16.org [OK]
[98] =>
[99] => DNSSEC
[100] =>
[101] => dnssec.............: no
[102] =>
[103] => Holder
[104] =>
[105] => name...............: Sanoma Media Finland Oy
[106] => register number....: 1515901-4
[107] => address............: Töölönlahdenkatu 2
[108] => address............: 00100
[109] => address............: Helsinki
[110] => country............: Finland
[111] => phone..............: +35891221
[112] => holder email.......:
[113] =>
[114] => Registrar
[115] =>
[116] => registrar..........: Sanoma Oyj
[117] =>
[118] => >>> Last update of WHOIS database: 24.3.2020 12:46:59 (EET) <<<
[119] =>
[120] =>
[121] => Copyright (c) Finnish Transport and Communications Agency Traficom
[122] =>
[123] =>
[124] => domain.............: hs.fi
[125] => status.............: Registered
[126] => created............: 10.7.2009 00:00:00
[127] => expires............: 14.7.2020 11:17:58
[128] => available..........: 14.8.2020 11:17:58
[129] => modified...........: 7.9.2017
[130] => RegistryLock.......: no
[131] =>
[132] => Nameservers
[133] =>
[134] => nserver............: ns-83.awsdns-10.com [OK]
[135] => nserver............: ns-1635.awsdns-12.co.uk [OK]
[136] => nserver............: ns-1461.awsdns-54.org [OK]
[137] => nserver............: ns-678.awsdns-20.net [OK]
[138] =>
[139] => DNSSEC
[140] =>
[141] => dnssec.............: no
[142] =>
[143] => Holder
[144] =>
[145] => name...............: Sanoma Media Finland Oy / Helsingin Sanomat
[146] => register number....: 1515901-4
[147] => address............: Töölönlahdenkatu 2
[148] => address............: 00100
[149] => address............: Helsinki
[150] => country............: Finland
[151] => phone..............: +35891221
[152] => holder email.......:
[153] =>
[154] => Registrar
[155] =>
[156] => registrar..........: Sanoma Oyj
[157] =>
[158] => >>> Last update of WHOIS database: 24.3.2020 12:45:20 (EET) <<<
[159] =>
[160] =>
[161] => Copyright (c) Finnish Transport and Communications Agency Traficom
[162] =>
)

I tried something like this:

$domainRawScan = preg_grep('/\bdomain\b.*\b>>> Last update of WHOIS database:\b/', $output);

But I am very new to RegExp and find the syntax quite confusing. Any help would be appreciated.

One approach is to take the $output array filled in by the exec command and join it into a single string:

$text = implode("\n", $output);

Then use preg_match_all to capture every keyword and value:

preg_match_all('/^(.*?)\.*: (.+)/m', $text, $matches);

$matches[1][n] will hold keyword n and $matches[2][n] will hold value n.
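As a minimal sketch of how the two capture groups line up, the snippet below runs the same pattern over a few lines copied from the question's output (hard-coded here in place of a real exec call, which is an assumption made so the example runs anywhere) and zips $matches[1] with $matches[2] into keyword/value pairs. The $pairs variable is only an illustrative name.

```php
<?php
// Sample WHOIS lines copied from the output shown in the question;
// in real use $output would be filled by exec().
$output = [
    '',
    'domain.............: iltalehti.fi',
    'status.............: Registered',
    'created............: 1.1.1991 00:00:00',
    '',
    'Nameservers',
];

$text = implode("\n", $output);
preg_match_all('/^(.*?)\.*: (.+)/m', $text, $matches);

// array_map(null, ...) zips the two group arrays into [keyword, value] pairs.
$pairs = array_map(null, $matches[1], $matches[2]);

foreach ($pairs as [$keyword, $value]) {
    echo $keyword, ' -> ', $value, "\n";
}
```

The blank line and the "Nameservers" heading contain no colon followed by a space, so they produce no pair.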

Regex breakdown:

^             # Start of line in multiline mode
(             # Start of capture group 1
  .*?         # Match as few characters as possible until ...
)             # End of capture group 1
\.*           # Match 0 or more periods
:             # Match a colon followed by a space
(             # Start of capture group 2
  .+          # Match 1 or more characters up to but not including a newline
)             # End of capture group 2
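To make the breakdown concrete, here is a small sketch that applies the pattern to four individual lines taken from the question's output. It suggests which kinds of lines are captured and which are skipped; the $samples and $results names are only illustrative, and output from other registries may be formatted differently.

```php
<?php
$pattern = '/^(.*?)\.*: (.+)/';

$samples = [
    'holder transfer....: 13.7.2013', // keyword that contains a space
    'holder email.......:',           // nothing after the colon
    'Nameservers',                    // section heading, no colon
    '>>> Last update of WHOIS database: 24.3.2020 12:45:05 (EET) <<<',
];

$results = [];
foreach ($samples as $line) {
    // Store [keyword, value] on a match, or null when the line is skipped.
    $results[$line] = preg_match($pattern, $line, $m) ? [$m[1], $m[2]] : null;
}

foreach ($results as $line => $r) {
    echo $line, ' => ', $r === null ? 'no match' : implode(' | ', $r), "\n";
}
```

Two things are worth noticing: a line with an empty value is skipped because (.+) needs at least one character, and the ">>> Last update" line is captured as an ordinary keyword/value pair because it also contains a colon followed by a space.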

Update

On each pass through the loop you will be processing one domain and its keyword/value pairs. What you do with them is up to you.

foreach ($query as $domain) {
    // escapeshellarg() guards against shell metacharacters in the domain
    $scanUrl = 'whois ' . escapeshellarg($domain->url);
    $output = []; // start with an empty array, because exec() appends to it
    exec($scanUrl, $output);
    $text = implode("\n", $output);
    preg_match_all('/^(.*?)\.*: (.+)/m', $text, $matches);
    $n = count($matches[1]); // number of keyword/value pairs
    for ($i = 0; $i < $n; $i++) {
        // display next keyword/value pair:
        echo $matches[1][$i], "->", $matches[2][$i], "\n";
    }
}
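If you would rather keep the pairs than echo them, one option is to collect them into an associative array per domain. The sketch below assumes a hypothetical helper named parseWhoisRecord and uses hard-coded lines from the question in place of exec output. Each keyword maps to a list of values, because keywords such as "nserver" and "address" repeat within one record.

```php
<?php
/**
 * Hypothetical helper: turn the WHOIS lines for ONE domain into an
 * associative array of keyword => list of values.
 */
function parseWhoisRecord(array $lines): array
{
    $record = [];
    $text = implode("\n", $lines);
    // PREG_SET_ORDER returns one [full, keyword, value] entry per match.
    preg_match_all('/^(.*?)\.*: (.+)/m', $text, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $record[trim($m[1])][] = trim($m[2]);
    }
    return $record;
}

// Sample lines copied from the question; exec() would supply these.
$lines = [
    'domain.............: yle.fi',
    'created............: 1.1.1991 00:00:00',
    '',
    'nserver............: ns-997.awsdns-60.net [OK]',
    'nserver............: ns-1394.awsdns-46.org [OK]',
    'address............: Radiokatu 5',
    'address............: 00024',
];

$record = parseWhoisRecord($lines);
echo $record['domain'][0], ' has ', count($record['nserver']), " name servers\n";
```

Storing a list for every keyword, even single-valued ones, keeps access uniform at the cost of the extra [0] index.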

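Update 2

Rather than joining the array of lines returned by the exec command into a single string and running preg_match_all (which gives you an array of matches), you can run a separate preg_match call on each individual output line: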

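foreach ($query as $domain) {
    // escapeshellarg() guards against shell metacharacters in the domain
    $scanUrl = 'whois ' . escapeshellarg($domain->url);
    $output = []; // start with an empty array, because exec() appends to it
    exec($scanUrl, $output);
    foreach ($output as $line) {
        // no /m flag needed: each line is matched on its own
        if (preg_match('/^(.*?)\.*: (.+)/', $line, $matches)) {
            echo $matches[1], "->", $matches[2], "\n";
        }
    }
}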
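If the results of several lookups have already been merged into one array, as in the question, the same line-by-line idea can be used to cut that array into segments. This is only a sketch under stated assumptions: splitWhoisSegments is a hypothetical helper, and it assumes every record starts with a "domain" line and ends with a ">>> Last update of WHOIS database:" line, as in the .fi output shown above.

```php
<?php
/**
 * Hypothetical helper: split a combined WHOIS line array into one
 * segment per domain, each running from a "domain" line through the
 * next ">>> Last update of WHOIS database:" line.
 */
function splitWhoisSegments(array $lines): array
{
    $segments = [];
    $current = null; // null means we are outside a segment

    foreach ($lines as $line) {
        if ($current === null && preg_match('/^domain\.*:/', $line)) {
            $current = []; // a "domain" line opens a new segment
        }
        if ($current !== null) {
            $current[] = $line;
            if (preg_match('/^>>> Last update of WHOIS database:/', $line)) {
                $segments[] = $current; // the marker line closes it
                $current = null;
            }
        }
    }
    return $segments;
}

// Shortened sample of the combined array from the question.
$output = [
    '',
    'domain.............: iltalehti.fi',
    'status.............: Registered',
    '>>> Last update of WHOIS database: 24.3.2020 12:45:05 (EET) <<<',
    '',
    'Copyright (c) Finnish Transport and Communications Agency Traficom',
    '',
    'domain.............: yle.fi',
    'status.............: Registered',
    '>>> Last update of WHOIS database: 24.3.2020 12:45:12 (EET) <<<',
];

$segments = splitWhoisSegments($output);
echo count($segments), " segments\n";
```

Lines between segments, such as the copyright notice, are discarded, and a segment that never reaches its closing marker is dropped rather than returned half-finished.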
