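Can someone help me create a regular expression in PHP to parse the individual fields out of an Akamai access log? The first line below specifies the field names. Thanks!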
#Fields: date time cs-ip cs-method cs-uri sc-status sc-bytes time-taken cs(Referer) cs(User-Agent) cs(Cookie) x-custom
2011-08-08 23:59:52 63.555.254.85 GET /somedomain/images/banner_320x50.jpg 200 10801 0 "http://somerefered.com" "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_1 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Mobile/8G4" "-" "-"
Here is a short test program I just wrote:
<?php
// Fields: date time cs-ip cs-method cs-uri sc-status sc-bytes time-taken cs(Referer) cs(User-Agent) cs(Cookie) x-custom
$logLine = '2011-08-08 23:59:52 63.555.254.85 GET /somedomain/images/banner_320x50.jpg 200 10801 0 "http://somerefered.com" "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_1 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Mobile/8G4" "-" "-"';
// Eight whitespace-separated fields followed by four double-quoted fields.
$regex = '/^(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})\s+(\d{1,3}(?:\.\d{1,3}){3})\s+([A-Za-z]+)\s+(\S+)\s+(\d{3})\s+(\d+)\s+(\d+)\s+"([^"]*)"\s+"([^"]*)"\s+"([^"]*)"\s+"([^"]*)"$/';
$matches = array();
if (preg_match($regex, $logLine, $matches)) {
    // $matches[0] is the whole line; the capture groups start at index 1.
    $logParts = array(
        'date' => $matches[1],
        'time' => $matches[2],
        'cs-ip' => $matches[3],
        'cs-method' => $matches[4],
        'cs-uri' => $matches[5],
        'sc-status' => $matches[6],
        'sc-bytes' => $matches[7],
        'time-taken' => $matches[8],
        'cs(Referer)' => $matches[9],
        'cs(User-Agent)' => $matches[10],
        'cs(Cookie)' => $matches[11],
        'x-custom' => $matches[12]
    );
    print_r($logParts);
}
?>
This outputs:
Array
(
    [date] => 2011-08-08
    [time] => 23:59:52
    [cs-ip] => 63.555.254.85
    [cs-method] => GET
    [cs-uri] => /somedomain/images/banner_320x50.jpg
    [sc-status] => 200
    [sc-bytes] => 10801
    [time-taken] => 0
    [cs(Referer)] => http://somerefered.com
    [cs(User-Agent)] => Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_1 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Mobile/8G4
    [cs(Cookie)] => -
    [x-custom] => -
)
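To run the same pattern over a whole file rather than one line, something like the following might work. It is only a sketch: the function name `parseLogLine` is made up for illustration, and it assumes every data line has the twelve fields listed in the `#Fields:` header, with the last four in double quotes. Lines starting with `#` and lines that do not match are skipped.

```php
<?php
// Hypothetical helper (not part of any library): returns the fields of one
// log line keyed by name, or null for comment lines and non-matching lines.
function parseLogLine(string $line): ?array
{
    static $fieldNames = array(
        'date', 'time', 'cs-ip', 'cs-method', 'cs-uri', 'sc-status',
        'sc-bytes', 'time-taken', 'cs(Referer)', 'cs(User-Agent)',
        'cs(Cookie)', 'x-custom',
    );
    // Same layout as above: eight bare fields, then four quoted fields.
    $regex = '/^(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})\s+(\d{1,3}(?:\.\d{1,3}){3})\s+([A-Za-z]+)\s+(\S+)\s+(\d{3})\s+(\d+)\s+(\d+)\s+"([^"]*)"\s+"([^"]*)"\s+"([^"]*)"\s+"([^"]*)"$/';

    $line = trim($line);
    if ($line === '' || $line[0] === '#') {
        return null; // header such as "#Fields: ..." or blank line
    }
    if (!preg_match($regex, $line, $matches)) {
        return null; // line does not follow the assumed layout
    }
    array_shift($matches); // drop the full-line match at index 0
    return array_combine($fieldNames, $matches);
}

// Sample input taken from the question.
$lines = array(
    '#Fields: date time cs-ip cs-method cs-uri sc-status sc-bytes time-taken cs(Referer) cs(User-Agent) cs(Cookie) x-custom',
    '2011-08-08 23:59:52 63.555.254.85 GET /somedomain/images/banner_320x50.jpg 200 10801 0 "http://somerefered.com" "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_1 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Mobile/8G4" "-" "-"',
);

// array_filter drops the nulls; array_values reindexes from 0.
$rows = array_values(array_filter(array_map('parseLogLine', $lines)));
```

Returning null instead of throwing keeps the loop simple, though in practice you may want to count or log the lines that fail to match.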
It looks like the fields are tab-delimited. If so, you don't need a regular expression at all and can do this instead:
// $lines is assumed to hold the log lines, one per array element.
$fieldnames = array('date', 'time', 'cs-ip', 'cs-method', 'cs-uri', 'sc-status', 'sc-bytes', 'time-taken', 'cs(Referer)', 'cs(User-Agent)', 'cs(Cookie)', 'x-custom');
$parsed = array();
foreach ($lines as $line) {
    // Split the line on tab characters.
    $fields = explode("\t", $line);
    // Collect all fields of this line into one row keyed by field name.
    $tmp = array();
    foreach ($fields as $index => $field) {
        $tmp[$fieldnames[$index]] = $field;
    }
    $parsed[] = $tmp;
}
Each element of $parsed is now an array with the field names as keys.
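If you would rather not hard-code the field names, they could be read from the `#Fields:` header instead. The sketch below assumes the log really is tab-delimited and that quoted values contain no tabs. The function name `parseTabDelimitedLog` is hypothetical, and stripping the surrounding double quotes is my own addition, not something the log format requires.

```php
<?php
// Hypothetical helper: takes field names from the "#Fields:" header line and
// splits every data line on tabs. Lines with the wrong field count are skipped.
function parseTabDelimitedLog(array $lines): array
{
    $fieldNames = array();
    $parsed = array();
    foreach ($lines as $line) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue;
        }
        if (strpos($line, '#Fields:') === 0) {
            // Names in the header are separated by whitespace.
            $header = trim(substr($line, strlen('#Fields:')));
            $fieldNames = preg_split('/\s+/', $header);
            continue;
        }
        if ($line[0] === '#') {
            continue; // some other comment line
        }
        $fields = explode("\t", $line);
        if ($fieldNames === array() || count($fields) !== count($fieldNames)) {
            continue; // no header seen yet, or malformed line
        }
        // Assumption: surrounding double quotes are not wanted in the values.
        $fields = array_map(function ($field) {
            return trim($field, '"');
        }, $fields);
        $parsed[] = array_combine($fieldNames, $fields);
    }
    return $parsed;
}

// Sample data line built with real tab characters between the fields.
$lines = array(
    '#Fields: date time cs-ip cs-method cs-uri sc-status sc-bytes time-taken cs(Referer) cs(User-Agent) cs(Cookie) x-custom',
    implode("\t", array(
        '2011-08-08', '23:59:52', '63.555.254.85', 'GET',
        '/somedomain/images/banner_320x50.jpg', '200', '10801', '0',
        '"http://somerefered.com"', '"Mozilla/5.0 (iPhone)"', '"-"', '"-"',
    )),
);
$parsed = parseTabDelimitedLog($lines);
```

With real files, `$lines` could come from `file()`, or the loop could read line by line with `fgets()` to avoid holding a large log in memory.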