How to quickly parse an Apache log file



Suppose I have a log file that I have already split into an array of strings. For example, I have these lines.

123.4.5.1 - - [03/Sep/2013:18:38:48 -0600] "GET /products/car/ HTTP/1.1" 200 3327 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36"

123.4.5.6 - - [03/Sep/2013:18:38:58 -0600] "GET /jobs/ HTTP/1.1" 500 821 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0"

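I could parse these with ordinary string operations, but I think there is a better way to do it with a regex. I tried to follow a similar pattern that someone used in Python, but I can't quite figure it out. Here is my attempt.

This is the pattern: ([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)" When I try to use it, I get no matches.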

let lines = contents.split(separator: "\n")
let pattern = "([(\\d\\.)]+) - - \\[(.*?)\\] \"(.*?)\" (\\d+) - \"(.*?)\" \"(.*?)\""
let regex = try! NSRegularExpression(pattern: pattern, options: [])
for line in lines {
    let range = NSRange(location: 0, length: line.utf16.count)
    let parsedData = regex.firstMatch(in: String(line), options: [], range: range)
    print(parsedData)
}

Ideally I would extract the data into a model. The code needs to be fast, because I may have to process thousands of lines.

Expected result

let someResult = (String, String, String, String, String, String) or 
let someObject: LogFile = LogFile(String, String, String...)

I want to break each parsed line into its separate parts: IP, OS, OS version, browser, browser version, and so on. Any real parsing of the data would be sufficient.

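With your shown samples, could you please try the following.

^((?:\d+\.){3}\d+).*?\[([^]]*)\].*?"([^"]*)"\s*(\d+)\s*(\d+)\s*"-"\s*"([^"]*)"$

Online demo for the above regex

Explanation: a detailed explanation of the above.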

^(                   ##Starting the 1st capturing group, anchored at the start of the value.
(?:\d+\.){3}\d+      ##In a non-capturing group, matching 1 or more digits followed by a dot, 3 times, then 1 or more digits.
)                    ##Closing the 1st capturing group here.
.*?\[                ##Matching non-greedily till [ here.
([^]]*)              ##Creating the 2nd capturing group, which has everything till ] here.
\].*?"               ##Matching ] and then non-greedily till " here.
([^"]*)              ##Creating the 3rd capturing group, which has everything till " here.
"\s*                 ##Matching " followed by zero or more whitespace characters.
(\d+)                ##Creating the 4th capturing group, which has all digits here.
\s*                  ##Matching zero or more whitespace characters.
(\d+)                ##Creating the 5th capturing group, which has all digits here.
\s*"-"\s*"           ##Matching optional whitespace, "-", optional whitespace, and then " here.
([^"]*)              ##Creating the 6th capturing group, which has everything till " here.
"$                   ##Matching " at the end of the value.
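A minimal sketch of how the pattern above might be applied from Swift, assuming the sample line from the question with normal spaces and straight quotes. The names sample, captureGroups(in:pattern:), and fields are illustrative and not part of the answer. The pattern is written as a raw string literal so the backslashes and quotes need no extra escaping, and the ] inside the character class is escaped here, which should match the same text.

```swift
import Foundation

// Pattern from the answer as a raw string literal (the ] inside the class is escaped).
let pattern = #"^((?:\d+\.){3}\d+).*?\[([^\]]*)\].*?"([^"]*)"\s*(\d+)\s*(\d+)\s*"-"\s*"([^"]*)"$"#

// Second sample line from the question.
let sample = #"123.4.5.6 - - [03/Sep/2013:18:38:58 -0600] "GET /jobs/ HTTP/1.1" 500 821 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0""#

// Returns capture groups 1...6 of the first match, or nil when the line does not match.
func captureGroups(in line: String, pattern: String) -> [String]? {
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: line, range: NSRange(line.startIndex..., in: line))
    else { return nil }
    return (1..<match.numberOfRanges).map { index in
        Range(match.range(at: index), in: line).map { String(line[$0]) } ?? ""
    }
}

let fields = captureGroups(in: sample, pattern: pattern) ?? []
print(fields)
```

Each element of fields corresponds to one of the six capturing groups described above, in order: IP address, timestamp, request, status code, response size, and user agent.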

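The correct regex pattern is the one provided by @RavinderSingh13, but I also want to add what I did to get it working in code, so that others can use it in the future without having to search all of StackOverflow for an answer.

I needed to find a way to parse an Apache log file into usable objects in Swift. The code is below.

Implement the extension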

import Foundation

extension String {
    // Returns, for every match, the full match followed by each capture group.
    func groups(for regexPattern: String) -> [[String]] {
        do {
            let text = self
            let regex = try NSRegularExpression(pattern: regexPattern)
            let matches = regex.matches(in: text,
                                        range: NSRange(text.startIndex..., in: text))
            return matches.map { match in
                return (0..<match.numberOfRanges).map {
                    let rangeBounds = match.range(at: $0)
                    guard let range = Range(rangeBounds, in: text) else {
                        return ""
                    }
                    return String(text[range])
                }
            }
        } catch let error {
            print("invalid regex: \(error.localizedDescription)")
            return []
        }
    }
}

Create the model object

class EventLog {
    let ipAddress: String
    let date: String
    let getMethod: String
    let statusCode: String
    let secondStatusCode: String
    let versionInfo: String

    init(ipAddress: String, date: String, getMethod: String, statusCode: String, secondStatusCode: String, versionInfo: String) {
        self.ipAddress = ipAddress
        self.date = date
        self.getMethod = getMethod
        self.statusCode = statusCode
        self.secondStatusCode = secondStatusCode
        self.versionInfo = versionInfo
    }
}

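Parse the data

I want to point out that matching the regex pattern returns a [[String]], so you have to take the subgroup out of the overall groups that are returned. It is similar to parsing JSON.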

func parseData() {
    let documentsUrl: URL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first!
    let destinationFileUrl = documentsUrl.appendingPathComponent("logfile.log")

    do {
        let contents = try String(contentsOf: destinationFileUrl, encoding: .utf8)
        let lines = contents.split(separator: "\n")
        // Raw string literal, so the regex backslashes and quotes need no extra escaping.
        // The IP group uses {3}\d+ so that a multi-digit last octet is captured in full.
        let pattern = #"^((?:\d+\.){3}\d+).*?\[([^]]*)\].*?"([^"]*)"\s*(\d+)\s+(\d+)\s*"-"\s*"([^"]*)"$"#
        for line in lines {
            let group = String(line).groups(for: pattern)
            // Skip lines that do not match instead of crashing on group[0].
            guard let subGroup = group.first, subGroup.count == 7 else { continue }
            let ipAddress = subGroup[1]
            let date = subGroup[2]
            let getMethod = subGroup[3]
            let statusCode = subGroup[4]
            let secondStatusCode = subGroup[5]
            let versionInfo = subGroup[6]

            DispatchQueue.main.async {
                self.eventLogs.append(EventLog(ipAddress: ipAddress, date: date, getMethod: getMethod, statusCode: statusCode, secondStatusCode: secondStatusCode, versionInfo: versionInfo))
            }
        }
    } catch {
        print(error.localizedDescription)
    }
}
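Because the question stresses speed over thousands of lines, one possible refinement is to compile the NSRegularExpression once and reuse it for every line, rather than building it again on each call. The sketch below is a variant under stated assumptions, not the code from this answer: LogEntry, LogParser, and parse(_:) are hypothetical names, the ] inside the character class is escaped, and lines that do not match are simply dropped.

```swift
import Foundation

// Hypothetical value type standing in for the EventLog class.
struct LogEntry {
    let ipAddress: String
    let date: String
    let request: String
    let statusCode: String
    let responseSize: String   // the second number in the line (bytes sent)
    let userAgent: String
}

// Hypothetical parser that compiles the pattern a single time.
struct LogParser {
    private let regex: NSRegularExpression

    init() throws {
        let pattern = #"^((?:\d+\.){3}\d+).*?\[([^\]]*)\].*?"([^"]*)"\s*(\d+)\s+(\d+)\s*"-"\s*"([^"]*)"$"#
        regex = try NSRegularExpression(pattern: pattern)
    }

    // Returns one entry per matching line; non-matching lines are skipped.
    func parse(_ contents: String) -> [LogEntry] {
        return contents.split(separator: "\n").compactMap { rawLine -> LogEntry? in
            let line = String(rawLine)
            let fullRange = NSRange(line.startIndex..., in: line)
            guard let match = regex.firstMatch(in: line, range: fullRange),
                  match.numberOfRanges == 7 else { return nil }
            // Collect capture groups 1...6 as strings.
            let g = (1...6).map { index -> String in
                Range(match.range(at: index), in: line).map { String(line[$0]) } ?? ""
            }
            return LogEntry(ipAddress: g[0], date: g[1], request: g[2],
                            statusCode: g[3], responseSize: g[4], userAgent: g[5])
        }
    }
}

// Usage: one valid line from the question plus one line that should be skipped.
let sampleLog = [
    #"123.4.5.1 - - [03/Sep/2013:18:38:48 -0600] "GET /products/car/ HTTP/1.1" 200 3327 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36""#,
    "not a log line"
].joined(separator: "\n")

let entries = (try? LogParser())?.parse(sampleLog) ?? []
print(entries.count)
```

Whether this makes a measurable difference depends on the size of the file, so it is worth profiling both versions before committing to one.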

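The pattern has no match because, at the position where it expects a hyphen, the log line has 1 or more digits.

To make the pattern a bit more performant, you can use a negated character class "([^"]*)" to capture, for example, any character except " between the double quotes.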

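(\d+(?:\.\d+)+) - - \[([^\]\[]+)\] "([^"]*)" (\d+) (\d+) "[^"]+" "([^"]+)"
  • (\d+(?:\.\d+)+) Capture group 1: match 1+ digits, then repeat 1+ times a . followed by 1+ digits
  • - - Match literally
  • \[([^\]\[]+)\] Match [, capture in group 2 1+ times any character except [ and ], then match ]
  • "([^"]*)" Match ", capture in group 3 0+ times any character except ", then match "
  • (\d+) (\d+) Capture groups 4 and 5, each matching 1+ digits
  • "[^"]+" The same mechanism as before for ", but match only (no capture)
  • "([^"]+)" The same mechanism as before for ", capturing in group 6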

Regex demo | Swift demo

Example code

import Foundation

let pattern = #"(\d+(?:\.\d+)+) - - \[([^\]\[]+)\] "([^"]*)" (\d+) (\d+) "[^"]+" "([^"]+)""#
let regex = try! NSRegularExpression(pattern: pattern, options: .anchorsMatchLines)
let testString = #"123.4.5.1 - - [03/Sep/2013:18:38:48 -0600] "GET /products/car/ HTTP/1.1" 200 3327 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36""#
let stringRange = NSRange(location: 0, length: testString.utf16.count)
let matches = regex.matches(in: testString, range: stringRange)
var result: [[String]] = []
for match in matches {
    // Collect capture groups 1...6; index 0 is the whole match and is skipped.
    var groups: [String] = []
    for rangeIndex in 1 ..< match.numberOfRanges {
        groups.append((testString as NSString).substring(with: match.range(at: rangeIndex)))
    }
    if !groups.isEmpty {
        result.append(groups)
    }
}
print(result)

Output

[["123.4.5.1", "03/Sep/2013:18:38:48 -0600", "GET /products/car/ HTTP/1.1", "200", "3327", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36"]]
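The output above still keeps the request as a single string and the numbers as text. Since the question asks for the line to be broken into separate parts, here is a small sketch of one way to go a step further, assuming the request field always has the three space-separated parts seen in the samples. RequestLine and parseRequestLine(_:) are hypothetical names, not part of the answer.

```swift
// Hypothetical container for the three space-separated parts of the request field.
struct RequestLine {
    let method: String
    let path: String
    let httpVersion: String
}

// Splits e.g. "GET /products/car/ HTTP/1.1"; returns nil unless exactly three parts are present.
func parseRequestLine(_ field: String) -> RequestLine? {
    let parts = field.split(separator: " ")
    guard parts.count == 3 else { return nil }
    return RequestLine(method: String(parts[0]),
                       path: String(parts[1]),
                       httpVersion: String(parts[2]))
}

// First five groups as printed in the output above (user agent omitted for brevity).
let parsedGroups = ["123.4.5.1", "03/Sep/2013:18:38:48 -0600", "GET /products/car/ HTTP/1.1", "200", "3327"]

let request = parseRequestLine(parsedGroups[2])
let status = Int(parsedGroups[3])       // nil if the field is not numeric
let byteCount = Int(parsedGroups[4])    // nil if the field is not numeric
print(request?.method ?? "?", request?.path ?? "?", status ?? -1, byteCount ?? -1)
```

Returning optionals keeps malformed lines from crashing the parser; the caller decides whether to skip or report them.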
