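I need to identify server events in a log file, and I am using pattern matching to do it. My regular expression does not work. Please check whether the regular expression is wrong or whether the problem lies somewhere else.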
Sample input:
2009/12/14 11:49:20.55 00 STARTUP Distributed Access Infrastructure V1.1.0
2009/12/14 11:49:20.55 01 STARTUP Tools Access Server initialization started
2009/12/14 11:49:20.55 TAS#####EC05003E 00 STARTUP Environment:
2009/12/14 11:49:20.55 TAS#####EC05003E 01 STARTUP Job.....DAITAS System...EC05 ASID.....003E
2009/12/14 11:49:20.55 TAS#####EC05003E 02 STARTUP User....USRT001 Group....SYS1 JobNum...STC00079
2009/12/14 11:49:20.55 TAS#####EC05003E 03 STARTUP Local...GMT-08 GMT......2009/12/14 19:49
My code is:
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
    String input = value.toString();
    String delimiter = "[\n]";
    String[] tokens = input.split(delimiter);
    String sample = null;
    Pattern pattern;
    String regex = " \\s+\\d+\\s+[a-z,A-Z]+\\s ";
    pattern = Pattern.compile(regex);
    for (int i = 0; i < tokens.length; i++) {
        sample = tokens[i];
        System.out.println(sample.toString());
        System.out.println("enter here");
        Matcher match = pattern.matcher(sample);
        boolean val = match.matches();
        System.out.println("the conditions" + val);
        System.out.println("enter here 2");
        if (val) {
            System.out.println("the regex is found" + val);
            logEvent.set(sample);
            System.out.println("the value of logEvent is " + logEvent);
        }
        else {
            logInformation.set(sample);
            System.out.println("the log informaTION" + logInformation);
        }
        context.write(logEvent, logInformation);
    }
}
I need to recognize the STARTUP events.
Thanks
Try this:
try {
    Regex regexObj = new Regex(@"(?im)\s+(?<event>\d+\s+[a-z]+)\s+(?<details>[^\r\n]+)$");
    Match matchResults = regexObj.Match(subjectString);
    while (matchResults.Success) {
        for (int i = 1; i < matchResults.Groups.Count; i++) {
            Group groupObj = matchResults.Groups[i];
            if (groupObj.Success) {
                // matched text: groupObj.Value
                // match start: groupObj.Index
                // match length: groupObj.Length
            }
        }
        matchResults = matchResults.NextMatch();
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
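Since the question is in Java and the snippet above is C#, here is a rough Java sketch of the same idea. The class name `LogEventFinder` and the method `findEvents` are made up for illustration and are not part of the original code. It assumes Java 7 or later (for named groups) and uses `find()` to scan for matches, which plays the role of `Match()`/`NextMatch()` in .NET.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original mapper.
final class LogEventFinder {
    // Same pattern as the C# example; the inline (?im) options are passed as flags here.
    private static final Pattern EVENT = Pattern.compile(
            "\\s+(?<event>\\d+\\s+[a-z]+)\\s+(?<details>[^\\r\\n]+)$",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    /** Returns one {event, details} pair for every match found in the text. */
    static List<String[]> findEvents(String text) {
        List<String[]> results = new ArrayList<>();
        Matcher m = EVENT.matcher(text);
        // find() looks for the next occurrence anywhere in the remaining input.
        while (m.find()) {
            results.add(new String[] { m.group("event"), m.group("details") });
        }
        return results;
    }

    public static void main(String[] args) {
        String log =
              "2009/12/14 11:49:20.55 00 STARTUP Distributed Access Infrastructure V1.1.0\n"
            + "2009/12/14 11:49:20.55 TAS#####EC05003E 00 STARTUP Environment:\n";
        for (String[] hit : findEvents(log)) {
            System.out.println("event=[" + hit[0] + "] details=[" + hit[1] + "]");
        }
    }
}
```

For the two sample lines in `main`, this should print `event=[00 STARTUP] details=[Distributed Access Infrastructure V1.1.0]` and `event=[00 STARTUP] details=[Environment:]`. In the mapper, something like `m.group("event")` could go into `logEvent` and `m.group("details")` into `logInformation`, though that wiring is only a guess at the intended output.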
Regex explanation:
@"
(?im)          # Match the remainder of the regex with the options: case insensitive (i); ^ and $ match at line breaks (m)
\s             # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   +           # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?<event>      # Match the regular expression below and capture its match into backreference with name “event”
   \d          # Match a single digit 0..9
      +        # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   \s          # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
      +        # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   [a-z]       # Match a single character in the range between “a” and “z”
      +        # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\s             # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   +           # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?<details>    # Match the regular expression below and capture its match into backreference with name “details”
   [^\r\n]     # Match a single character NOT present in the list below
               #    A carriage return character
               #    A line feed character
      +        # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
$              # Assert position at the end of a line (at the end of the string or before a line break character)
"
Hope this helps.
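A note on the Java code in the question: one likely reason `val` is always false is that `Matcher.matches()` only succeeds when the pattern matches the entire input string, whereas `find()` looks for the pattern anywhere inside it. In addition, if the leading and trailing literal spaces in the posted pattern are intentional, they require two whitespace characters in a row around the event, which the sample lines do not have. The sketch below illustrates both points; the class and method names are made up for the example, and the trimmed pattern drops the stray comma from `[a-z,A-Z]`.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original mapper.
final class MatchesVersusFind {
    // The posted pattern without its leading and trailing literal spaces.
    private static final Pattern TRIMMED = Pattern.compile("\\s+\\d+\\s+[a-zA-Z]+\\s");
    // The pattern as posted, literal spaces included.
    private static final Pattern AS_POSTED = Pattern.compile(" \\s+\\d+\\s+[a-z,A-Z]+\\s ");

    /** True only if the whole line, start to end, matches the trimmed pattern. */
    static boolean wholeLineMatches(String line) {
        return TRIMMED.matcher(line).matches();
    }

    /** True if the trimmed pattern occurs somewhere inside the line. */
    static boolean containsEvent(String line) {
        return TRIMMED.matcher(line).find();
    }

    /** True if the pattern exactly as posted occurs somewhere inside the line. */
    static boolean postedPatternFound(String line) {
        return AS_POSTED.matcher(line).find();
    }

    public static void main(String[] args) {
        String line = "2009/12/14 11:49:20.55 00 STARTUP Distributed Access Infrastructure V1.1.0";
        System.out.println(wholeLineMatches(line));   // false: the line starts with a date, not whitespace
        System.out.println(containsEvent(line));      // true: " 00 STARTUP " is found inside the line
        System.out.println(postedPatternFound(line)); // false: no two consecutive whitespace characters
    }
}
```

So switching from `matches()` to `find()` and removing the literal spaces from the pattern may be enough to get the `if (val)` branch to fire for the STARTUP lines.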