Finding strings from a list inside a large string is too slow

def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def build_log = new URL(Build_Log_URL).getText()
def found_errors = null
for (knownError in knownErrorListbyLine) {
    if (build_log.contains(knownError)) {
        found_errors = build_log.readLines().findAll { it.contains(knownError) }
        for (error in found_errors) {
            println "FOUND ERROR: " + error
        }
    }
}

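I wrote this code to find the listed errors in a string, but it takes about 20 seconds.

How can I improve the performance? I would love to learn from this.

Thanks a lot!

list.txt contains one string per line: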

Step ... was FAILED
[ERROR] Pod-domainrouter call failed
@type":"ErrorExtender
[postDeploymentSteps] ... does not exist.
etc...

The build log is where I need to find these errors.

Try this:

def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def build_log = new URL(Build_Log_URL)
for (knownError in knownErrorListbyLine) {
    // Note: URL.eachLine reads the log from the URL again for every known error
    build_log.eachLine { line ->
        if (line.contains(knownError)) {
            println "FOUND ERROR: " + line
        }
    }
}

This may be more performant, because the log is read only once:

def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def build_log = new URL(Build_Log_URL)
// Single pass over the log; each line is checked against every known error
build_log.eachLine { line ->
    for (knownError in knownErrorListbyLine) {
        if (line.contains(knownError)) {
            println "FOUND ERROR: " + line
        }
    }
}

Try the last variant again, this time relying on String's eachLine (the log is downloaded once with getText()):

def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def build_log = new URL(Build_Log_URL).getText()
// build_log is now a String, so eachLine iterates over it in memory
build_log.eachLine { line ->
    for (knownError in knownErrorListbyLine) {
        if (line.contains(knownError)) {
            println "FOUND ERROR: " + line
        }
    }
}
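
A further variation on the single-pass idea, not taken from the answers here: all known errors can be combined into one regular expression so that each log line is scanned with a single matcher. This is only a minimal sketch; the sample strings stand in for list.txt and the build log, the variable names are hypothetical, and whether it beats the nested contains() loop on a real log should be measured rather than assumed.

```groovy
import java.util.regex.Pattern

// Hypothetical sample data standing in for list.txt and the build log.
def knownErrors = ['[ERROR] Pod-domainrouter call failed', '@type":"ErrorExtender']
def buildLog = '''Step 1 OK
[ERROR] Pod-domainrouter call failed at 10:02
{"@type":"ErrorExtender","code":500}
Step 2 OK'''

// Pattern.quote escapes regex metacharacters such as '[' so each entry matches literally.
def combined = Pattern.compile(knownErrors.collect { Pattern.quote(it) }.join('|'))

// One pass over the log: keep every line in which any known error occurs.
def foundErrors = buildLog.readLines().findAll { line -> combined.matcher(line).find() }

foundErrors.each { println "FOUND ERROR: $it" }
```

Unlike the nested loop, this reports a line once even when it contains several known errors.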

Try moving build_log.readLines() into a variable outside the loop, so the log is split into lines only once.

def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def build_log = new URL(Build_Log_URL).getText()
def found_errors = null
def buildLogByLine = build_log.readLines()
for (knownError in knownErrorListbyLine) {
    if (build_log.contains(knownError)) {
        found_errors = buildLogByLine.findAll { it.contains(knownError) }
        for (error in found_errors) {
            println "FOUND ERROR: " + error
        }
    }
}

Update: using multiple threads

Note: this may help if errorList is large enough and the matching errors are evenly distributed across it.

def sublists = knownErrorListbyLine.collate(x)
// int x is the sublist size. It depends on the size of knownErrorListbyLine;
// choose it so that you get e.g. 4 sublists (one thread each).
// Do not use more than 2 threads per CPU; start with 1 thread per CPU.
def logsWithErrors = [] // collects the results from every thread
def lock = new Object()
def logs = build_log.readLines() // split once and share read-only between threads
def threads = sublists.collect { errorSublist ->
    Thread.start {
        errorSublist.findAll { build_log.contains(it) }.each { error ->
            def results = logs.findAll { it.contains(error) }
            synchronized (lock) {
                logsWithErrors << results
            }
        }
    }
}
threads*.join() // wait for all threads to finish
logsWithErrors.flatten().each {
    println "FOUND ERROR: $it"
}
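
To make the choice of the sublist size concrete, here is a small self-contained sketch of the same idea in plain Groovy. The sample lists and the thread count of 2 are assumptions for illustration only, and a synchronized list is used here in place of the explicit lock. It has not been checked inside a Jenkins pipeline sandbox, where starting threads may be restricted.

```groovy
// Hypothetical sample data standing in for list.txt and the downloaded log.
def knownErrors = ['alpha failed', 'beta failed', 'gamma failed', 'delta failed']
def logLines = ['ok', 'alpha failed here', 'ok', 'delta failed there', 'gamma failed']

// Assumed thread count; derive the sublist size so collate yields at most that many sublists.
int threadCount = 2
int sublistSize = Math.max(1, (int) Math.ceil(knownErrors.size() / (double) threadCount))

// A synchronized list lets every thread append its matches safely.
def results = Collections.synchronizedList([])
def threads = knownErrors.collate(sublistSize).collect { sublist ->
    Thread.start {
        sublist.each { known ->
            results.addAll(logLines.findAll { it.contains(known) })
        }
    }
}
threads*.join() // wait for all threads to finish

results.each { println "FOUND ERROR: $it" }
```

The order of the printed lines depends on thread scheduling, so sort the results if a stable order matters.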

Also, as other users suggested earlier, try measuring the log download time, since that may be the bottleneck:

def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def start = Calendar.getInstance().timeInMillis
def build_log = new URL(Build_Log_URL).getText()
def end = Calendar.getInstance().timeInMillis
// timeInMillis is already in milliseconds, so no division is needed
println "Logs download time: ${end - start} ms"
def found_errors = null
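
To see which phase dominates, the download and the search can be timed separately. The sketch below is one possible way to do that; the `timed` helper is hypothetical, and a generated string stands in for the real `new URL(Build_Log_URL).getText()` call so the example runs on its own.

```groovy
// Hypothetical helper: runs a closure and returns [result, elapsed milliseconds].
def timed = { Closure work ->
    long start = System.currentTimeMillis()
    def result = work()
    [result, System.currentTimeMillis() - start]
}

// Stand-in for new URL(Build_Log_URL).getText(); replace with the real download.
def (buildLog, downloadMs) = timed.call { (1..1000).collect { "line $it" }.join('\n') }

// Hypothetical known errors standing in for the contents of list.txt.
def knownErrors = ['line 500', 'line 999']

def (found, searchMs) = timed.call {
    buildLog.readLines().findAll { line -> knownErrors.any { line.contains(it) } }
}

println "Download: ${downloadMs} ms, search: ${searchMs} ms, matches: ${found.size()}"
```

If the download accounts for most of the 20 seconds, restructuring the search loop will not help much.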
