antlr4 performance on multi-core CPUs

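Recently my program ran into a performance problem. The investigation eventually pointed to an issue deep inside antlr4, which I use to parse SQL. As the code below shows, there is a synchronized block on dfa.states. That block effectively limits parsing performance on machines with 8 or more cores. Has anyone run into this problem and found a solution?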

protected DFAState addDFAState(ATNConfigSet configs) {
    /* the lexer evaluates predicates on-the-fly; by this point configs
     * should not contain any configurations with unevaluated predicates.
     */
    assert !configs.hasSemanticContext;
    DFAState proposed = new DFAState(configs);
    ATNConfig firstConfigWithRuleStopState = null;
    for (ATNConfig c : configs) {
        if ( c.state instanceof RuleStopState ) {
            firstConfigWithRuleStopState = c;
            break;
        }
    }
    if ( firstConfigWithRuleStopState!=null ) {
        proposed.isAcceptState = true;
        proposed.lexerActionExecutor = ((LexerATNConfig)firstConfigWithRuleStopState).getLexerActionExecutor();
        proposed.prediction = atn.ruleToTokenType[firstConfigWithRuleStopState.state.ruleIndex];
    }
    DFA dfa = decisionToDFA[mode];
    synchronized (dfa.states) {
        DFAState existing = dfa.states.get(proposed);
        if ( existing!=null ) return existing;
        DFAState newState = proposed;
        newState.stateNumber = dfa.states.size();
        configs.setReadonly(true);
        newState.configs = configs;
        dfa.states.put(newState, newState);
        return newState;
    }
}

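After struggling for a few days, I found a solution. As Mike Lischke said, the synchronized block seems to be there to reduce the memory footprint. But it has a significant impact on performance on multi-core machines under heavy SQL parsing workloads. I was trying to parse 100 GB+ SQL files generated by mysqldump.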

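My solution is to create a custom interpreter that uses its own cloned DFA array instead of the static DFA. On my 10-core AMD Threadripper, the result was almost 16 times better, with CPU usage above 95%.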

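// Called from inside the generated lexer (for example in its constructor),
// so that this instance stops using the shared static DFA array.
setInterpreter(new LexerATNSimulator(this, _ATN, getDFA(), new PredictionContextCache()));

// Builds a fresh, empty DFA array that belongs to this lexer instance only.
private DFA[] getDFA() {
    DFA[] result = new DFA[_ATN.getNumberOfDecisions()];
    for (int i = 0; i < _ATN.getNumberOfDecisions(); i++) {
        result[i] = new DFA(_ATN.getDecisionState(i), i);
    }
    return result;
}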

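For memory efficiency reasons, all parser instances of a given language share the same DFA (it is a static structure). However, this requires making that structure thread-safe (parsers can be used in background threads). There is no way around that.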
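To make the trade-off between the two answers concrete, here is a minimal sketch using only the JDK, not ANTLR itself. StateCache, intern and run are hypothetical names invented for this illustration; the cache only imitates the get-or-add-under-lock shape of addDFAState. With one shared cache every thread goes through the same lock but each state is stored once; with a private cache per worker there is no cross-thread locking, but every worker stores its own copy of each state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DfaCacheSketch {
    /** Hypothetical stand-in for a DFA state table; this is not an ANTLR class. */
    static final class StateCache {
        private final Map<String, Integer> states = new HashMap<>();

        /** Get-or-add under a lock, the same shape as the synchronized block in addDFAState. */
        int intern(String key) {
            synchronized (states) {
                Integer existing = states.get(key);
                if (existing != null) return existing;
                int number = states.size();
                states.put(key, number);
                return number;
            }
        }

        int size() {
            synchronized (states) {
                return states.size();
            }
        }
    }

    /** Runs the workers and returns the total number of states held in memory afterwards. */
    static int run(int workers, int distinctKeys, boolean shareCache) {
        StateCache shared = new StateCache();
        List<StateCache> caches = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            // Either every worker uses the one shared cache, or each gets its own.
            StateCache cache = shareCache ? shared : new StateCache();
            caches.add(cache);
            Thread t = new Thread(() -> {
                for (int i = 0; i < 10 * distinctKeys; i++) {
                    cache.intern("state-" + (i % distinctKeys));
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        if (shareCache) return shared.size();
        int total = 0;
        for (StateCache c : caches) total += c.size();
        return total;
    }

    public static void main(String[] args) {
        System.out.println("shared cache states:  " + run(4, 100, true));  // prints 100
        System.out.println("private cache states: " + run(4, 100, false)); // prints 400
    }
}
```

The sketch only shows the memory side (100 stored states versus 400 for four workers); it does not measure lock contention, so it says nothing about how large the speedup would be on a given machine.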
