Identifying performance bottlenecks in an XSLT transform using Saxon-JS



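Can anyone provide some guidance on identifying the bottlenecks in a transform?

This is the Node.js implementation of Saxon-JS. I'm trying to speed up the transformation of some XML documents so that I can offer a synchronous API that ideally responds in under 60 seconds (230 seconds is the hard limit of the Application Gateway). I also need to be able to handle XML files of up to 50 MB.

I've run Node's built-in profiler (https://nodejs.org/en/docs/guides/simple-profiling/), but the results are hard to make sense of because the source code of the free version of Saxon-JS isn't really human-readable.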

My code:
const path = require('path');
const SaxonJS = require('saxon-js');
const { loadCodelistsInMem } = require('../standards_cache/codelists');
const { writeFile } = require('../config/fileSystem');
const config = require('../config/config');
const { getStartTime, getElapsedTime } = require('../config/appInsights');

// Used for easy debugging of the XSLT stylesheet
// Runs the iati.xslt transform on the supplied XML
const runTransform = async (sourceFile) => {
    try {
        const fileName = path.basename(sourceFile);
        const codelists = await loadCodelistsInMem();

        // this pulls the right array of SaxonJS resources from the resources object
        const collectionFinder = (url) => {
            if (url.includes('codelist')) {
                // get the right filepath (remove file:// and everything after the ?)
                const versionPath = url.split('schemata/')[1].split('?')[0];
                if (codelists[versionPath]) return codelists[versionPath];
            }
            return [];
        };

        const start = getStartTime();
        const result = await SaxonJS.transform(
            {
                sourceFileName: sourceFile,
                stylesheetFileName: `${config.TMP_BASE_DIR}/data-quality/rules/iati.sef.json`,
                destination: 'serialized',
                collectionFinder,
                logLevel: 10,
            },
            'async'
        );
        console.log(`${getElapsedTime(start)} (s)`);

        await writeFile(`performance_tests/output/${fileName}`, result.principalResult);
    } catch (e) {
        console.log(e);
    }
};

runTransform('performance_tests/test_files/test8meg.xml');
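One way to narrow down where the time goes before reading the profiler output is to time each stage separately (loading the codelists, the transform itself, writing the output). The following is only a minimal sketch: `timed` is a hypothetical helper introduced here for illustration, built on Node's `process.hrtime.bigint()`, and is not part of Saxon-JS or of the code above.

```javascript
// Hypothetical helper (not part of Saxon-JS): runs an async function and
// logs how long it took, in seconds, under the given label.
async function timed(label, fn) {
  const start = process.hrtime.bigint(); // monotonic clock, nanoseconds
  try {
    return await fn();
  } finally {
    // Logged even if fn throws, so failed stages are still measured.
    const seconds = Number(process.hrtime.bigint() - start) / 1e9;
    console.log(`${label}: ${seconds.toFixed(3)} (s)`);
  }
}
```

For example, `const codelists = await timed('load codelists', () => loadCodelistsInMem());` would report the codelist loading time separately from the transform.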

Example console output:

❯ node --prof utils/runTransform.js
SEF generated by Saxon-JS 2.0 at 2021-01-27T17:10:38.029Z with -target:JS -relocate:true
79.938 (s)
❯ node --prof-process isolate-0x102d7b000-19859-v8.log > v8_log.txt

Files:

  • Stylesheet
  • Example XML: test8meg.xml
  • Node profiling log: v8_log.txt

Snippet of the V8 log covering the biggest performance hit:

[Bottom up (heavy) profile]:
Note: percentage shows a share of a particular caller in the total
amount of its parent calls.
Callers occupying less than 1.0% are not shown.
ticks parent  name
33729   52.5%  T __ZN2v88internal20Builtin_ConsoleClearEiPmPNS0_7IsolateE
6901   20.5%    T __ZN2v88internal20Builtin_ConsoleClearEiPmPNS0_7IsolateE
3500   50.7%      T __ZN2v88internal20Builtin_ConsoleClearEiPmPNS0_7IsolateE
3197   91.3%        LazyCompile: *k /Users/nosvalds/Projects/validator-api/node_modules/saxon-js/SaxonJS2N.js:287:264
3182   99.5%          LazyCompile: *<anonymous> /Users/nosvalds/Projects/validator-api/node_modules/saxon-js/SaxonJS2N.js:682:218
2880   90.5%            LazyCompile: *d /Users/nosvalds/Projects/validator-api/node_modules/saxon-js/SaxonJS2N.js:734:184

Any help is much appreciated; there aren't many resources on this. I have also tried:

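  • Using the stylesheetInternal parameter with pre-parsed JSON (this made little difference)
  • Splitting the document into separate documents, each containing a single <iati-activity> child inside the <iati-activities> root element, transforming each one separately, and putting the results back together. This ended up taking twice as long.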
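For reference, the splitting approach in the second bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the code actually used: `splitActivities` and `transformInParts` are hypothetical names, `transformOne` stands in for whatever wrapper around `SaxonJS.transform` is used, and the regex-based split assumes that activities are not nested and that the tag text does not occur inside comments or CDATA (an XML parser would be more robust).

```javascript
// Hypothetical helper: splits an <iati-activities> document into one document
// per <iati-activity>, each wrapped in a copy of the original root start tag.
// Assumes no nested activities and no tag text inside comments or CDATA.
function splitActivities(xml) {
  const open = /<iati-activities\b[^>]*>/.exec(xml);
  const closeIndex = xml.lastIndexOf('</iati-activities>');
  if (!open || closeIndex === -1) {
    throw new Error('no <iati-activities> root element found');
  }
  const rootOpen = open[0];
  const body = xml.slice(open.index + rootOpen.length, closeIndex);
  const activities = body.match(/<iati-activity\b[\s\S]*?<\/iati-activity>/g) || [];
  return activities.map((activity) => `${rootOpen}${activity}</iati-activities>`);
}

// Hypothetical driver: transforms each part in document order.
// transformOne is supplied by the caller and returns a promise of the result.
async function transformInParts(xml, transformOne) {
  const results = [];
  for (const part of splitActivities(xml)) {
    results.push(await transformOne(part)); // sequential on purpose
  }
  return results;
}
```

Each part pays the fixed cost of a full transform invocation, which is one possible reason this approach can end up slower than a single pass over the whole document.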

Nick

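You asked the same question at https://saxonica.plan.io/boards/5/topics/8105?r=8106 and I have answered it there. I know StackOverflow doesn't like link-only answers, but I prefer to support users through our own support channels rather than through StackOverflow.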
