Split a string on blank lines and create objects based on a delimiter



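I'm trying to create a very rudimentary parser that takes a multi-line string and converts it into an array of objects. The string is formatted like this: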

title: This is a title
description: Shorter text in one line
image: https://www.example.com

title: This is another title : with colon
description: Longer text that potentially
could span over several new lines,
even three or more
image: https://www.example.com


title: This is another title, where the blank lines above are two
description: Another description
image: https://www.example.com

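The goal is to convert this into an array in which each section, separated by one or more blank lines, becomes an object of key/value pairs. A colon is the delimiter between key and value, and a newline separates the individual key/value pairs. So the input above should produce the following output: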

[
{
title: "This is a title",
description: "Shorter text in one line",
image: "https://www.example.com"
},
{
title: "This is another title : with colon",
description: "Longer text that potentially could span over several new lines, even three or more",
image: "https://www.example.com"
},
{
title: "This is another title, where the blank lines above are two",
description: "Another description",
image: "https://www.example.com"
}
]

I started with this CodePen, but as you can see, the code currently has a few problems that need to be solved before it is done.

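  1. If a colon is used inside a value, the value should not be split on it. Somehow I need to split on the first occurrence of the colon and then ignore any other colons in the value. The current result is: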
// Input:
//     title: This is another title : with colon
//     image: https://www.example.com
{
image: " https",
title: " This is another title "
}
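  2. Some values may span several lines. Line breaks inside a value should be joined into a single line and should not be treated as the separator for a new key/value pair. The current result is: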
// Input:
//     description: Longer text that potentially
//     could span over several new lines,
//     even three or more
{
could span over several new lines,: undefined,
description: " Longer text that potentially",
even three or more: undefined
}

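Given the code I have so far, I'd really appreciate help with how to handle this. Any suggestions on how to optimize the code for performance are also very welcome.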

As a partial answer, the following handles multiple colons in a line:

var input = `title: This is a title
description: Shorter text in one line
image: https://www.example.com

title: This is another title : with colon
description: Longer text that potentially
could span over several new lines,
even three or more
image: https://www.example.com


title: This is another title, where the blank lines above are two
description: Another description
image: https://www.example.com`;
var finalArray = [];
// Split into sections on one or more blank lines.
var first = input.split(/\n\s*\n/);
console.log("Array with sections split:", first);
first.forEach(function (section) {
  var result = section.split("\n").reduce(function (o, pair) {
    // Split on every colon, take the first piece as the key,
    // then rejoin the rest so colons inside the value survive.
    pair = pair.split(":");
    return (o[pair.shift().trim()] = pair.join(":").trim()), o;
  }, {});
  console.log(result);
  finalArray.push(result);
});
console.log("Array of sections as objects:", finalArray);

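This still doesn't handle multi-line values. The problem is that, in your format, there is no way to tell when a newline marks the start of a new property and when it is just a continuation of the value. You have ruled out colons and commas as delimiters, so as it stands there is no way to solve the second problem.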

I'd suggest using a special character that is not allowed in the body text to mark the end of each key/value pair, and splitting on that.
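
To make that suggestion concrete, here is a minimal sketch. It assumes you can change the input format so that every key/value pair ends with a terminator character. The `|` terminator, the `parseTerminated` name, and the sample text are all hypothetical choices for illustration; pick any character that can never occur in your values.

```javascript
// Hypothetical terminator: assumed never to appear inside a key or value.
const END = "|";

function parseTerminated(text) {
  return text
    .split(/\n\s*\n/) // sections are separated by one or more blank lines
    .filter((section) => section.trim() !== "")
    .map((section) => {
      const record = {};
      for (const pair of section.split(END)) {
        const idx = pair.indexOf(":"); // only the first colon separates key and value
        if (idx === -1) continue; // skip the empty piece after the last terminator
        const key = pair.slice(0, idx).trim();
        // Join line breaks inside the value into single spaces.
        const value = pair.slice(idx + 1).replace(/\s*\n\s*/g, " ").trim();
        record[key] = value;
      }
      return record;
    });
}

const sample = `title: A title : with colon|
description: Text that spans
two lines|
image: https://www.example.com|

title: Second|
description: Short|
image: https://www.example.com|`;

console.log(parseTerminated(sample));
```

Because the terminator, not the newline, ends a pair, line breaks inside a value are no longer ambiguous. The cost is that whoever writes the input has to add the terminator.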

If you work with text, there is one very simple rule: always remember regular expressions.

Try this approach:

const data = `title: This is a title
description: Shorter text in one line
image: https://www.example.com

title: This is another title : with colon
description: Longer text that potentially
could span over several new lines,
even three or more
image: https://www.example.com


title: This is another title, where the blank lines above are two
description: Another description
image: https://www.example.com`;
// Split into blocks on one or more blank lines.
const bloks = data.split(/\n\s*\n/);
const result = bloks.map((blok) => {
  // Each value is whatever sits between its own key and the next known key.
  const title = blok.match(/(?<=title:)([\s\S]*\n?)(?=description:)/gm).join(' ').trim();
  // Line breaks inside the description are joined into spaces.
  const description = blok.match(/(?<=description:)([\s\S]*\n?)(?=image:)/gm).join(' ').replaceAll('\n', ' ').trim();
  // The image is the last key, so it runs to the end of the block.
  const image = blok.match(/(?<=image:)([\s\S]*\n?)(?=)/gm).join(' ').trim();
  return { title, description, image };
});
console.log(result);
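
The snippet above hard-codes the three key names and their order. If the keys are not known in advance, one possible variation is a line-by-line sketch that treats a line as a new pair only when it starts with a single word followed by a colon and then whitespace (or the end of the line), and treats every other line as a continuation of the previous value. That rule is an assumption, not something the question guarantees: a continuation line such as `note: see above` would be misread as a new key. The `parseSections` name and the sample are hypothetical.

```javascript
function parseSections(text) {
  // Assumption: a key is one word at the start of a line, followed by ":"
  // and then whitespace or end of line. "https://..." therefore does not count.
  const KEY_LINE = /^(\w+):(?:\s+|$)(.*)$/;
  return text
    .replace(/\r\n?/g, "\n") // normalize line endings
    .split(/\n\s*\n/) // sections are separated by one or more blank lines
    .filter((section) => section.trim() !== "")
    .map((section) => {
      const record = {};
      let currentKey = null;
      for (const line of section.split("\n")) {
        const m = line.match(KEY_LINE);
        if (m) {
          // New key: everything after the first colon is the value.
          currentKey = m[1];
          record[currentKey] = m[2].trim();
        } else if (currentKey !== null) {
          // Continuation line: append to the previous value with a space.
          record[currentKey] += " " + line.trim();
        }
      }
      return record;
    });
}

const sample = `title: This is another title : with colon
description: Longer text that potentially
could span over several new lines,
even three or more
image: https://www.example.com


title: Second section
description: Another description
image: https://www.example.com`;

const parsed = parseSections(sample);
console.log(parsed);
```

This makes a single pass over each section with no lookbehind, which may also help with the performance concern, though I have not measured it.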