Get all the text between each markdown heading

I'm trying to extract the text between the headings in a markdown file. The markdown file looks like this:

### Description
This is a description
### Changelog
This is my changelog
### Automated Tests added
- Test 1
- Test 2
### Acceptance Tests performed

### Blurb
Concise summary of what this PR is.

Is there any way I can return all the groups, like this:

Group 1 = "This is a description"
Group 2 = "This is my changelog"

... and so on

You can't use a regex for this, because a regex has no (reasonable) way of knowing when a heading-like element is contained inside a code block, e.g.

# This is a heading
```
# This is a not a heading
```

"This is a not a heading" is an element inside a code block that only looks like a heading.

To extract the headings, you need to use a markdown parser and then use the resulting AST to extract the headings.

This can be as simple as:

import { remark } from 'remark';
import { visit } from 'unist-util-visit';

export type Heading = {
  depth: number;
  title: string;
};

export const extractMarkdownHeadings = async (
  content: string,
): Promise<Heading[]> => {
  const headings: Heading[] = [];
  await remark()
    .use(() => {
      // Inline plugin: walk the parsed tree and record every heading node.
      return (root) => {
        visit(root, 'heading', (node) => {
          // Use the first child's text as the title; fall back to 'N/A'
          // when the heading is empty or starts with a node that has no value.
          const first = node.children[0];
          headings.push({
            depth: node.depth,
            title: first && 'value' in first ? first.value : 'N/A',
          });
        });
      };
    })
    .process(content);
  return headings;
};
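The function above returns only the headings, while the question asks for the text between them. As a rough, dependency-free sketch of that idea (this is a hand-written line scanner, not remark and not a full markdown parser), the snippet below splits a document into sections and skips heading-like lines inside fenced code blocks. The names `Section` and `splitIntoSections` are made up for illustration, and the scanner assumes ATX headings (`#` to `######`) and simple ``` or ~~~ fences only; for anything more involved, a real parser is still the safer route.

```typescript
// Hypothetical result shape: one entry per heading, plus the text that follows it.
type Section = {
  depth: number;
  title: string;
  body: string;
};

const splitIntoSections = (content: string): Section[] => {
  const sections: Section[] = [];
  let current: Section | null = null;
  let bodyLines: string[] = [];
  let fence: string | null = null; // fence character while inside a code block

  // Store the collected body on the current section and start a new buffer.
  const flush = (): void => {
    if (current) {
      current.body = bodyLines.join('\n').trim();
      sections.push(current);
    }
    bodyLines = [];
  };

  for (const line of content.split(/\r?\n/)) {
    // Simplified fence handling: a line starting with ``` or ~~~ toggles the state.
    const fenceMatch = /^ {0,3}(`{3,}|~{3,})/.exec(line);
    if (fenceMatch) {
      const marker = fenceMatch[1].charAt(0);
      if (fence === null) fence = marker;
      else if (fence === marker) fence = null;
      bodyLines.push(line);
      continue;
    }

    // Only look for headings while outside a fenced code block.
    const headingMatch =
      fence === null ? /^ {0,3}(#{1,6})[ \t]+(.*?)[ \t]*$/.exec(line) : null;
    if (headingMatch) {
      flush();
      current = { depth: headingMatch[1].length, title: headingMatch[2], body: '' };
    } else {
      bodyLines.push(line);
    }
  }
  flush();
  return sections;
};

// Usage with the PR template from the question.
const prBody = [
  '### Description',
  'This is a description',
  '### Changelog',
  'This is my changelog',
  '### Automated Tests added',
  '- Test 1',
  '- Test 2',
  '### Acceptance Tests performed',
  '',
  '### Blurb',
  'Concise summary of what this PR is.',
].join('\n');

const sections = splitIntoSections(prBody);
console.log(sections.map((section) => `${section.title}: ${JSON.stringify(section.body)}`));
```

Text that appears before the first heading is dropped in this sketch, and an empty section (such as "Acceptance Tests performed") comes back with an empty body rather than being skipped, so the sections stay aligned with the headings.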

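You can use ^[^#]+ in multiline mode, so that ^ matches at the start of every line. This excludes lines that start with #. If you want a capture group, use ^([^#]+).

Note that the matches include the newline characters. If you don't want them, you can exclude them as well with ^([^#\n]+).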
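For illustration, here is a small sketch of how the first pattern behaves on the PR template from the question, assuming the `g` and `m` flags in JavaScript/TypeScript. The variable names are made up. Keep in mind that a `#` anywhere in a body line cuts the match short, and that heading-like lines inside code blocks are treated as headings, so this only fits input you control.

```typescript
const sample = [
  '### Description',
  'This is a description',
  '### Changelog',
  'This is my changelog',
  '### Automated Tests added',
  '- Test 1',
  '- Test 2',
  '### Acceptance Tests performed',
  '',
  '### Blurb',
  'Concise summary of what this PR is.',
].join('\n');

// The m flag makes ^ match at the start of every line; g collects all matches.
const rawGroups: string[] = sample.match(/^[^#]+/gm) || [];

// Each raw match can still carry trailing newlines, so trim them off.
const groups = rawGroups.map((text) => text.trim());

console.log(groups);
```

The blank line under "Acceptance Tests performed" produces a match that is empty after trimming. A heading followed directly by another heading produces no match at all, so the groups do not always line up one-to-one with the headings.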


This gets all the matches except the last one.

Regex match pattern: (###[\h\w\t\n.]*)((###)?\n)

Check the sample here:
https://regex101.com/r/eBtSTM/2
