I'm trying to search for words that appear inside a border of tilde (~) signs.

 e.g. ~albert~ is a ~good~ boy.

I know this can be done with ~.+?~, and that already works for me. But there are special cases where I need to match nested tilde sentences.

 e.g. ~The ~spectacle~~ was ~broken~

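In the example above, I have to capture 'The spectacle', 'spectacle' and 'broken' separately. These will be translated either word by word or together with the accompanying article (An, The, whatever). The reason is that in my system:

1) 'The spectacle' requires a separate translation in specific cases.
2) 'Spectacle' also needs translation in specific cases.
3) IF a translation exists for 'The spectacle', we will use that, ELSE 
   we will use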
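
For reference, here is a minimal sketch of what the plain ~.+?~ pattern does with both kinds of input. The class and method names (FlatTildeDemo, findAll) are made up for illustration; only the pattern itself comes from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FlatTildeDemo {
    // The pattern from the question: a tilde, at least one character (lazily), a tilde.
    private static final Pattern FLAT = Pattern.compile("~.+?~");

    // Collects every non-overlapping match of the flat pattern, left to right.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = FLAT.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Flat input: each delimited word is found.
        System.out.println(findAll("~albert~ is a ~good~ boy."));
        // prints [~albert~, ~good~]

        // Nested input: the tildes get paired up in the wrong way.
        System.out.println(findAll("~The ~spectacle~~ was ~broken~"));
        // prints [~The ~, ~~ was ~]
    }
}
```

The second result shows the problem: the engine pairs each tilde with the next available one, so neither 'The spectacle' nor 'spectacle' nor 'broken' comes out as a unit.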

Another example to explain this:

 ~The ~spectacle~~ was ~broken~, but that was not the same ~spectacle~ 
  that was given to ~me~.

In the example above, I will translate:

 1) 'The spectacle' (because a translation case exists for 'The spectacle'; otherwise I would only have translated 'spectacle' on its own)
 2) 'broken'
 3) 'spectacle'
 4) 'me'
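
The fallback rule described above could be sketched roughly like this. Everything here is hypothetical: the TranslationLookup class, the chooseUnit method, and the dictionary with its placeholder values only stand in for whatever the real translation system provides.

```java
import java.util.HashMap;
import java.util.Map;

final class TranslationLookup {
    // Returns the phrase that should be translated: the outer phrase if the
    // dictionary has an entry for it, otherwise the inner word on its own.
    static String chooseUnit(String outer, String inner, Map<String, String> dictionary) {
        return dictionary.containsKey(outer) ? outer : inner;
    }

    public static void main(String[] args) {
        // Placeholder dictionary; the values are not real translations.
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("The spectacle", "<translation of 'The spectacle'>");
        dictionary.put("spectacle", "<translation of 'spectacle'>");

        System.out.println(chooseUnit("The spectacle", "spectacle", dictionary));
        // prints The spectacle

        dictionary.remove("The spectacle");
        System.out.println(chooseUnit("The spectacle", "spectacle", dictionary));
        // prints spectacle
    }
}
```

This is why both the outer and the inner phrase have to be captured: the choice between them is only made after the lookup.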

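I'm having trouble putting together an expression that makes sure all of this is captured by my regex. The only one I've managed to use so far is "~.+?~". But I know that with some form of lookahead or lookbehind this should be possible. Can anyone help me with this?

The most important aspect is that it is regression-proof, i.e. that it doesn't break what already works. If I manage to get it right, I'll post it.

Note: If it helps, for now I will only have instances with one level of nesting that need to be broken down. So ~~spectacle~~ will be the deepest level (until I need more!!!!)

I wrote something like this a while ago, but I haven't tested it much: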

(~(?(?=.*?~~.*?~).*?~.*?~.*?~|[^~]+?~))

or

(~(?(?=.*?~[A-Za-z]*?~.*?~).*?~.*?~.*?~|[^~]+?~))

Regex101
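
A caveat, assuming the target engine is java.util.regex: both patterns above use a conditional group, (?(condition)yes|no). PCRE supports that construct, but Java's Pattern class does not, so they will not compile there. The sketch below only checks whether a pattern compiles; ConditionalSupportCheck and compiles are illustrative names.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class ConditionalSupportCheck {
    // Returns true if java.util.regex accepts the given pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Conditional version: rejected by java.util.regex.
        System.out.println(compiles("(~(?(?=.*?~~.*?~).*?~.*?~.*?~|[^~]+?~))"));
        // prints false

        // Plain lazy pattern from the question: accepted.
        System.out.println(compiles("~.+?~"));
        // prints true
    }
}
```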

Another option

(~(?:.*?~.*?~){0,2}.*?~)
                 ^^ change to max depth

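Use whichever works best.

To add more levels, add a few extra sets of .*?~ in the two places where you see a bunch of them.

The main problem

If we allow unlimited nesting, how do we know where a group ends and where it begins? A clumsy diagram: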

~This text could be nested ~ so could this~ and this~ this ~Also this~
|                          |              |_________|      |         |
|                          |_______________________________|         |
|____________________________________________________________________|

Or:

~This text could be nested ~ so could this~ and this~ this ~Also this~
|                          |              |         |      |_________|
|                          |______________|         |
|___________________________________________________|

The compiler (here, the regex engine) has no way of knowing which one to choose.
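
A small sketch of that ambiguity, with made-up names (AmbiguityDemo, findAll) and a made-up input string: the same text gives two different groupings depending on whether the middle tildes are read as closing or as opening delimiters, and nothing in the text says which reading is intended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AmbiguityDemo {
    // Collects every non-overlapping match of the given pattern, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "~a~ b ~c~";

        // Lazy: the second tilde closes the first group.
        System.out.println(findAll("~.+?~", input));
        // prints [~a~, ~c~]

        // Greedy: the inner tildes are treated as content of one big group.
        System.out.println(findAll("~.+~", input));
        // prints [~a~ b ~c~]
    }
}
```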

For your sentence

~The ~spectacle~~ was ~broken~, but that was not the same ~spectacle~ that was given to ~me~.
|    |         ||_____|      |                            |         |
|    |         |_____________|                            |         |
|    |____________________________________________________|         |
|___________________________________________________________________|

Or:

~The ~spectacle~~ was ~broken~, but that was not the same ~spectacle~ that was given to ~me~.
|    |_________||     |______|                            |_________|                   |__|
|_______________|

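What can I do?

Use distinct opening and closing characters (as @tbraun suggested), so that the compiler knows where each group starts and ends:

{This text can be {properly {nested}} without problems} because {the compiler {can {see {the}}} start and end points} easily.

Or use a compiler, i.e. parse the string yourself:

Note: I don't do much Java, so some of the code may be incorrect.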

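import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.commons.lang3.StringUtils;

// Collect every top-level {...} group of myString by tracking the nesting depth.
String[] chars = myString.split(""); // one single-character string per element
int depth = 0;
int lastIndex = 0; // position of the '{' that opened the current top-level group
List<String> results = new ArrayList<String>();
for (int i = 0; i < chars.length; i += 1) {
    if (chars[i].equals("{")) {
        depth += 1;
        if (depth == 1) {
            lastIndex = i;
        }
    }
    if (chars[i].equals("}")) {
        depth -= 1;
        if (depth == 0) {
            // Closed a top-level group: join its characters back into one string.
            results.add(StringUtils.join(Arrays.copyOfRange(chars, lastIndex, i + 1), ""));
        }
        if (depth < 0) {
            // Balancing problem: handle the error here
        }
    }
}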

This uses StringUtils.
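
The depth counter above only keeps the outermost groups. Since the question also needs the inner ones ('spectacle' as well as 'The spectacle'), one possible variation is to remember every opening position on a stack. This is a sketch rather than a tested solution; NestedGroups and extractAll are made-up names, and it uses only the standard library.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class NestedGroups {
    // Returns every balanced {...} group at any depth. Groups are listed in the
    // order their closing brace appears, so inner groups come before the group
    // that contains them.
    static List<String> extractAll(String text) {
        List<String> groups = new ArrayList<>();
        Deque<Integer> openIndexes = new ArrayDeque<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                openIndexes.push(i);
            } else if (c == '}') {
                if (openIndexes.isEmpty()) {
                    throw new IllegalArgumentException("Unmatched '}' at index " + i);
                }
                // Pair this '}' with the most recent unmatched '{'.
                groups.add(text.substring(openIndexes.pop(), i + 1));
            }
        }
        if (!openIndexes.isEmpty()) {
            throw new IllegalArgumentException("Unmatched '{' at index " + openIndexes.peek());
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(extractAll("{The {spectacle}} was {broken}"));
        // prints [{spectacle}, {The {spectacle}}, {broken}]
    }
}
```

Stripping the braces from each result afterwards gives the phrases to look up.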

You need something that distinguishes the start pattern from the end pattern, i.e. {}

You can then use the pattern {[^{]*?} to exclude {:

{The {spectacle}} was {broken}

First iteration

{spectacle}
{broken}

Second iteration

{The spectacle}
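
A rough sketch of how these iterations could be run in Java. The step between iterations is an assumption on my part: after each pass the braces of the matched groups are stripped, so that the enclosing group becomes matchable in the next pass. The names IterativeBraceMatcher and matchInPasses are illustrative, and the braces are escaped because java.util.regex otherwise treats { as the start of a quantifier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IterativeBraceMatcher {
    // The pattern from the answer, with escaped braces and a capture group
    // around the content so the braces can be stripped afterwards.
    private static final Pattern INNERMOST = Pattern.compile("\\{([^{]*?)\\}");

    // Returns the matches of each pass; pass 1 holds the innermost groups.
    static List<List<String>> matchInPasses(String input) {
        List<List<String>> passes = new ArrayList<>();
        String current = input;
        while (true) {
            Matcher m = INNERMOST.matcher(current);
            List<String> found = new ArrayList<>();
            while (m.find()) {
                found.add(m.group());
            }
            if (found.isEmpty()) {
                return passes;
            }
            passes.add(found);
            // Assumed step: drop the braces of everything matched in this pass.
            current = m.replaceAll("$1");
        }
    }

    public static void main(String[] args) {
        System.out.println(matchInPasses("{The {spectacle}} was {broken}"));
        // prints [[{spectacle}, {broken}], [{The spectacle}]]
    }
}
```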
