将条件方程式从中缀转换为前缀表示法



在我们的应用程序中,我们允许用户编写特定的条件,并允许他们使用以下符号表示条件:

(1 and 2 and 3 or 4)

其中,每个数字对应于一个特定的规则/条件。现在的问题是,我应该如何转换它,这样最终的结果是这样的:

{
"$or": [
"$and": [1, 2, 3],
4
]
}

再举一个例子:

(1 or 2 or 3 and 4)

收件人:

{
"$or": [
1,
2,
"$and": [3, 4]
]
}



我已经写了50多行的令牌化器,它们成功地将语句令牌化为令牌,并使用堆栈/peek算法进行验证,令牌看起来如下:

["(", "1", "and", "2", "and", "3", "or", "4", ")"]

现在,我应该如何将这种"中缀表示法"转换为"前缀表示法",并遵循and优先于or的规则?

非常感谢某些指针或关键字!我现在所拥有的并没有真正引导我达到目前所需要的。

迄今为止的一些研究:

  • 数学解析器的智能设计
  • 将缺少的左括号添加到等式中
  • 具有优先级的公式(表达式)解析器
  • 中缀到后缀表示法
  • Dijkstra的调车场算法
  • 中缀和后缀算法

编辑

此外,如果用户坚持,用户可以指定任意数量的括号,例如:

((1 or 3) and (2 or 4) or 5)

所以它被翻译成:

{
"$or": [{
$and": [
"$or": [1, 3],
"$or": [2, 4]
},
5
]
}

编辑2

我算出了算法。作为答案发布在下面。谢谢你的帮助!

感谢导游们,至少我提出了自己的解决方案。由于这是我第一次做数学方程解析,如果我做得错误或效率低下,请原谅我,或者帮助我发现错误:

基本上,以下是我实现这一目标的步骤:

  1. 解析之前,请始终验证模式。出错时抛出错误
  2. 验证后,我们进行中缀表示法到前缀表示法的转换。此步骤要求"one_answers"优先于"或"。
    1. 反转给定的模式
    2. 进行中缀到后缀的表示法转换。我笨,我从中学习
    3. 再次执行相反操作
    4. 中缀到前缀应该在这个阶段完成
  3. 根据前缀符号构建一个树,以便
    1. 一个节点总是有两个分支,并且最大两个分支
    2. 向下遍历,直到它长满叶子
  4. 优化树,使其将类似的运算符合并在一起(例如,可以合并具有子$and的多个$and运算符,形成一个较短的树)
  5. 与给定的标准集混合,一切都完成了

可以在此处找到工作示例:http://jsfiddle.net/chaoszcat/uGKYj/3/

工作代码如下:

(function() {
/**
* This is a source example of my original question on
* http://stackoverflow.com/questions/20986255/converting-conditional-equation-from-infix-to-prefix-notation
* 
* This is my solution and use it at your own risk
* @author Lionel Chan <chaoszcat[at]gmail.com>
*/
/**
* isNumeric, from jQuery. Duplicated here to make this js code pure
* @param {mix} n Test subject
* @returns {boolean} true if it's numeric
*/
function isNumeric(n) {
return !isNaN(parseFloat(n))&&isFinite(n);
}
/**
* Node class - represent a operator or numeric node
* @param {string} token The token string, operator "and", "or", or numeric value
*/
function Node(token) {
this.parent = null;
this.children = []; //one node has two children at most
this.token = token;
this.is_operator = token === 'and' || token === 'or';
this.is_numeric = !this.is_operator;
this.destroyed = false;
}
Node.prototype = {
isOperator: function() { return this.is_operator;},
isNumeric: function() { return this.is_numeric;},
//While building tree, a node is full if there are two children
isFull: function() {
return this.children.length >= 2;
},
addChild: function(node) {
node.parent = this;
this.children.push(node);
},
hasParent: function() {
return this.parent !== null;
},
indexOfChild: function(node) {
for (var i = 0 ; i < this.children.length ; ++i) {
if (this.children[i] === node) {
return i;
}
}
return -1;
},
removeChild: function(node) {
var idx = this.indexOfChild(node);
if (idx >= 0) {
this.children[idx].parent = null; //remove parent relationship
this.children.splice(idx, 1); //splice it out
}
},
/**
* Pass my children to the target node, and destroy myself
* 
* @param {Node} node A target node
*/
passChildrenTo: function(node) {
for (var i = 0 ; i < this.children.length ; ++i) {
node.addChild(this.children[i]);
}
this.destroy();
},
//Destroy this node
destroy: function() {
this.parent.removeChild(this);
this.children = null;
this.destroyed = true;
}
};
/**
* Tree class - node manipulation
* @param {array} prefixTokens The converted, prefix-notated tokens
*/
function Tree(prefixTokens) {
this.buildTree(prefixTokens);
//Optimize tree - so that the tree will merge multiple similar operators together
this.optimize(this.root);
}
Tree.prototype = {
root: null,
//Reference to the deepest operator node in the tree for next attachment point
deepestNode: null,
/**
* Render this tree with given criteria array
* @param {array} crits
* @returns {object} The built criteria
*/
render: function(crits) {
//After optimization, we build the criteria and that's all!
return this.buildCriteria(this.root, crits);
},
/**
* Build criteria from root node. Recursive
* 
* @param {Node} node
* @param {array} crits
* @returns {object} of criteria
*/
buildCriteria: function(node, crits) {
var output = {},
label = '$'+node.token;
output[label] = []; //cpnditions array
for (var i = 0 ; i < node.children.length ; ++i) {
if (node.children[i].isOperator()) {
output[label].push(this.buildCriteria(node.children[i], crits));
}else{
output[label].push(crits[node.children[i].token-1]);
}
}
return output;
},
/**
* Optimize the tree, we can simplify nodes with same operator. Recursive
* 
* @param {Node} node
* @void
*/
optimize: function(node) {
//note that node.children.length will keep changing since the swapping children will occur midway. Rescan is required
for (var i = 0 ; i < node.children.length ; ++i) {
if (node.children[i].isOperator()) {
this.optimize(node.children[i]);
if (node.children[i].token === node.token) {
node.children[i].passChildrenTo(node);
i = 0; //rescan this level whenever a swap occured
}
}
}
},
/**
* Build tree from raw tokens
* @param {array} tokens
*/
buildTree: function(tokens) {
for (var i = 0 ; i < tokens.length ; ++i) {
this.addNode(new Node(tokens[i]));
}
},
/**
* Add node into tree
* 
* @param {Node} node
*/
addNode: function(node) {
//If no root? The first node is root
if (this.root === null) {
this.root = node;
this.deepestNode = node;
return;
}
//if deepestNode is full, traverse up until we find a node with capacity
while(this.deepestNode && this.deepestNode.isFull()) {
this.deepestNode = this.deepestNode.parent;
}
if (this.deepestNode) {
this.deepestNode.addChild(node);
}
//If the current node is an operator, we move the deepestNode cursor to it
if (node.isOperator()) {
this.deepestNode = node;
}
}
};
/**
* Main criteria parser
*/
var CriteriaParser = {
/**
* Convert raw string of pattern (1 and 2 or 3) into the object of criteria pattern
* 
* @param {string} str The raw pattern
* @param {array} crits The raw list of criteria
* @returns {String|Boolean}
*/
parse: function(str, crits) {
var tokens = this.tokenize(str),
validationResult = this.validate(tokens, crits),
prefixNotation = '';
//Once succeded, we proceed to convert it to prefix notation
if (validationResult === true) {
prefixNotation = this.infixToPrefix(tokens);
return (new Tree(prefixNotation)).render(crits);
}else{
return validationResult;
}
},
/**
* Convert the infix notation of the pattern (1 and 2 or 3) into prefix notation "or and 1 2 3"
* 
* Note:
* - and has higher precedence than or
* 
* Steps:
* 1. Reverse the tokens array
* 2. Do infix -> postfix conversion (http://www.cs.arizona.edu/classes/cs227/spring12/infix.pdf, http://scriptasylum.com/tutorials/infix_postfix/algorithms/infix-postfix/index.htm)
* 3. Reverse the result
* 
* @param {array} tokens The tokenized tokens
* @returns {array} prefix notation of pattern
*/
infixToPrefix: function(tokens) {
var reversedTokens = tokens.slice(0).reverse(), //slice to clone, so not to disturb the original array
stack = [],
output = [];
//And since it's reversed, please regard "(" as closing bracket, and ")" as opening bracket
do {
var stackTop = stack.length > 0 ? stack[stack.length-1] : null,
token = reversedTokens.shift();
if (token === 'and') {
while(stackTop === 'and') {
output.push(stack.pop());
stackTop = stack.length > 0 ? stack[stack.length-1] : null;
}
stack.push(token);
stackTop = token;
}else if (token === 'or') {
while(stackTop === 'and' || stackTop === 'or') { //and has higher precedence, so it will be popped out
output.push(stack.pop());
stackTop = stack.length > 0 ? stack[stack.length-1] : null;
}
stack.push(token);
stackTop = token;
}else if (token === '(') { //'(' is closing bracket in reversed tokens
while(stackTop !== ')' && stackTop !== undefined) { //keep looping until found a "open - )" bracket
output.push(stack.pop());
stackTop = stack.length > 0 ? stack[stack.length-1] : null;
}
stack.pop(); //remove the open ")" bracket
stackTop = stack.length > 0 ? stack[stack.length-1] : null;
}else if (token === ')') { //')' is opening bracket in reversed tokens
stack.push(token);
}else if (isNumeric(token)) {
output.push(token);
}else if (token === undefined) {
// no more tokens. Just shift everything out from stack
while(stack.length) {
stackTop = stack.pop();
if (stackTop !== undefined && stackTop !== ')') {
output.push(stackTop);
}
}
}
}while(stack.length || reversedTokens.length);
//Reverse output and we are done
return output.reverse();
},
/**
* Tokenized the provided pattern
* @param {string} str The raw pattern from user
* @returns {array} A tokenized array
*/
tokenize: function(str) {
var pattern = str.replace(/s/g, ''), //remove all the spaces :) not needed
tokens = pattern.split(''),
tokenized = [];
//Tokenize it and verify
var token = null,
next = null;
//attempts to concatenate the "and" and "or" and numerics
while (tokens.length > 0) {
token = tokens.shift();
next = tokens.length > 0 ? tokens[0] : null;
if (token === '(' || token === ')') {
tokenized.push(token);
}else if (token === 'a' && tokens.length >= 2 && tokens[0] === 'n' && tokens[1] === 'd') { //and
tokenized.push(token + tokens.shift() + tokens.shift());
}else if (token === 'o' && tokens.length >= 1 && next === 'r') { //or
tokenized.push(token + tokens.shift());
}else if (isNumeric(token)) {
while(isNumeric(next)) {
token += next;
tokens.shift(); //exhaust it
next = tokens.length > 0 ? tokens[0] : null;
}
tokenized.push(token);
}else{
tokenized.push(token);
}
}
return tokenized;
},
/**
* Attempt to validate tokenized tokens
* 
* @param {array} tokens The tokenized tokens
* @param {array} crits The user provided criteria set
* @returns {Boolean|String} Returns boolean true if succeeded, string if error occured
*/
validate: function(tokens, crits) {
var valid = true,
token = null,
stack = [],
nextToken = null,
criteria_count = crits.length;
for (var i = 0 ; i < tokens.length ; ++i) {
token = tokens[i];
nextToken = i < tokens.length - 1 ? tokens[i+1] : null;
if (token === '(') {
stack.push('(');
if (!isNumeric(nextToken) && nextToken !== '(' && nextToken !== ')') {
throw 'Unexpected token "'+nextToken+'"';
}
}else if (token === ')') {
if (stack.length > 0) {
stack.pop();
}else{
throw 'Unexpected closing bracket';
}
if (nextToken !== ')' && nextToken !== 'and' && nextToken !== 'or' && nextToken !== null) {
throw 'Unexpected token "'+nextToken+'"';
}
}else if (token === 'and' || token === 'or') {
if (!isNumeric(nextToken) && nextToken !== '(') {
throw 'Unexpected token "'+nextToken+'"';
}
}else if (isNumeric(token) && token <= criteria_count) {
if (nextToken !== ')' && nextToken !== 'and' && nextToken !== 'or') {
throw 'Unexpected token "'+nextToken+'"';
}
}else{
//anything not recognized, die.
throw 'Unexpected token "'+token+'"';
}
}
//Last step - check if we have all brackets closed
if (valid && stack.length > 0) {
throw 'Missing '+stack.length+' closing bracket';
}
return valid;
}
};
//This is an example pattern and criteria set. Note that pattern numbers must match criteria numbers.
var pattern = '((1 or 3) and (2 or 4) or 5)',
crits = [
1, 2, 3, 4, 5
];
//lazy on the document on load. Just delay
setTimeout(function() {
var result;
try {
result = JSON.stringify(CriteriaParser.parse(pattern, crits), undefined, 4);
}catch(e) {
result = e;
}
var pre = document.createElement('pre');
pre.innerHTML = result;
document.body.appendChild(pre); 
}, 10);
})();

使用两步流程最容易做到这一点。1) 转换为语法树。2) 将语法树转换为前缀表示法。

语法树基本上与前缀表示法相同,只是使用编程语言的数据结构构建的。

创建语法树的标准方法是使用LALR语法分析器生成器,这适用于大多数语言。LALR解析器快速、强大且富有表现力。LALR解析器生成器接受一个.y文件作为输入,并用您选择的编程语言为解析器输出一个源代码文件。因此,您运行一次LALR解析器生成器来生成您的解析器。

(所有程序员都应该使用"学会使用语法分析器生成器":)。使用标准的标记器也是明智的,而我猜你已经写了自己的:)

下面是一个.y文件,用于为您的迷你语言生成LALR解析器。通过LALR解析器生成器运行这个.y文件将输出LALR解析器的源,该解析器将令牌作为输入并输出解析树(在变量$root_tree中)。您需要在其他地方手动定义parsetree_binaryop数据结构。

%left AND.
%left OR.
start ::= expr(e). { $root_tree = e; }
expr(r) ::= expr(e1) AND expr(e2). { r = new parsetree_binaryop(e1, OP_AND, e2); }
expr(r) ::= expr(e1) OR expr(e2). { r = new parsetree_binaryop(e1, OP_OR, e2); }
expr(r) ::= LPAR expr(e) RPAR. { r = e; }

"%left AND"意味着AND是左关联的(我们也可以选择right,这对AND和OR来说无关紧要)。在"%left OR"之前提到"%left AND"意味着AND比OR绑定得更紧密,因此生成的解析器将做正确的事情。

当您拥有解析器提供的语法树时,生成文本表示就很容易了。

编辑:这似乎是一个LALR解析器生成器,它以JavaScript输出一个解析器:http://sourceforge.net/projects/jscc/

首先定义语义。在你的第一个例子中,你给出了(1 and 2 and 3) or 4的解释,但它也可以是1 and 2 and (3 or 4),所以:

{
"$and": [
{"$or": [3,4] },
[1,2]
]
}

让我们假设and具有更高的优先级。然后只需浏览列表即可使用and加入所有条款。接下来,使用or加入所有其他内容。

最新更新