How to parse infix rather than prefix expressions in Haskell



I need help with a program I'm writing in Haskell. I've written most of it already; here is basically what I'm trying to do:

  1. When I write

parse "a+b"

in the terminal, I want this as output:

Plus (Word "a") (Word "b")

  2. When I write

parse "a-2*b+c"

in the terminal, I want this as output:

Minus (Word "a") (Plus (Mult (Num 2) (Word "b")) (Word "c"))

My code so far:

data Ast
  = Word String
  | Num Int
  | Mult Ast Ast
  | Plus Ast Ast
  | Minus Ast Ast
  deriving (Eq, Show)

tokenize :: [Char] -> [String]
tokenize [] = []
tokenize (' ' : s) = tokenize s
tokenize ('+' : s) = "+" : tokenize s
tokenize ('*' : s) = "*" : tokenize s
tokenize (c : s)
  | isDigit c =
    let (cs, s') = collectWhile isDigit s
     in (c : cs) : tokenize s'
  | isAlpha c =
    let (cs, s') = collectWhile isAlpha s
     in (c : cs) : tokenize s'
  | otherwise = error ("unexpected character " ++ show c)

collectWhile :: (Char -> Bool) -> String -> (String, String)
collectWhile p s = (takeWhile p s, dropWhile p s)

isDigit, isAlpha :: Char -> Bool
isDigit c = c `elem` ['0' .. '9']
isAlpha c = c `elem` ['a' .. 'z'] ++ ['A' .. 'Z']

parseU :: [String] -> (Ast, [String])
parseU ("+" : s0) =
  let (e1, s1) = parseU s0
      (e2, s2) = parseU s1
   in (Plus e1 e2, s2)
parseU ("*" : s0) =
  let (e1, s1) = parseU s0
      (e2, s2) = parseU s1
   in (Mult e1 e2, s2)
parseU (t : ts)
  | isNumToken t = (Num (read t), ts)
  | isWordToken t = (Word t, ts)
  | otherwise = error ("unrecognized token " ++ show t)
parseU [] = error "unexpected end of input"

isNumToken, isWordToken :: String -> Bool
isNumToken xs = takeWhile isDigit xs == xs
isWordToken xs = takeWhile isAlpha xs == xs

parse :: String -> Ast
parse s =
  case parseU (tokenize s) of
    (e, []) -> e
    (_, t : _) -> error ("unexpected token " ++ show t)

inn :: Ast -> String
inn (Plus x y) = innP x ++ " + " ++ innP y
inn (Mult x y) = innP x ++ " * " ++ innP y
inn ast = innP ast

innP :: Ast -> String
innP (Num n) = show n
innP (Plus x y) = "(" ++ innP x ++ " + " ++ innP y ++ ")"
innP (Mult x y) = "(" ++ innP x ++ " * " ++ innP y ++ ")"
innP (Word w) = w

innfiks :: String -> String
innfiks s = inn (parse s)

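Right now I get an error when I type the expressions above in the terminal, but when I write it like this:

parse "+a b"

I get the correct output:

Plus (Word "a") (Word "b")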

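I know I have to change the code so that the parse function accepts input in the form

value operator value

instead of the form

operator value value

but I'm struggling to figure out what I have to change, and where.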

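To handle infix operators with precedence, one approach is to introduce a sequence of parsing functions, one per precedence level. So if you have "factors" that can be multiplied to form "terms", which in turn can be added or subtracted to form "expressions", you'll want a parser function for each of these levels. Parsing a "factor" (i.e., a word or a number) is easy, because you've already written the code: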

parseFactor :: [String] -> (Ast, [String])
parseFactor (t : ts)
  | isNumToken t = (Num (read t), ts)
  | isWordToken t = (Word t, ts)
  | otherwise = error ("unrecognized token " ++ show t)
parseFactor [] = error "unexpected end of input"

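Parsing a term is trickier. You want to start by parsing a factor, then optionally parse a * followed by another factor, treat the result as a term, then optionally multiply that by yet another factor, and so on: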

parseTerm :: [String] -> (Ast, [String])
parseTerm ts
  =  let (f1, ts1) = parseFactor ts     -- parse first factor
     in  go f1 ts1
  where go acc ("*":ts2)                -- add a factor to an accumulating term
          = let (f2, ts3) = parseFactor ts2
            in go (Mult acc f2) ts3
        go acc rest = (acc, rest)       -- no more factors: return the term
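To see the accumulating loop at work, here is a self-contained sketch that pairs the Ast type with parseFactor and parseTerm and feeds parseTerm a hand-written token list, so no tokenizer is involved. It assumes the Data.Char versions of isDigit and isAlpha in place of the hand-written ones; the parsing logic is the same as above.

```haskell
import Data.Char (isAlpha, isDigit)

data Ast
  = Word String
  | Num Int
  | Mult Ast Ast
  | Plus Ast Ast
  | Minus Ast Ast
  deriving (Eq, Show)

isNumToken, isWordToken :: String -> Bool
isNumToken xs = takeWhile isDigit xs == xs
isWordToken xs = takeWhile isAlpha xs == xs

-- A factor is a single number or word token.
parseFactor :: [String] -> (Ast, [String])
parseFactor (t : ts)
  | isNumToken t = (Num (read t), ts)
  | isWordToken t = (Word t, ts)
  | otherwise = error ("unrecognized token " ++ show t)
parseFactor [] = error "unexpected end of input"

-- A term is factor ("*" factor)*, built left to right.
parseTerm :: [String] -> (Ast, [String])
parseTerm ts =
  let (f1, ts1) = parseFactor ts
   in go f1 ts1
  where
    -- Each "*" wraps the term built so far, so the tree nests to the left.
    go acc ("*" : ts2) =
      let (f2, ts3) = parseFactor ts2
       in go (Mult acc f2) ts3
    -- Any other token (or end of input) ends the term; it is handed back unconsumed.
    go acc rest = (acc, rest)
```

In GHCi, parseTerm ["2","*","3","*","4","+","1"] should return (Mult (Mult (Num 2) (Num 3)) (Num 4),["+","1"]): the product nests to the left, and the unconsumed "+" is left for the next precedence level to handle.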

If you like, try writing a similar parseExpr that parses terms separated by + characters (skip subtraction for now), and test it on something like:

parseExpr (tokenize "2 + 3 * 6 + 4 * 8 * 12 + 1")

Spoiler: here's a version that handles both + and -. Note, though, that your tokenizer doesn't handle - yet, so you'll have to fix that first.

parseExpr :: [String] -> (Ast, [String])
parseExpr ts
  =  let (f1, ts1) = parseTerm ts
     in  go f1 ts1
  where go acc (op:ts2)
          | op == "+" || op == "-"
          = let (f2, ts3) = parseTerm ts2
            in go ((astOp op) acc f2) ts3
        go acc rest = (acc, rest)
        astOp "+" = Plus
        astOp "-" = Minus
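The tokenizer fix mentioned above amounts to giving '-' its own token. A minimal sketch, assuming Data.Char for isDigit/isAlpha and using the Prelude's span, which returns the same pair as collectWhile (takeWhile p s, dropWhile p s). Folding the three operator characters into one guard is a stylistic choice; it is equivalent to adding a tokenize ('-' : s) = "-" : tokenize s clause.

```haskell
import Data.Char (isAlpha, isDigit)

-- Split an infix expression into number, word and operator tokens.
tokenize :: String -> [String]
tokenize [] = []
tokenize (c : s)
  | c == ' ' = tokenize s              -- skip spaces
  | c `elem` "+-*" = [c] : tokenize s  -- one-character operators, now including '-'
  | isDigit c = let (cs, s') = span isDigit s in (c : cs) : tokenize s'
  | isAlpha c = let (cs, s') = span isAlpha s in (c : cs) : tokenize s'
  | otherwise = error ("unexpected character " ++ show c)
```

With this, tokenize "a-2*b+c" and tokenize "a - 2 * b + c" both give ["a","-","2","*","b","+","c"].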

With parseExpr in place, you can point parse at it:

parse :: String -> Ast
parse s =
  case parseExpr (tokenize s) of
    (e, []) -> e
    (_, t : _) -> error ("unexpected token " ++ show t)

Your example should then work:

λ> parse "a - 2 * b + c"
Plus (Minus (Word "a") (Mult (Num 2) (Word "b"))) (Word "c")

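Note that this differs slightly from the output you said you wanted, but this grouping is the correct one for left-associative operators (which matters for handling - correctly). That is, you want: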

5 - 4 + 1

to parse as:

(5 - 4) + 1  -- i.e., (Plus (Minus (Num 5) (Num 4)) (Num 1))

so that an evaluator computes the correct answer, 2. If you parse it as:

5 - (4 + 1)  -- i.e., (Minus (Num 5) (Plus (Num 4) (Num 1)))

your evaluator will compute the wrong answer, 0.
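The argument above refers to an evaluator, which the code here never defines. A minimal sketch, assuming words are variables looked up in an association list; the eval name and its environment parameter are hypothetical, not part of the question's or the answer's code.

```haskell
data Ast
  = Word String
  | Num Int
  | Mult Ast Ast
  | Plus Ast Ast
  | Minus Ast Ast
  deriving (Eq, Show)

-- Hypothetical evaluator: a Word is treated as a variable looked up in env.
eval :: [(String, Int)] -> Ast -> Int
eval _ (Num n) = n
eval env (Word w) = case lookup w env of
  Just v -> v
  Nothing -> error ("unbound variable " ++ show w)
eval env (Mult x y) = eval env x * eval env y
eval env (Plus x y) = eval env x + eval env y
eval env (Minus x y) = eval env x - eval env y
```

eval [] (Plus (Minus (Num 5) (Num 4)) (Num 1)) gives 2, while eval [] (Minus (Num 5) (Plus (Num 4) (Num 1))) gives 0, which is exactly the difference between the two groupings above.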

However, if you really do want right-associative operators, see the end of this answer.

Full modified code for left-associative operators:

data Ast
  = Word String
  | Num Int
  | Mult Ast Ast
  | Plus Ast Ast
  | Minus Ast Ast
  deriving (Eq, Show)

tokenize :: [Char] -> [String]
tokenize [] = []
tokenize (' ' : s) = tokenize s
tokenize ('-' : s) = "-" : tokenize s
tokenize ('+' : s) = "+" : tokenize s
tokenize ('*' : s) = "*" : tokenize s
tokenize (c : s)
  | isDigit c =
    let (cs, s') = collectWhile isDigit s
     in (c : cs) : tokenize s'
  | isAlpha c =
    let (cs, s') = collectWhile isAlpha s
     in (c : cs) : tokenize s'
  | otherwise = error ("unexpected character " ++ show c)

collectWhile :: (Char -> Bool) -> String -> (String, String)
collectWhile p s = (takeWhile p s, dropWhile p s)

isDigit, isAlpha :: Char -> Bool
isDigit c = c `elem` ['0' .. '9']
isAlpha c = c `elem` ['a' .. 'z'] ++ ['A' .. 'Z']

parseFactor :: [String] -> (Ast, [String])
parseFactor (t : ts)
  | isNumToken t = (Num (read t), ts)
  | isWordToken t = (Word t, ts)
  | otherwise = error ("unrecognized token " ++ show t)
parseFactor [] = error "unexpected end of input"

parseTerm :: [String] -> (Ast, [String])
parseTerm ts
  =  let (f1, ts1) = parseFactor ts
     in  go f1 ts1
  where go acc ("*":ts2)
          = let (f2, ts3) = parseFactor ts2
            in go (Mult acc f2) ts3
        go acc rest = (acc, rest)

parseExpr :: [String] -> (Ast, [String])
parseExpr ts
  =  let (f1, ts1) = parseTerm ts
     in  go f1 ts1
  where go acc (op:ts2)
          | op == "+" || op == "-"
          = let (f2, ts3) = parseTerm ts2
            in go ((astOp op) acc f2) ts3
        go acc rest = (acc, rest)
        astOp "+" = Plus
        astOp "-" = Minus

isNumToken, isWordToken :: String -> Bool
isNumToken xs = takeWhile isDigit xs == xs
isWordToken xs = takeWhile isAlpha xs == xs

parse :: String -> Ast
parse s =
  case parseExpr (tokenize s) of
    (e, []) -> e
    (_, t : _) -> error ("unexpected token " ++ show t)
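The question's innfiks s = inn (parse s) can be used on top of this parser, but its inn and innP have no Minus case, so printing any tree containing - would fail with a non-exhaustive pattern error. A minimal sketch of the printer with that case added, otherwise following the question's definitions:

```haskell
data Ast
  = Word String
  | Num Int
  | Mult Ast Ast
  | Plus Ast Ast
  | Minus Ast Ast
  deriving (Eq, Show)

-- Top level: no parentheses around the whole expression.
inn :: Ast -> String
inn (Plus x y) = innP x ++ " + " ++ innP y
inn (Minus x y) = innP x ++ " - " ++ innP y
inn (Mult x y) = innP x ++ " * " ++ innP y
inn ast = innP ast

-- Nested: every compound subterm is parenthesized, so the printed
-- text is unambiguous regardless of precedence or associativity.
innP :: Ast -> String
innP (Num n) = show n
innP (Word w) = w
innP (Plus x y) = "(" ++ innP x ++ " + " ++ innP y ++ ")"
innP (Minus x y) = "(" ++ innP x ++ " - " ++ innP y ++ ")"
innP (Mult x y) = "(" ++ innP x ++ " * " ++ innP y ++ ")"
```

Combined with the left-associative parse above, innfiks "a - 2 * b + c" should produce "(a - (2 * b)) + c", making the grouping chosen by the parser visible.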

For right-associative operators, modify the following definitions:

parseTerm :: [String] -> (Ast, [String])
parseTerm ts
  =  let (fct, ts1) = parseFactor ts
     in  case ts1 of
           "*":ts2 -> let (trm, rest) = parseTerm ts2
                      in  (Mult fct trm, rest)
           _       -> (fct, ts1)

parseExpr :: [String] -> (Ast, [String])
parseExpr ts
  =  let (trm, ts1) = parseTerm ts
     in  case ts1 of
           op:ts2 | op == "+" || op == "-"
                  -> let (expr, rest) = parseExpr ts2
                     in  (astOp op trm expr, rest)
           _       -> (trm, ts1)
  where astOp "+" = Plus
        astOp "-" = Minus
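To check that the right-associative definitions reproduce the output the question asked for, here is a condensed, self-contained sketch. It assumes the tokenizer with '-' support, uses all isDigit / all isAlpha from Data.Char in place of isNumToken / isWordToken (safe here because tokenize never produces an empty token), and restates parseTerm and parseExpr in a pattern-matching style rather than copying them verbatim.

```haskell
import Data.Char (isAlpha, isDigit)

data Ast
  = Word String
  | Num Int
  | Mult Ast Ast
  | Plus Ast Ast
  | Minus Ast Ast
  deriving (Eq, Show)

tokenize :: String -> [String]
tokenize [] = []
tokenize (c : s)
  | c == ' ' = tokenize s
  | c `elem` "+-*" = [c] : tokenize s
  | isDigit c = let (cs, s') = span isDigit s in (c : cs) : tokenize s'
  | isAlpha c = let (cs, s') = span isAlpha s in (c : cs) : tokenize s'
  | otherwise = error ("unexpected character " ++ show c)

parseFactor :: [String] -> (Ast, [String])
parseFactor (t : ts)
  | all isDigit t = (Num (read t), ts)
  | all isAlpha t = (Word t, ts)
  | otherwise = error ("unrecognized token " ++ show t)
parseFactor [] = error "unexpected end of input"

-- Right-associative: the recursive call parses everything after the
-- operator, so a * b * c becomes Mult a (Mult b c).
parseTerm :: [String] -> (Ast, [String])
parseTerm ts = case parseFactor ts of
  (fct, "*" : ts2) -> let (trm, rest) = parseTerm ts2 in (Mult fct trm, rest)
  done -> done

parseExpr :: [String] -> (Ast, [String])
parseExpr ts = case parseTerm ts of
  (trm, "+" : ts2) -> let (e, rest) = parseExpr ts2 in (Plus trm e, rest)
  (trm, "-" : ts2) -> let (e, rest) = parseExpr ts2 in (Minus trm e, rest)
  done -> done

parse :: String -> Ast
parse s = case parseExpr (tokenize s) of
  (e, []) -> e
  (_, t : _) -> error ("unexpected token " ++ show t)
```

parse "a-2*b+c" now yields Minus (Word "a") (Plus (Mult (Num 2) (Word "b")) (Word "c")), as requested in the question, but parse "5 - 4 + 1" yields Minus (Num 5) (Plus (Num 4) (Num 1)), which is the grouping that evaluates to 0 instead of 2.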
