使用 Haskell monad "do"表示法定义语法树

我正在尝试构造一个抽象语法树，允许使用monaddo符号进行定义，如下所示：

ast = do
Variable uint8 "i"
Function Void "f" $ do
Variable uint8 "local_y"
Comment "etc. etc."

我在这里展示的结构是从Text.Blaze.Html收集的，它用于定义HTML树。

问题分散在以下内容中。主要问题是如何正确执行此操作。当然，任何有助于理解此结构的输入都非常感谢。

所以，首先，这里有一个小的，有缺陷的，但"有效"的例子。它是一个语法树，其中包含特定类型的变量和函数的声明、注释行以及用于替换的占位符声明：

{-# LANGUAGE ExistentialQuantification #-}
module Question
where
import           Control.Applicative
import           Data.Monoid         (Monoid, (<>))
import           Data.String.Utils   (rstrip)
type NumberOfBits = Word
type VariableName = String
data Type = UInt NumberOfBits
| Int NumberOfBits
| Void
uint8 = UInt 8
int8 = Int 8
instance Show Type where
show (UInt w) = "uint" <> show w
show (Int w)  = "int" <> show w
show Void     = "void"
data TreeM a = Variable Type VariableName            -- variable declaration
| Function Type VariableName (TreeM a)  -- function declaration
| Comment String                        -- a comment
| PlaceHolder String                    -- a placeholder with                  
| forall b. Append (TreeM b) (TreeM a)  -- combiner
| Empty a                               -- needed for what?
type Tree = TreeM ()
subTreeOf :: TreeM a -> a
subTreeOf (Variable _ _)   = undefined
subTreeOf (Function _ _ t) = subTreeOf t
subTreeOf (Comment _)      = undefined
subTreeOf (Empty t)        = t
instance Monoid a => Monoid (TreeM a) where
mempty = Empty mempty
mappend = Append
mconcat = foldr Append mempty
instance Functor TreeM where
fmap f x = x `Append` (Empty (f (subTreeOf x))) -- fmap :: (a -> b) -> f a -> f b
instance Applicative TreeM where
pure x = Empty x
(<*>) x y = (x `Append` y) `Append` (Empty (subTreeOf x (subTreeOf y)))  -- (<*>) :: f (a -> b) -> f a -> f b
(*>) = Append
instance Monad TreeM where
return x = Empty x
(>>) = Append             -- not really needed: (>>) would default to (*>)
t >>= f = t `Append` (f (subTreeOf t))
indent :: String -> String
indent s = rstrip $ unlines $ map ("    "<>) (lines s)
render :: TreeM a -> String
render (Variable y n)   = "Variable " <> (show y) <> " " <> show n
render (Function r n t) = "Function" <> " " <> n <> " returning " <> (show r) <> ":n" <> indent (render t)
render (PlaceHolder n)  = "Placeholder "" <> n <> """
render (Append t t')    = (render t) <> "n" <> (render t')
render (Empty _)        = ""
-- |In input tree t substitute a PlaceHolder of name n' with the Tree t'
sub :: TreeM a -> (String, TreeM a) -> TreeM a
sub t@(PlaceHolder n) (n', t') = if n == n' then t' else t
sub (Function y n t) s         = Function y n (sub t s)
--sub (Append t t') s            = Append (sub t s) (sub t' s)  -- Error!
sub t _                        = t
code :: Tree
code = do
Variable uint8 "i"
Variable int8 "j"
Function Void "f" $ do
Comment "my function f"
Variable int8 "i1"
Variable int8 "i2"
PlaceHolder "the_rest"
main :: IO ()
main = do
putStrLn $ render code
putStrLn "nNow apply substitution:n"
putStrLn $ render (sub code ("the_rest", Comment "There is nothing here"))

这是(应该是)定义复杂树结构的一种非常简洁的方法。特别是，这应该是语法上噪声最小、用户友好的定义语法树的方法。

总的来说，我很难理解TreeM a中a的确切含义.在我看来，a可以是任何类型Variable、Function、PlaceHolder等。

我注意到一些让我感到奇怪的事情：

在forall b. Append (TreeM b) (TreeM a)，TreeM a和TreeM bAppend论证的顺序似乎颠倒了。无论如何，在总和类型中使用存在量词看起来很奇怪。如果我理解正确，它为TreeM定义了一系列构造函数。
在Functor、Applicative和Monad所需的所有函数中，唯一实际使用的函数是monad>>。(这表明自由的monad可能是这项工作的正确工具。实际上，我从未想过do符号使用了>>运算符，并且可以使用这一事实。
必须在subTreeOf中使用undefined才能使函数总计。

如前所述，上面的示例存在缺陷：构造的某些部分不适合 AST：

Empty的定义对HTML树有意义，它用于空标签，如<br />。但对于 AST 来说，这毫无意义。它保持原样以保持Applicative和Functor实现正常工作。
同样，Functor和Applicative的实现可能对 HTML 树有意义，但对 AST 没有意义。即使对于 HTML，我也不太了解fmap和应用<*>的目的。两者都通过向下推送节点并添加Empty类型来扩展树。我不太明白 HTML 树上的哪个自然转换代表。

我很惊讶应用<*>定义中的subTreeOf x (subTreeOf y)实际上是正确的语法，还是隐含的>>？

AST 转换

在 AST 上应用转换是很自然的。PlaceHolder用作应用变换的小玩具。函数sub，这里只有部分实现，应该用注释替换占位符"the_rest"。必要的但是sub (Append t t') s = Append (sub t s) (sub t' s)不编译，预期的s类型是(String, TreeM b)，实际类型是(String, TreeM a)。将类型更改为另一方面，sub :: TreeM a -> (String, TreeM b) -> TreeM a打破了sub p@(PlaceHolder n)的定义，现在我陷入了困境。

事实上，这不正是ASTfmap应该sub吗？

免费单子？

当讨论AST的monad时，术语"自由monad"经常出现。但是自由 monad 依赖于自由构造的Functorfmap，这里显示的fmap对于 AST，是不够的。一旦确定了正确的fmap，自由monad应该完成剩下的工作 - 也许吧。

`fmap`

看来，正确的fmap是这里成功的关键，正确的<*>可能会变得更加明显。

用例

循环可以用forM_编写，这是构建AST重复部分的好方法：

forM_ ["you", "get", "the", "idea"] $ varName -> do
Variable uint8 varName

条件部分可以使用when、unless等。

when hasCppDestructor $ do
Comment "We need the destructor"
Function NoReturnType "~SomeClass" $ do
...

语义分析，例如确保正确的声明顺序，也是可能的，正如第一个答案所指出的那样。

视觉线索：我喜欢的另一件事是，在上面显示的构造中，控制结构(例如 if-then-else、forM_等)以小写形式开头，而 AST 行以大写形式开头。

背景

关于它的发展方向，可能的话是：这个想法是使用一个足够好的嵌入式DSL，它允许自动定义一个AST，它相当抽象地表示，比如说，一个复杂的FSM，需要在C，C++，Python，Java，Go，Rust，Javascript中实现，等等......然后，像上面这样的render函数将可验证的正确 AST 映射到目标语言。

更新

请注意，>>不默认为*>，而是默认为m >> k = m >>= (_ -> k)！

我不确定这整个方法是否是一个好主意(尽管，我自己实际上已经做过很多次类似的事情)。

请注意，像Blaze.MarkupM、HaTeX.LaTeXM等单子并不是真正的单子。它们实际上只是想要访问一元运算器的幺半群(主要是滥用do符号，但它也允许堆栈单元变压器在顶部，这可能很有意义)。也就是说，它们只不过是专门的Writer单子！
目前，你真的在做同样的事情;如果这是你的意图，那么也许最好的方法是将你的类型设计为Monoid Tree，然后查看Writer Treemonad的结构，如果需要，将其重构为TreeM数据结构。(HaTeX不这样做，而是只使用一个通用的类接口来保持LaTeX和LaTeXM单独的类型，这可以说是一种更简洁的方法，尽管它可能不是性能最佳的。

结果将非常像Blaze.MarkupM/您现在拥有的结构。我可以讨论你的个人问题，但实际上，它们都可以通过查看类型如何与作家 monad 同构来回答。

实际上，您根本不需要Monad实例即可使用do，如下所示：

Prelude> 2 * do 1 + 1
4

因此，如果您只是想滥用do以避免在树布局中使用括号，但实际上没有在结构中存储可绑定变量的明智方法，请考虑不要编写任何 monad 实例。该实例仅适用于具有多行的do块，但是如果这些行都没有绑定任何变量，那么您始终可以将隐式>>替换为显式<>，例如

Function Void "f" $ do
Variable uint8 "local_y"
<> Comment "etc. etc."

唯一的问题是：这些行不能包含$运算符，因为它的优先级低于<>。规避这种情况的一种巧妙方法是观察($) = id，因此您可以将示例编写为

ast = do
Variable uint8 "i"
<> Function Void "f" `id`do
Variable uint8 "local_y"
<> Comment "etc. etc."

这是否比定义一个不多的monad实例更滥用语法是值得商榷的。IMO，如果您定义这样的 monad 实例，您应该立即使其成为monad 转换器，就像HaTeX所做的那样，因为这也提供了允许在 AST 构建中包含IO操作的选项(例如，硬包含外部源文件)。

综上所述：对于您的应用程序，拥有一个Monad实例实际上可能是有意义的，该实例不仅仅是一个"加糖的幺半群"，而且实际上以一种有用的方式绑定变量。这是一个不适用于blaze的功能，但肯定适用于像AST这样的C++/Python/JavaScript语言，它可能非常有用，因为它确保变量在使用前被定义，就在Haskell语法中。而不是你的例子，你会写

ast = do
i <- variable uint8
Function Void "f" $ do
local_y <- variable uint8
Comment "etc. etc."

然后，这些变量实际上只是根据状态变量选择的编号标识符。

实现大致如下：

type VariableName = Int
data TreeS = Variable Type VariableName
| Function Type VariableName TreeS
| Comment String
| PlaceHolder String
| Append TreeS TreeS
| Empty
instance Monoid where (<>) = Append
newtype TreeT m a
= TreeT { runTreeM :: StateT VariableName (WriterT TreeS m) a }
deriving (Functor, Applicative, Monad)
variable :: Type -> TreeT m VariableName
variable typ = TreeT $ do
i <- get
lift . tell $ Variable typ i
put $ i+1
return i

我对 AST 的这种Append编码所采取的路径似乎是一个死胡同，所以我更深入地挖掘了自由 monads。结果如下：

自由的monad非常适合这种类型的问题。自由monads允许将程序的"逻辑"与其效果分开。AST 属于这种模式。在这个例子中，"逻辑"是AST，效果只是漂亮的打印。

更一般地说，"效果"可以意味着分析，测试(例如试运行)，运行校样，漂亮的打印，压缩......，当然还有实际执行。

关于免费monads已经写了很多，这里有一些有用的资源可以开始：

加布里埃尔·冈萨雷斯(Gabriel Gonzalez)的《为什么自由的monads很重要》
"什么是"免费Monad + Interpreter"模式？"(SE 问题)
"导航和修改基于哈斯克尔自由单体构建的 AST" (SE 问题)

现在，使用Control.Monad.Free解决方案将如下所示：

{-# LANGUAGE DeriveFunctor #-}
module Main where
import           Control.Monad.Free
import           Data.Monoid        ((<>))
import           Data.String.Utils  (rstrip)
type NumberOfBits = Word
type VariableName = String
data Type = UInt NumberOfBits
| Int NumberOfBits
| Void
deriving Eq
uint8 = UInt 8
int8 = Int 8
instance Show Type where
show (UInt w) = "uint" <> show w
show (Int w)  = "int" <> show w
show Void     = "void"
data AST n = Variable Type VariableName n                -- variable declaration
| Function Type VariableName (Free AST ()) n -- function declaration
| Comment String n                           -- a comment
| PlaceHolder String n                       -- a placeholder with @name holds holds more code
| End                                        
deriving (Eq, Show, Functor)
end :: Free AST ()
end = liftF End -- is exactly Pure ()
variable :: Type -> VariableName -> Free AST ()
variable y n = liftF (Variable y n ())
function :: Type -> VariableName -> Free AST () -> Free AST ()
function t n p = liftF (Function t n p ())
placeHolder :: String -> Free AST ()
placeHolder n = liftF (PlaceHolder n ())
comment :: String -> Free AST ()
comment c = liftF (Comment c ())
indent :: String -> String
indent s = rstrip $ unlines $ map ("    "<>) (lines s)
render :: Free AST r -> String
render (Free (Variable y n next))   = "Variable " <> show y <> " " <> show n <> "n" <> render next
render (Free (Function t n f next)) = "Function "" <> n <> "" returning " <> show t <> ":n"
<> indent (render f) <> "n" <> render next
render (Free (Comment c next))      = "//  " <> c <> "n" <> render next
render (Free (PlaceHolder s next))  = "PlaceHolder "" <> s <> ""n" <> render next
render (Free End)                   = "end"
render (Pure r)                     = "returnn"
code :: Free AST ()
code = do
placeHolder "includefiles"
variable uint8 "i"
variable int8 "j"
function Void "f" $ do
comment "This is a function!"
variable (Int 8) "local_i"
sub :: AST (Free AST b) -> Free AST b
sub (Variable t n next) = do
variable t n
next
sub (Function t n f next) = do
function t n f
next
sub (Comment c next) = do
comment c
next
sub (PlaceHolder s next) = do
comment "placeholder"
next
main :: IO ()
main = do
putStrLn $ render code
putStrLn "-- Apply substn"
putStrLn $ render (iterM sub code)

并非所有这些都需要如此明确地阐明。可以使用Control.Monad.Free.TH删除某些样板。

从某种意义上说，Control.Monad.Free是规范实现，但链式数据结构意味着某些操作的二次复杂性。作者本人Ed Kmett在Control.Monad.Free.Church中解决了这个问题，其中使用了不同的编码。请参阅免费的 monad 基准测试，了解基准测试和其他免费 monad 实现的指针。

超越自由单子，共自由单子形式化解释器及其与"逻辑"的关系。例如，参见David Laing的"免费DSL，口译员免费"。

AST 转换

免费单子？

`fmap`

用例

背景

更新

相关内容

最新更新

热门标签：