Using sepBy with strings in Attoparsec

I'm trying to split a string on ",", ", and" or "and", and then return whatever is in between. An example of what I have so far:

{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import Data.Attoparsec.Text
sepTestParser = nameSep ((takeWhile1 $ inClass "-'a-zA-Z") <* space)
nameSep p = p `sepBy` (string " and " <|> string ", and" <|> ", ")
main = do
  print $ parseOnly sepTestParser "This test and that test, this test particularly."

I would like the output to be ["This test", "that test", "this test particularly."]. I have a vague feeling that what I'm doing is wrong, but I can't quite work out why.

Note: this answer is written in literate Haskell. Save it as Example.lhs and load it in GHCi or a similar tool.

The problem is that sepBy is implemented as:

sepBy p s = liftA2 (:) p ((s *> sepBy1 p s) <|> pure []) <|> pure []

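This means that the second parser s is only called after the first parser p has succeeded. It also means that if you added whitespace to the character class, you would end up with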

["This test and that test","this test particularly"]

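because " and " can now be parsed by p. This isn't easy to fix: you would need to look ahead as soon as you hit a space and check whether, after any number of spaces, an "and" follows; if it does, stop parsing. Only then will a parser written with sepBy work.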
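To make both failure modes concrete, here is a minimal standalone sketch (not part of the literate file), assuming the attoparsec and text packages are available. The names sepP, questionParser and greedyParser are made up for this illustration: the first parser is the one from the question, the second is the hypothetical variant with a space added to the character class.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import Data.Attoparsec.Text
import Data.Text (Text)

-- The separator from the question, unchanged.
sepP :: Parser Text
sepP = string " and " <|> string ", and" <|> ", "

-- The parser from the question: every word must be followed by a space.
questionParser :: Parser [Text]
questionParser = (takeWhile1 (inClass "-'a-zA-Z") <* space) `sepBy` sepP

-- Hypothetical variant: the space is part of the character class.
greedyParser :: Parser [Text]
greedyParser = takeWhile1 (inClass "-'a-zA-Z ") `sepBy` sepP

sample :: Text
sample = "This test and that test, this test particularly."

main :: IO ()
main = do
  print (parseOnly questionParser sample)  -- Right ["This"]
  print (parseOnly greedyParser sample)    -- Right ["This test and that test","this test particularly"]
```

The first parser stops after one word: once "This " is consumed, the separator is tried at "test and ..." and none of its alternatives match there. The second never hands control to the " and " alternative, because p itself happily consumes it.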

So let's write a parser that takes words instead (the rest of this answer is literate Haskell):

> {-# LANGUAGE OverloadedStrings #-}
> import Control.Applicative
> import Data.Attoparsec.Text
> import qualified Data.Text as T
> import Control.Monad (mzero)
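> -- A single word: letters, hyphens and apostrophes.
> word :: Parser T.Text
> word = takeWhile1 (inClass "-'a-zA-Z")
>
> -- A run of words joined by single spaces; stops before the word "and".
> wordsP :: Parser T.Text
> wordsP = fmap (T.intercalate " ") $ k `sepBy` many space
>   where k = do
>           a <- word
>           if a == "and" then mzero
>                         else return a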

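wordsP now takes multiple words until it hits either something that isn't a word or a word that equals "and". The returned mzero signals a parse failure, at which point another parser can take over: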

> andP = many space *> "and" *> many1 space *> pure()
> 
> limiter = choice [
>     "," *> andP,
>     "," *> many1 space *> pure (),
>     andP
>   ]

limiter is basically the same parser you had already written; it is roughly equivalent to the regular expression /,\s*and\s+|,\s+|\s*and\s+/.
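As a quick sanity check of which separators limiter accepts, here is a small standalone sketch (again not part of the literate file). It repeats andP and limiter so that it compiles on its own; isLimiter is a hypothetical helper introduced only for this check.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (many)
import Data.Attoparsec.Text
import Data.Either (isRight)
import Data.Text (Text)

andP :: Parser ()
andP = many space *> "and" *> many1 space *> pure ()

limiter :: Parser ()
limiter = choice
  [ "," *> andP
  , "," *> many1 space *> pure ()
  , andP
  ]

-- Hypothetical helper: True when limiter consumes the entire input.
isLimiter :: Text -> Bool
isLimiter = isRight . parseOnly (limiter <* endOfInput)

main :: IO ()
main = mapM_ (\s -> print (s, isLimiter s))
  [", and ", ", ", " and ", ",and ", ",", " , "]
```

The first four inputs are accepted. A bare "," is rejected because the second alternative requires at least one space after the comma, and " , " is rejected because no alternative allows whitespace before the comma.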

Now we can actually use sepBy, since our first parser no longer overlaps with the second:

> test = "This test and that test, this test particular, and even that test"
>
> main = print $ parseOnly (wordsP `sepBy` limiter) test

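The result is ["This test","that test","this test particular","even that test"] (wrapped in Right by parseOnly), just as we wanted. Note that this particular parser does not preserve whitespace.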
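One thing to watch for: parseOnly does not require the parser to consume all of its input. The sketch below is a standalone program that repeats the definitions above; the names lenient, strict and questionInput are made up for this illustration. It runs the same parser on the sentence from the question, which ends in a period that word does not accept.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (many)
import Control.Monad (mzero)
import Data.Attoparsec.Text
import qualified Data.Text as T

word :: Parser T.Text
word = takeWhile1 (inClass "-'a-zA-Z")

wordsP :: Parser T.Text
wordsP = fmap (T.intercalate " ") $ k `sepBy` many space
  where
    k = do
      a <- word
      if a == "and" then mzero else return a

andP :: Parser ()
andP = many space *> "and" *> many1 space *> pure ()

limiter :: Parser ()
limiter = choice ["," *> andP, "," *> many1 space *> pure (), andP]

-- Hypothetical names: lenient ignores leftover input, strict rejects it.
lenient, strict :: Parser [T.Text]
lenient = wordsP `sepBy` limiter
strict  = lenient <* endOfInput

questionInput :: T.Text
questionInput = "This test and that test, this test particularly."

main :: IO ()
main = do
  -- Right ["This test","that test","this test particularly"]; the "." is left unparsed
  print (parseOnly lenient questionInput)
  -- strict fails on the same input because of the trailing "."
  putStrLn (either (const "strict parser rejected the input") show
                   (parseOnly strict questionInput))
```

If the trailing period should be part of the last item, as in the output the question asks for, the character class in word would have to be extended accordingly; that is a design choice this sketch leaves open.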

So whenever you want to build a parser with sepBy, make sure that the two parsers do not overlap.
