Can Pandoc inject arbitrary HTML attributes into any element?



Code blocks can define HTML attributes using the fenced_code_attributes extension:

~~~~ {#mycode .haskell .numberLines startFrom="100"}
qsort []     = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++
               qsort (filter (>= x) xs)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Is there some way to use the syntax above for regular text blocks? For example, I would like to convert the following Markdown text:

# My header
~~~ {.text}
This is regular text. This is regular text.
~~~
~~~ {.quote}
> This is the first level of quoting.
>
> > This is nested blockquote.
>
> Back to the first level.
~~~
~~~ {data-id=test-123}
+   Red
+   Green
+   Blue
~~~

into this:

<h1 id="my-header">My header</h1>
<p class="text">This is regular text. This is regular text.</p>
<blockquote class="quote">
<p>This is the first level of quoting.</p>
<blockquote>
<p>This is nested blockquote.</p>
</blockquote>
<p>Back to the first level.</p>
</blockquote>
<ul data-id="test-123">
<li>Red</li>
<li>Green</li>
<li>Blue</li>
</ul>

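If Pandoc itself has no such support, would it be possible to write a custom writer in Lua that does this?

Edit: Looking at the sample.lua custom writer, does anyone know what the "attributes table" on line 35 is? And how do I pass these attributes to specific Pandoc elements? Also, the functionality I'm looking for is very similar to the header_attributes extension, except that it would apply to all elements, not just headers.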

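Pandoc's filters let you operate on Pandoc's internal representation of the document. You can chain several filters to perform different transformations. I'll share two filter examples that should help.

Markdown code blocks

Code blocks in Pandoc are normally meant for embedding source code listings from programming languages, but here we extract the body and interpret it as Markdown. Rather than using classes from the input document (such as text and quote), let's use a generic as-markdown class. Pandoc will generate the appropriate tags automatically.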

# My header
~~~ {.as-markdown}
This is regular text. This is regular text.
~~~
~~~ {.as-markdown}
> This is the first level of quoting.
>
> > This is nested blockquote.
>
> Back to the first level.
~~~
~~~ {.as-markdown data-id=test-123}
+   Red
+   Green
+   Blue
~~~
~~~ haskell
main :: IO ()
~~~

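To make sure that code blocks without the as-markdown class are still interpreted as usual, I included a haskell code block. Here is the filter implementation: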

#!/usr/bin/env runhaskell
-- Written against the older pandoc API in which readMarkdown is pure
-- and takes a String; newer pandoc releases changed these signatures.
import Text.Pandoc.Definition       (Pandoc(..), Block(..), Format(..))
import Text.Pandoc.Error            (handleError)
import Text.Pandoc.JSON             (toJSONFilter)
import Text.Pandoc.Options          (def)
import Text.Pandoc.Readers.Markdown (readMarkdown)

-- | Parse a string as Markdown and return the resulting blocks.
asMarkdown :: String -> [Block]
asMarkdown contents =
  case handleError $ readMarkdown def contents of
    Pandoc _ blocks -> blocks

-- | Unwrap each CodeBlock with the "as-markdown" class, interpreting
-- its contents as Markdown. All other blocks pass through unchanged.
markdownCodeBlock :: Maybe Format -> Block -> IO [Block]
markdownCodeBlock _ cb@(CodeBlock (_id, classes, _namevals) contents) =
  if "as-markdown" `elem` classes then
    return $ asMarkdown contents
  else
    return [cb]
markdownCodeBlock _ x = return [x]

main :: IO ()
main = toJSONFilter markdownCodeBlock

Running pandoc --filter markdown-code-block.hs index.md produces:

<h1 id="my-header">My header</h1>
<p>This is regular text. This is regular text.</p>
<blockquote>
<p>This is the first level of quoting.</p>
<blockquote>
<p>This is nested blockquote.</p>
</blockquote>
<p>Back to the first level.</p>
</blockquote>
<ul>
<li>Red</li>
<li>Green</li>
<li>Blue</li>
</ul>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span class="ot">main ::</span> <span class="dt">IO</span> ()</code></pre></div>

Almost there! The only part that isn't quite right is the HTML attributes.
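One way to think about the missing piece is to decide which attributes of the code block should survive once it is unwrapped. The following is only a sketch under stated assumptions: forwardAttr and markerClass are hypothetical names that do not come from pandoc, and the local Attr alias merely mirrors the (id, classes, key-value pairs) triple matched in the filter above, using String as in the older API.

```haskell
-- Local alias mirroring the attribute triple on a CodeBlock:
-- (identifier, classes, key-value pairs). Assumption: String-based, as in
-- the filter above.
type Attr = (String, [String], [(String, String)])

-- Marker class used in the input document above.
markerClass :: String
markerClass = "as-markdown"

-- Hypothetical helper: returns Nothing for ordinary code blocks, and for
-- marked blocks returns the attributes that remain once the marker class
-- itself has been removed.
forwardAttr :: Attr -> Maybe Attr
forwardAttr (ident, classes, kvs)
  | markerClass `elem` classes =
      Just (ident, filter (/= markerClass) classes, kvs)
  | otherwise = Nothing
```

The surviving triple could then be attached to a Div wrapped around the parsed blocks, since Div carries attributes in pandoc's document model. That yields a wrapping div in the HTML output rather than attributes on the ul or blockquote itself, so it only approximates the output asked for in the question.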

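Custom HTML attributes from code block metadata

The following filter should help you get started. When the target format is html or html5, it converts code blocks that have the web-script class into HTML <script> tags.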

#!/usr/bin/env runhaskell
import Text.Pandoc.Builder
import Text.Pandoc.JSON

-- | Output formats for which the script should be emitted.
webFormats :: [String]
webFormats =
  [ "html"
  , "html5"
  ]

-- | Wrap the source text in a raw HTML <script> element.
script :: String -> Block
script src = Para $ toList $ rawInline "html" ("<script type='application/javascript'>" <> src <> "</script>")

-- | Replace "web-script" code blocks with a <script> element for web
-- formats, and drop them for every other target format.
injectScript :: Maybe Format -> Block -> IO Block
injectScript (Just (Format format)) cb@(CodeBlock (_id, classes, _namevals) contents) =
  if "web-script" `elem` classes then
    if format `elem` webFormats then
      return $ script contents
    else
      return Null
  else
    return cb
injectScript _ x = return x

main :: IO ()
main = toJSONFilter injectScript

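The data-id=test-123 from the last block will show up in _namevals, a list of key-value pairs of type [(String, String)]. All you need to do is refactor script to support arbitrary tags and key-value pairs for the HTML attributes, and specify what HTML to generate from those inputs. To see the native representation of the input document, run pandoc -t native index.md: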

[Header 1 ("my-header",[],[]) [Str "My",Space,Str "header"]
,CodeBlock ("",["as-markdown"],[]) "This is regular text. This is regular text."
,CodeBlock ("",["as-markdown"],[]) "> This is the first level of quoting.\n>\n> > This is nested blockquote.\n>\n> Back to the first level."
,CodeBlock ("",["as-markdown"],[("data-id","test-123")]) "+   Red\n+   Green\n+   Blue"
,Para [Str "To",Space,Str "ensure",Space,Str "regular",Space,Str "code",Space,Str "blocks",Space,Str "work",Space,Str "as",Space,Str "usual."]
,CodeBlock ("",["haskell"],[]) "main :: IO ()"]
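
As a starting point for the refactoring of script suggested above, here is a minimal sketch that builds an element string from an arbitrary tag name and an attribute triple. It uses only base and does not come from pandoc: Attr is a local alias mirroring the triple seen in the native output, and escapeAttr, renderAttr and element are hypothetical helper names.

```haskell
-- Local alias mirroring the (identifier, classes, key-value pairs) triple
-- that appears on CodeBlock in the native output above.
type Attr = (String, [String], [(String, String)])

-- Escape characters that are unsafe inside a double-quoted attribute value.
escapeAttr :: String -> String
escapeAttr = concatMap esc
  where
    esc '&' = "&amp;"
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Render the triple as a run of ` key="value"` fragments, skipping an
-- empty identifier and an empty class list.
renderAttr :: Attr -> String
renderAttr (ident, classes, kvs) = concatMap pair (idPart ++ classPart ++ kvs)
  where
    idPart      = [("id", ident) | not (null ident)]
    classPart   = [("class", unwords classes) | not (null classes)]
    pair (k, v) = " " ++ k ++ "=\"" ++ escapeAttr v ++ "\""

-- Wrap a body in an arbitrary tag carrying the given attributes.
-- The body is inserted verbatim, so it must already be valid HTML.
element :: String -> Attr -> String -> String
element tag attr body =
  "<" ++ tag ++ renderAttr attr ++ ">" ++ body ++ "</" ++ tag ++ ">"
```

For example, element "ul" ("", [], [("data-id", "test-123")]) "<li>Red</li>" evaluates to <ul data-id="test-123"><li>Red</li></ul>. In a filter, a string like this would presumably be handed to a raw HTML block or inline, in the same way script uses rawInline "html".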

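If you'd like to try either of these examples, both are in my pandoc-experiments repository.

This is very doable in kramdown, which converts the following input (first block) into the HTML output shown after it: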

# My header
This is regular text. This is regular text.
{: .text}
> This is the first level of quoting.
>
> > This is nested blockquote.
>
> Back to the first level.
{: .quote}
+   Red
+   Green
+   Blue
{: data-id="test-123"}

<h1 id="my-header">My header</h1>
<p class="text">This is regular text. This is regular text.</p>
<blockquote class="quote">
  <p>This is the first level of quoting.</p>
  <blockquote>
    <p>This is nested blockquote.</p>
  </blockquote>
  <p>Back to the first level.</p>
</blockquote>
<ul data-id="test-123">
  <li>Red</li>
  <li>Green</li>
  <li>Blue</li>
</ul>

For details, see the attribute list definitions section of the syntax documentation.
