How do I fix "missing character" warnings when converting docx to PDF with Pandoc and LaTeX?

Goal

I have several thousand .docx files in Khmer that I want to convert to .pdf with Pandoc.

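Background

I installed Pandoc with MacPorts. Pandoc needs LaTeX for PDF conversion, so I installed MacTeX. The installation seemed to go smoothly, and I can convert English .docx files to .pdf without any trouble.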

Attempt 1

When I try to convert a Khmer file (you can find it at https://briancroxall.net/pandoc/transcription.docx) to PDF, I use the following command:

pandoc transcription.docx  -s -o transcript.pdf

I get the following error:

Error producing PDF.
! Package inputenc Error: Unicode character អ (U+17A2)
(inputenc)                not set up for use with LaTeX.
See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
...                                              
l.64 ...នៅសម័យប៉ុល ពត។}
Try running pandoc with --pdf-engine=xelatex.

Attempt 2

Following that suggestion, I use this command:

pandoc --pdf-engine=xelatex transcription.docx  -s -o transcript.pdf

Pandoc then emits a warning for every Khmer character in the text:

[WARNING] Missing character: There is no អ in font [lmroman10-bold]:mapping=tex-text;!
[WARNING] Missing character: There is no ្ in font [lmroman10-bold]:mapping=tex-text;!
[WARNING] Missing character: There is no ន in font [lmroman10-bold]:mapping=tex-text;!
...

This does produce a PDF (see https://briancroxall.net/pandoc/transcript.pdf), but it is essentially blank.

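Question

As far as I can tell, this means the LaTeX engine I'm using for the conversion doesn't have the Khmer characters. Whether or not that's the case, how can I get this conversion to work?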
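The warnings actually name a font, lmroman10-bold (Latin Modern), rather than the engine, so one way to check the diagnosis is to ask which installed fonts contain one of the missing characters. A minimal sketch, assuming fontconfig's fc-list is installed (it is not part of Pandoc or MacTeX; on macOS it might come from MacPorts, for example) and noting that fontconfig's view of installed fonts may not exactly match what XeTeX finds on macOS. The function name fonts_with_char is hypothetical.

```shell
# Hypothetical helper: list font families that contain a given code point.
# The default, U+17A2 (អ), is the first character named in the warnings.
fonts_with_char() {
  # ':charset=HEX' matches fonts whose character set includes that code
  # point; 'family' prints only the family name(s) of each match.
  # sort -u removes duplicate lines.
  fc-list ":charset=${1:-17a2}" family | sort -u
}
```

Running fonts_with_char with no argument checks U+17A2. An empty result would mean fontconfig knows of no installed font that covers it; any family it prints is a candidate for Pandoc's mainfont variable.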

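mb21's comment helped me solve this. The warnings show that XeLaTeX was using its default font, Latin Modern (lmroman10), which has no Khmer glyphs. Since my system has several Khmer fonts installed, I had to set mainfont to use one of them: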

$ pandoc --pdf-engine=xelatex transcription.docx \
  -V 'mainfont:Khmer MN' -s -o transcription.pdf

This produces a PDF that contains the Khmer characters, with no error messages.
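For the several thousand files mentioned at the start, the working command can be wrapped in a loop. A minimal POSIX sh sketch, assuming all the .docx files sit in one directory and that Khmer MN suits every file; convert_all_docx is a hypothetical name. It converts the files one at a time and reports failures on stderr instead of stopping at the first one.

```shell
# Hypothetical helper: convert every .docx in a directory to a PDF next to it,
# using XeLaTeX and a Khmer-capable main font (default "Khmer MN").
convert_all_docx() {
  dir=${1:-.}
  font=${2:-Khmer MN}
  failed=0
  for f in "$dir"/*.docx; do
    # If the glob matched nothing it stays literal, so skip it.
    [ -e "$f" ] || continue
    out="${f%.docx}.pdf"
    if ! pandoc --pdf-engine=xelatex "$f" -V "mainfont:$font" -s -o "$out"; then
      echo "failed: $f" >&2
      failed=$((failed + 1))
    fi
  done
  # Exit status is non-zero if any file failed to convert.
  [ "$failed" -eq 0 ]
}
```

For example, convert_all_docx ~/khmer-docx writes foo.pdf beside each foo.docx; a second argument selects a different installed Khmer font.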

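The PDF does still seem to have some problems: some Khmer phrases run off the edge of the page. I think this is a word-splitting issue that Word handles but that causes trouble when converting to PDF.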
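Khmer is normally written without spaces between words, which leaves TeX with no place to break a line and is one plausible cause of the overflow. A sketch of one possible fix, assuming the ICU library built into XeTeX includes Khmer line-break data (not verified here): XeTeX's \XeTeXlinebreaklocale primitive asks ICU for break opportunities, \XeTeXlinebreakskip adds a little stretchable glue at each one, and Pandoc's -H (--include-in-header) option puts a LaTeX file into the preamble. The file name khmer-linebreak.tex and the functions write_khmer_header and convert_khmer are hypothetical.

```shell
# Hypothetical helper: write a LaTeX header asking XeTeX to find line-break
# points with ICU's Khmer ("km") rules. Assumes XeTeX's ICU has Khmer data.
write_khmer_header() {
  # Quoted 'EOF' keeps the backslashes literal.
  cat > "${1:-khmer-linebreak.tex}" <<'EOF'
\XeTeXlinebreaklocale "km"
\XeTeXlinebreakskip = 0pt plus 1pt
EOF
}

# Hypothetical helper: convert one .docx with the header included via -H.
convert_khmer() {
  write_khmer_header khmer-linebreak.tex
  pandoc --pdf-engine=xelatex "$1" -V 'mainfont:Khmer MN' \
    -H khmer-linebreak.tex -s -o "${1%.docx}.pdf"
}
```

convert_khmer transcription.docx would then write transcription.pdf; whether the phrases stop overflowing depends on the ICU assumption above.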
