Reducing the file size of PDFs created with matplotlib by changing font embedding

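I am using matplotlib to generate PDF figures. However, even the simplest figures produce relatively large files; the MWE below produces a file of almost 1 MB. I have realized that the large file size is due to matplotlib fully embedding every font used. Since I will be creating quite a few plots and would like to reduce the file size, I am wondering:

Main question:

Is there a way to make matplotlib embed font subsets instead of the complete fonts? I would also be fine with not including the fonts at all.

Things considered so far:

A vector graphics editor can easily be used to export PDFs that contain font subsets (or no fonts at all), but having to do this for every file (and every revision) seems unnecessarily tedious.
Similarly, I have read about post-processing the PDF files (e.g. with Ghostscript), although the effort seems comparable.
I tried setting 'pdf.fonttype' = 3, which does produce considerably smaller files. However, I would like the text to stay editable in vector graphics editors, and that does not seem to work in this case (e.g. minus signs are not saved as text).

Since files with embedded subsets are easy to produce with external software (albeit labor-intensive), is it possible to achieve this directly in matplotlib somehow? Any help would be greatly appreciated.

MWE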

import matplotlib.pyplot as plt  # Setup
import matplotlib as mpl
mpl.rcParams['pdf.fonttype'] = 42                 # Embed fonts as TrueType (Type 42)
mpl.rcParams['mathtext.fontset'] = 'dejavuserif'
mpl.rc('font', family='Arial', size=12)
fig, ax = plt.subplots(figsize=(2, 2))            # Create a figure containing some text
ax.semilogy(1, 1, 's', label='Text\n$M_\\mathrm{ath}$')
ax.legend()
fig.tight_layout()
fig.savefig('test.pdf')

Environment: matplotlib 3.1.1

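The PGF backend helps to reduce the size of the PDF files significantly. Just add mpl.use('pgf') to your code. In my environment, this change leads to the following:

File size reduced from 817K to 21K (about 40 times smaller!)
Execution time increased from 1 s to 3 s.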

However, for real-world figures, the execution time usually decreases as the file size increases.

The reduction in PDF size is due to the fonts being embedded as subsets, as the pdffonts output for the two files shows.
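For reference, here is a sketch of the question's MWE with the backend switched to PGF. It assumes a working LaTeX installation (the PGF backend uses XeLaTeX by default) and that Arial is installed; the function name `save_example` and the output file name are placeholders, not part of the original code.

```python
import matplotlib as mpl
mpl.use('pgf')  # select the PGF backend before any figure is created
import matplotlib.pyplot as plt

mpl.rc('font', family='Arial', size=12)

def save_example(path='test_pgf.pdf'):
    """Recreate the question's figure and save it through the PGF backend."""
    fig, ax = plt.subplots(figsize=(2, 2))
    ax.semilogy(1, 1, 's', label='Text\n$M_\\mathrm{ath}$')
    ax.legend()
    fig.tight_layout()
    fig.savefig(path)  # runs LaTeX, so this step needs a TeX installation
    plt.close(fig)
    return path
```

Calling `save_example()` writes the PDF. With this backend the math is typeset by LaTeX itself, so the `mathtext.fontset` setting from the MWE is left out here.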

$ pdffonts pdf_backend.pdf
name                         type              emb sub uni prob object ID
---------------------------- ----------------- --- --- --- ---- ---------
ArialMT                      CID TrueType      yes no  yes          14  0
DejaVuSerif-Italic           CID TrueType      yes no  yes          23  0
DejaVuSerif                  CID TrueType      yes no  yes          32  0

$ pdffonts pgf_backend.pdf
name                         type              emb sub uni prob object ID
---------------------------- ----------------- --- --- --- ---- ---------
KECVVY+ArialMT               CID TrueType      yes yes yes           7  0
EFAAMX+CMR12                 Type 1C           yes yes yes           8  0
EHYQVR+CMSY8                 Type 1C           yes yes yes           9  0
UVNOSL+CMR8                  Type 1C           yes yes yes          10  0
FDPQQI+CMMI12                Type 1C           yes yes yes          11  0
DGIYWD+DejaVuSerif           CID TrueType      yes yes yes          13  0
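
The `sub` column is what separates the two files. A minimal sketch for checking it programmatically, assuming the column layout shown in the listings above; `parse_pdffonts` and `not_subset` are hypothetical helper names, and the sample mixes rows from both listings.

```python
import re

# One font row: name, type (may contain spaces), then the emb/sub/uni flags.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<type>.+?)\s+"
    r"(?P<emb>yes|no)\s+(?P<sub>yes|no)\s+(?P<uni>yes|no)\b"
)

def parse_pdffonts(output):
    """Parse pdffonts text output into a list of dicts (header lines are skipped)."""
    fonts = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            row = match.groupdict()
            fonts.append({
                "name": row["name"],
                "type": row["type"],
                "embedded": row["emb"] == "yes",
                "subset": row["sub"] == "yes",
            })
    return fonts

def not_subset(fonts):
    """Names of fonts that are embedded in full rather than as a subset."""
    return [f["name"] for f in fonts if f["embedded"] and not f["subset"]]

sample = """\
name                         type              emb sub uni prob object ID
---------------------------- ----------------- --- --- --- ---- ---------
ArialMT                      CID TrueType      yes no  yes          14  0
KECVVY+ArialMT               CID TrueType      yes yes yes           7  0
EFAAMX+CMR12                 Type 1C           yes yes yes           8  0
"""
print(not_subset(parse_pdffonts(sample)))  # prints ['ArialMT']
```

To run it against a real file, the output of `subprocess.run(['pdffonts', path], capture_output=True, text=True).stdout` can be passed to `parse_pdffonts`, provided `pdffonts` is installed.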

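Another option is to generate an EPS file (using the PostScript backend) and convert it to PDF, for example with epstopdf (which uses the Ghostscript interpreter). This reduces the PDF file to 9K. Note, however, that the PS backend does not support transparency.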
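
The conversion step can be scripted as well. A minimal sketch, assuming `epstopdf` is on the PATH and that the figure was saved beforehand with something like `fig.savefig('test.eps')`; both helper names are hypothetical.

```python
import os
import subprocess

def epstopdf_command(eps_path):
    """Build the epstopdf call that writes a PDF next to the EPS file."""
    pdf_path = os.path.splitext(eps_path)[0] + '.pdf'
    return ['epstopdf', '--outfile=' + pdf_path, eps_path], pdf_path

def eps_to_pdf(eps_path):
    """Convert an EPS file to PDF and return the path of the PDF."""
    command, pdf_path = epstopdf_command(eps_path)
    subprocess.run(command, check=True)  # raises if epstopdf fails
    return pdf_path
```

Building the command in a separate function keeps the path handling easy to inspect without running the converter.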

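Leaving this here in case anyone else is looking for something similar: in the end, I decided to go with Ghostscript. Because of the extra step it is not exactly what I wanted, but at least it can be automated: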

import os
import subprocess

def gs_opt(filename):
    """Rewrite a PDF in place with Ghostscript so that fonts are embedded as subsets."""
    filenameTmp = os.path.splitext(filename)[0] + '_tmp.pdf'
    gs = ['gswin64',                        # Windows executable; typically 'gs' on Linux/macOS
          '-sDEVICE=pdfwrite',
          '-dEmbedAllFonts=false',
          '-dSubsetFonts=true',             # Create font subsets (default)
          '-dPDFSETTINGS=/prepress',        # Quality preset (affects image resolution)
          '-dDetectDuplicateImages=true',   # Embed images used multiple times only once
          '-dCompressFonts=true',           # Compress fonts in the output (default)
          '-dNOPAUSE',                      # No pause after each page
          '-dQUIET',                        # Suppress output
          '-dBATCH',                        # Automatically exit
          '-sOutputFile=' + filenameTmp,    # Save to temporary output
          filename]                         # Input file
    subprocess.run(gs, check=True)          # Create temporary file; raise if Ghostscript fails
    os.remove(filename)                     # Delete input file
    os.rename(filenameTmp, filename)        # Rename temporary to input file

Then call it:

filename = 'test.pdf'
plt.savefig(filename)
gs_opt(filename)

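This saves the figure as test.pdf, uses Ghostscript to create a temporary optimized test_tmp.pdf, deletes the initial file, and renames the optimized file to test.pdf.

Compared with exporting the file from a vector graphics editor, the PDF produced by Ghostscript is still several times larger (typically 4-5 times). However, it reduces the file size to between 1/5 and 1/10 of the initial file. That's something.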
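
To check the reduction on your own figures, the file sizes before and after optimization can be compared directly. A small sketch using only the standard library; `size_ratio` is a hypothetical helper, and the file names in the usage comment are placeholders (a copy of the original has to be kept, since the optimization above overwrites it).

```python
import os

def size_ratio(original_path, optimized_path):
    """Return optimized size divided by original size (0.2 means one fifth)."""
    original = os.path.getsize(original_path)
    optimized = os.path.getsize(optimized_path)
    if original == 0:
        raise ValueError('original file is empty')
    return optimized / original

# Usage, assuming both files exist:
# print(f"{size_ratio('test_original.pdf', 'test.pdf'):.0%} of the original size")
```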
