How to extract all pages and attachments from a PDF to PNG



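I'm trying to create a process in .NET that converts a PDF, including all of its pages and attachments, to PNG files. I'm evaluating libraries and came across PDFiumSharp, but it isn't working for me. Here is my code: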

string Inputfile = "input.pdf";
string OutputFolder = "Output";
string fileName = Path.GetFileNameWithoutExtension(Inputfile);
using (PdfDocument doc = new PdfDocument(Inputfile))
{
    for (int i = 0; i < doc.Pages.Count; i++)
    {
        var page = doc.Pages[i];
        using (var bitmap = new PDFiumBitmap((int)page.Width, (int)page.Height, false))
        {
            page.Render(bitmap);
            var targetFile = Path.Combine(OutputFolder, fileName + "_" + i + ".png");
            bitmap.Save(targetFile);
        }
    }
}

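When I run this code, I get the following exception:

(Screenshot of the exception)

Does anyone know how to fix this? Also, does PDFiumSharp support extracting PDF attachments? If not, does anyone have other ideas for how to achieve my goal?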

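PDFium doesn't appear to support extracting PDF attachments. To achieve your goal, you could look at another library that supports both extracting PDF attachments and converting PDFs to PNG.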

Disclosure: I work on the LEADTOOLS PDF SDK, which you can try through the following two NuGet packages:

https://www.nuget.org/packages/Leadtools.Pdf/

https://www.nuget.org/packages/Leadtools.Document.Sdk/

Here is some code that converts a PDF and all of the attachments inside it to separate PNG files in an output directory:

// Note: cache and OutputDir are not declared in this snippet; they are assumed to be fields defined elsewhere.
SetLicense();
cache = new FileCache { CacheDirectory = "cache" };
List<LEADDocument> documents = new List<LEADDocument>();
if (!Directory.Exists(OutputDir))
    Directory.CreateDirectory(OutputDir);
// Load the PDF itself, exposing embedded files as attachments
using var document = DocumentFactory.LoadFromFile("attachments.pdf", new LoadDocumentOptions { Cache = cache, LoadAttachmentsMode = DocumentLoadAttachmentsMode.AsAttachments });
if (document.Pages.Count > 0)
    documents.Add(document);
// Load each attachment as its own document
foreach (var attachment in document.Attachments)
    documents.Add(document.LoadDocumentAttachment(new LoadAttachmentOptions { AttachmentNumber = attachment.AttachmentNumber }));
ConvertDocuments(documents, RasterImageFormat.Png);

And the ConvertDocuments method:

static void ConvertDocuments(IEnumerable<LEADDocument> documents, RasterImageFormat imageFormat)
{
    using var converter = new DocumentConverter();
    using var ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD);
    ocrEngine.Startup(null, null, null, null);
    converter.SetOcrEngineInstance(ocrEngine, false);
    converter.SetDocumentWriterInstance(new DocumentWriter());
    foreach (var document in documents)
    {
        var name = string.IsNullOrEmpty(document.Name) ? "Attachment" : document.Name;
        string outputFile = Path.Combine(OutputDir, $"{name}.{RasterCodecs.GetExtension(imageFormat)}");
        int count = 1;
        while (File.Exists(outputFile))
            outputFile = Path.Combine(OutputDir, $"{name}({count++}).{RasterCodecs.GetExtension(imageFormat)}");
        var jobData = new DocumentConverterJobData
        {
            Document = document,
            Cache = cache,
            DocumentFormat = DocumentFormat.User,
            RasterImageFormat = imageFormat,
            RasterImageBitsPerPixel = 0,
            OutputDocumentFileName = outputFile,
        };
        var job = converter.Jobs.CreateJob(jobData);
        converter.Jobs.RunJob(job);
    }
}
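
The loop in ConvertDocuments that picks an output name which does not collide with an existing file can be pulled out into a small helper. The following is only a sketch: the class name OutputNames and the method GetUniquePath are hypothetical and not part of LEADTOOLS, and it uses nothing beyond System.IO.

```csharp
using System;
using System.IO;

static class OutputNames
{
    // Hypothetical helper (not a LEADTOOLS API): returns a path inside
    // `directory` that does not exist yet, trying "name.ext" first and
    // then "name(1).ext", "name(2).ext", and so on.
    public static string GetUniquePath(string directory, string baseName, string extension)
    {
        string candidate = Path.Combine(directory, $"{baseName}.{extension}");
        int count = 1;
        while (File.Exists(candidate))
            candidate = Path.Combine(directory, $"{baseName}({count++}).{extension}");
        return candidate;
    }
}
```

Inside ConvertDocuments, the naming lines could then become a single call such as OutputNames.GetUniquePath(OutputDir, name, RasterCodecs.GetExtension(imageFormat)). The check only looks at which files exist at the moment of the call, so it does not protect against two processes writing to the same directory at once.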
