拆分PDF基于文本使用vba acrobat api



我正在尝试拆分pdf,基于它找到的页面"。pdf"然而,当我尝试用动态字符串变量保存pdf时,它不保存pdf,但当我编写硬编码文件路径时,它输出pdf。我不知道这是怎么回事。

下面的代码还没有完成,我被困在用删除的页面创建新的pdf:

Function Extract_PDF()

Dim aApp As Acrobat.CAcroApp
Dim av_Doc As Acrobat.CAcroAVDoc
Dim pdf_Doc As Acrobat.CAcroPDDoc '
Dim newPDFdoc As Acrobat.CAcroPDDoc

Dim Sel_Text As Acrobat.CAcroPDTextSelect
Dim i As Long, j As Long
Dim pageNum, Content
Dim pageContent As Acrobat.CAcroHiliteList
Dim found As Boolean
Dim foundPage As Integer
Dim PDF_Path As String
Dim pdfName As String
Dim folerPath As String

Dim FileExplorer As FileDialog
Set FileExplorer = Application.FileDialog(msoFileDialogFilePicker)

With FileExplorer
.AllowMultiSelect = False
.InitialFileName = ActiveDocument.Path
.Filters.Clear
.Filters.Add "PDF File", "*.pdf"

If .Show = -1 Then
PDF_Path = .SelectedItems.Item(1)

Else
PagesLB = "Catch me Next Time ;)"
PDF_Path = ""
Exit Function
End If
End With

Set aApp = CreateObject("AcroExch.App")
Set av_Doc = CreateObject("AcroExch.AVDoc")

If av_Doc.Open(PDF_Path, vbNull) <> True Then Exit Function


While av_Doc Is Nothing
Set av_Doc = aApp.GetActiveDoc
Wend

av_Doc.BringToFront
aApp.Show

Set pdf_Doc = av_Doc.GetPDDoc

For i = pdf_Doc.GetNumPages - 1 To 0 Step -1

Set pageNum = pdf_Doc.AcquirePage(i)
Set pageContent = CreateObject("AcroExch.HiliteList")

If pageContent.Add(0, 9000) <> True Then Exit Function
Set Sel_Text = pageNum.CreatePageHilite(pageContent)

Content = ""
found = False

For j = 0 To Sel_Text.GetNumText - 1
Content = Content & Sel_Text.GetText(j)
If InStr(1, Content, ".pdf") > 0 Then
found = True
foundPage = i
pdfName = Content
Exit For
End If
Next j
If found Then

PDF_Path = Left(PDF_Path, InStrRev(PDF_Path, "")) & ValidWBName(pdfName)

Set newPDFdoc = CreateObject("AcroExch.PDDoc")
Set newPDFdoc = av_Doc.GetPDDoc

If newPDFdoc.DeletePages(0, i - 1) = False Then
Debug.Print "Failed"
Else
Debug.Print "done"
End If

If newPDFdoc.Save(PDSaveFull, PDF_Path) = False Then

Debug.Print "Failed to save pdf "
Else
Debug.Print "Saved"
End If

newPDFdoc.Close

End If

Next i

av_Doc.Close False
aApp.Exit

Set av_Doc = Nothing
Set pdf_Doc = Nothing
Set aApp = Nothing

End Function

ValidWBName:

Function ValidWBName(agr As String) As String
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Pattern = "[\/:*?""<>|]"
.Global = True
ValidWBName = .Replace(agr, "")
End With
End Function

在上面的函数中,当它找到单词PDF时,它尝试创建一个新的PDF实例并删除以前的页面。

If found Then

PDF_Path = Left(PDF_Path, InStrRev(PDF_Path, "")) & ValidWBName(pdfName)

Set newPDFdoc = CreateObject("AcroExch.PDDoc")
Set newPDFdoc = av_Doc.GetPDDoc

If newPDFdoc.DeletePages(0, i - 1) = False Then
Debug.Print "Failed"
Else
Debug.Print "done"
End If

If newPDFdoc.Save(PDSaveFull, PDF_Path) = False Then

Debug.Print "Failed to save pdf "
Else
Debug.Print "Saved"
End If

newPDFdoc.Close

End If

这一行"保存pdf"失败

If newPDFdoc.Save(PDSaveFull, PDF_Path) = False Then

但是当我写入硬编码路径时它会创建pdf

If newPDFdoc.Save(PDSaveFull, "C:UsersMBADesktopPDF Project 2Murdoch_Michael__Hilary_PIA_19.pdf") = False Then

罪魁祸首必须在ValidWBName()函数中,该函数没有处理有效文件名

中所有可能不允许的字符。因为vbCrchar是其中之一,你可以这样改变它:

Function ValidWBName(agr As String) As String
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Pattern = "[\/:*?""<>|" & Chr(13) & "]"  ' <-- added vbCr 
.Global = True
ValidWBName = .Replace(agr, "")
End With
End Function

最新更新