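Reference: http://git.macropus.org/2011/11/pdftotext/example/
In this project, the developer takes a PDF as input and passes it to the variable "input". I want to create an upload menu/dropzone so that anyone can upload their PDF file, have it passed to the variable "input" automatically, and get the text extracted. I can upload the file, but I don't know how to pass the PDF to the variable "input".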
<body>
<form id="upload" method="post" action="upload.php" enctype="multipart/form-data">
<div id="drop">
Drop Here
<a>Browse</a>
<input id="inputx" src="./" type="file" name="upl" multiple />
</div>
<ul>
<!-- The file uploads will be shown here -->
</ul>
</form>
With this form the PDF gets uploaded; now we have to pass it to the variable "input".
<script>
var input = document.getElementById("input");
var processor = document.getElementById("processor");
var output = document.getElementById("output");
window.addEventListener("message", function(event){
if (event.source != processor.contentWindow) return;
switch (event.data){
case "ready":
var xhr = new XMLHttpRequest;
xhr.open('GET', input.getAttribute("src"), true);
xhr.responseType = "arraybuffer";
xhr.onload = function(event) {
processor.contentWindow.postMessage(this.response, "*");
};
xhr.send();
break;
default:
output.innerHTML = event.data.replace(/\s+/g, " ");
break;
}
}, true);
</script>
</body>
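You just need to point pdf.js at a copy of the uploaded file.
In the code above, pdf.js gets its data through an XMLHttpRequest, which looks for a .pdf whose file name is given by the src attribute of the element with the ID input: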
xhr.open('GET', input.getAttribute("src"), true);
If you set this element's src attribute to the file path of a PDF that has been uploaded to your server, the script should work as is.
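As a small illustration of that step, the sketch below builds the URL for an uploaded file and assigns it to the element's src. It assumes the PDF sits in the same directory as the page. `resolvePdfUrl` and `pointInputAt` are hypothetical helper names, not part of pdf.js or the original example.

```javascript
// Hypothetical helper: build an absolute URL for a PDF stored next to the page.
// encodeURIComponent keeps characters such as spaces, "#" or "?" in the file
// name from being parsed as URL syntax.
function resolvePdfUrl(fileName, pageUrl) {
  return new URL(encodeURIComponent(fileName), pageUrl).href;
}

// Hypothetical helper: point the element with id "input" at the uploaded PDF.
function pointInputAt(inputElement, fileName, pageUrl) {
  inputElement.setAttribute("src", resolvePdfUrl(fileName, pageUrl));
}

// Browser usage (assumes the upload has finished and the file name is known):
//   pointInputAt(document.getElementById("input"), "report.pdf", window.location.href);
```

The existing script reads the attribute back with input.getAttribute("src"), so nothing else has to change once the attribute holds a valid URL.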
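Here is some code that may help you. index.html is a simple file upload form that calls PHP to upload the file into the same directory that serves index.html. file_upload.php saves the uploaded file and sets the value of the src attribute on the iframe with the following line: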
<iframe id="input" src="<?php echo htmlspecialchars(basename($_FILES['userfile']['name']), ENT_QUOTES); ?>"></iframe>
index.html
<html>
<head>
<title>Converting PDF To Text using pdf.js</title>
</head>
<body>
<div>
<!-- the PDF file must be on the same domain as this page -->
<form enctype="multipart/form-data" action="file_upload.php" method="POST">
<input id="fileInput" type="file" name="userfile">
<input type="submit" value="Submit">
</form>
</div>
</body>
</html>
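The form above accepts any file, so a quick client-side check can save a pointless upload. The sketch below tests whether the first bytes of a file are the PDF signature "%PDF-". `looksLikePdf` is a hypothetical helper name, and this is only a convenience: the server still has to validate whatever it receives.

```javascript
// Hypothetical helper: true if the ArrayBuffer begins with the PDF signature "%PDF-".
function looksLikePdf(buffer) {
  var signature = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (buffer.byteLength < signature.length) return false;
  var bytes = new Uint8Array(buffer, 0, signature.length);
  return signature.every(function (byte, i) { return bytes[i] === byte; });
}

// Browser usage (assumes the file input with id "fileInput" from the form above):
//   document.getElementById("fileInput").addEventListener("change", function () {
//     var file = this.files[0];
//     if (!file) return;
//     file.slice(0, 5).arrayBuffer().then(function (head) {
//       if (!looksLikePdf(head)) alert("Please choose a PDF file.");
//     });
//   });
```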
file_upload.php
<?php
$uploadfile = basename($_FILES['userfile']['name']);
echo '<pre>';
if (move_uploaded_file($_FILES['userfile']['tmp_name'], $uploadfile)) {
echo "File is valid, and was successfully uploaded.\n";
} else {
echo "Possible file upload attack!\n";
}
echo 'Here is some more debugging info:';
print_r($_FILES);
print "</pre>";
?>
<html>
<head>
<title>Converting PDF To Text using pdf.js</title>
<style>
html, body { width: 100%; height: 100%; overflow-y: hidden; padding: 0; margin: 0; }
body { font: 13px Helvetica,sans-serif; }
body > div { width: 48%; height: 100%; overflow-y: auto; display: inline-block; vertical-align: top; }
iframe { border: none; width: 100%; height: 100%; }
#output { padding: 10px; box-shadow: 0 0 5px #777; border-radius: 5px; margin: 10px; }
#processor { height: 70px; }
</style>
</head>
<body>
<div>
<!-- embed the pdftotext web app as an iframe -->
<iframe id="processor" src="../"></iframe>
<!-- a container for the output -->
<div id="output"><div id="intro">Extracting text from a PDF file using only Javascript.<br>Tested in Chrome 16 and Firefox 9.</div></div>
</div>
<div>
<iframe id="input" src="<?php echo htmlspecialchars(basename($_FILES['userfile']['name']), ENT_QUOTES); ?>"></iframe>
</div>
<script>
var input = document.getElementById("input");
var processor = document.getElementById("processor");
var output = document.getElementById("output");
window.addEventListener("message", function(event){
if (event.source != processor.contentWindow) return;
switch (event.data){
case "ready":
var xhr = new XMLHttpRequest;
xhr.open('GET', input.getAttribute("src"), true);
xhr.responseType = "arraybuffer";
xhr.onload = function(event) {
processor.contentWindow.postMessage(this.response, "*");
};
xhr.send();
break;
default:
output.textContent = event.data.replace(/\s+/g, " ");
break;
}
}, true);
</script>
</body>
</html>
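If the text only needs to be extracted in the browser, the upload round trip can be skipped by handing the selected file's bytes straight to the processor iframe. This is a different technique from the answer above, which fetches the uploaded copy with XMLHttpRequest. A minimal sketch, assuming the same message protocol as the script above ("ready" from the processor, then the extracted text); `createMessageHandler` and `collapseWhitespace` are hypothetical names.

```javascript
// Collapse runs of whitespace into single spaces, as the script above does.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, " ");
}

// Hypothetical factory: returns a "message" listener that answers the
// processor's "ready" signal with PDF bytes and shows any other message as text.
// getPdfBuffer() should return an ArrayBuffer, or null if no file is chosen yet.
function createMessageHandler(processorWindow, outputElement, getPdfBuffer) {
  return function (event) {
    if (event.source !== processorWindow) return; // ignore other senders
    if (event.data === "ready") {
      var buffer = getPdfBuffer();
      if (buffer) processorWindow.postMessage(buffer, "*");
    } else {
      outputElement.textContent = collapseWhitespace(String(event.data));
    }
  };
}

// Browser wiring (assumes elements with ids "processor", "output" and "fileInput",
// and that the processor accepts bytes posted at any time after it is ready):
//   var pdfBuffer = null;
//   var processor = document.getElementById("processor");
//   window.addEventListener("message", createMessageHandler(
//     processor.contentWindow,
//     document.getElementById("output"),
//     function () { return pdfBuffer; }
//   ));
//   document.getElementById("fileInput").addEventListener("change", function () {
//     var file = this.files[0];
//     if (!file) return;
//     file.arrayBuffer().then(function (buf) {
//       pdfBuffer = buf;
//       processor.contentWindow.postMessage(buf, "*");
//     });
//   });
```

Keeping the handler free of direct DOM lookups makes it possible to exercise it with plain objects, and the file never has to leave the user's machine.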