Installation
Prerequisites
PDF-toolbox is a Java program developed with Java version 8. Only a Java runtime is needed to run it. Creating a searchable PDF file additionally requires the tesseract OCR engine, which processes the source file. Tesseract is optional if you do not use the OCR command.
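Before downloading, it can help to confirm that a Java runtime is available. The following is a minimal sketch that only uses the standard `java -version` call; the variable name `java_status` is made up for this illustration.

```shell
# Check whether a Java runtime is on the PATH and print its version line.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | head -n 1
    java_status="found"
else
    echo "No Java runtime found; install one first (the program was developed with Java 8)."
    java_status="missing"
fi
```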
Download & Install
First download the code from the VirtOrg website, http://www.virtorg.org. The main page links to the latest version of the program. Click the link to download a ZIP file, then unpack it. In the commands below, the ? stands for the patch number of the version you downloaded.
download and install:
wget http://www.virtorg.org/files/PDF-toolbox/vtgPDF-toolbox-0.1.?-bin.zip
mkdir PDF-toolbox
cd PDF-toolbox
unzip ../vtgPDF-toolbox-0.1.?-bin.zip
cd pdf-toolbox-0.1.?
java -jar target/vtgPDF-toolbox-0.1.?.jar --version
If the version is printed, the program is working. You may also see some logging messages:
log4j:WARN No appenders could be found for logger (com.virtorg.pdf.ocr.ServiceOCR).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
These messages can be ignored for now.
Edit the pdf-toolbox script file and change the parameters needed for tesseract so that they match your system, for example options="-Djna.library.path=/opt/local/lib -Dvtg.tessdata.path=/opt/local/share".
pdf-toolbox:
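#!/usr/bin/env bash
# Wrapper script: starts PDF-toolbox with the JVM options that tesseract needs.
jarfile=/target/vtgPDF-toolbox-0.1.4-SNAPSHOT.jar
# JNA library path and tessdata location; adjust these to your tesseract install.
options="-Djna.library.path=/opt/local/lib -Dvtg.tessdata.path=/opt/local/share"
if [[ $0 =~ ^/ ]] ; then
    # absolute path used
    program="$(dirname "$0")$jarfile"
else
    # relative path used
    program="$(pwd)/$(dirname "$0")$jarfile"
fi
# $options is left unquoted on purpose so it splits into separate JVM arguments.
# "$@" passes every argument through unchanged, including file names with spaces.
echo java $options -jar "$program" "$@"
java $options -jar "$program" "$@"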
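If OCR fails to start, a first thing to check is whether the two paths in the options line exist on your machine. The sketch below uses the paths from the script above. It assumes that the language files live in a `tessdata` directory below the configured tessdata path, which is the usual tesseract layout but is not stated in this document.

```shell
# Paths copied from the options line of the pdf-toolbox script.
lib_dir="/opt/local/lib"
tessdata_parent="/opt/local/share"

missing=0
# Assumption: language files sit in a "tessdata" directory below the configured path.
for dir in "$lib_dir" "$tessdata_parent/tessdata"; do
    if [ -d "$dir" ]; then
        echo "ok:      $dir"
    else
        echo "missing: $dir"
        missing=$((missing + 1))
    fi
done
echo "$missing of 2 directories missing"
```

If a directory is reported missing, change the matching value in the options line to the location used by your tesseract installation.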
Usage
usage:
usage: PDF-toolbox
list of all options and commands
-c,--createLogFile create a new log4j.properties
-D,--destfile <file> The destination PDF
-h,--help print this message
-L,--overlayfile <file> The overlay PDF
-o,--overlay Overlay the original PDF with a writingpaper
-O,--originalfile <file> The original PDF
-r,--replace replace the original file with the resultfile
-s,--ocr OCR the origanal picture or PDF to searchable PDF
-v,--version print program version
-V,--verbose be extra verbose
Have a lot of fun with this VirtOrg program.
usage: PDF-toolbox [[options]] command [[parameters]]
usage: PDF-toolbox --overlay --originalfile <file> --overlayfile <file> --destfile <file>
usage: PDF-toolbox --overlay -O <file> -L <file> -D <file>
usage: commands
list of all commands
-o,--overlay Overlay the original PDF with a writingpaper
-s,--ocr OCR the origanal picture or PDF to searchable PDF
usage: parameters
list of all parameters
-D,--destfile <file> The destination PDF
-L,--overlayfile <file> The overlay PDF
-O,--originalfile <file> The original PDF
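As an illustration of the overlay command from the usage text, the sketch below builds the command line and prints it as a dry run. The file names are hypothetical, and it assumes the pdf-toolbox script is started from the unpacked directory.

```shell
# Hypothetical file names, for illustration only.
original="letter.pdf"
overlay="writingpaper.pdf"
result="letter-on-paper.pdf"

# Build the overlay call from the long options listed in the usage text.
cmd="./pdf-toolbox --overlay --originalfile $original --overlayfile $overlay --destfile $result"

# Dry run: print the command instead of executing it.
echo "$cmd"
```

To run it for real, type the printed command at the prompt. The short form `-O`, `-L`, `-D` from the usage text works the same way.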
Make executable on Windows
TODO
Make executable on Mac OS X
Use the shell script for starting the program:
cd pdf-toolbox-0.1.3
chmod +x pdf-toolbox
./pdf-toolbox --version
If everything is working, add the directory that contains the pdf-toolbox script to the system PATH.
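One way to extend the PATH for the current shell session is sketched below. The install location in `toolbox_dir` is an assumption based on the commands in this guide; adjust it to wherever you unpacked the ZIP file.

```shell
# Assumed install location; adjust to wherever the ZIP file was unpacked.
toolbox_dir="$HOME/PDF-toolbox/pdf-toolbox-0.1.3"

# Append the directory to PATH only if it is not already listed.
case ":$PATH:" in
    *":$toolbox_dir:"*) ;;
    *) PATH="$PATH:$toolbox_dir" ;;
esac
export PATH
```

To make the change permanent, the same lines can go into your shell startup file, for example `~/.bash_profile`.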
Install Tesseract
Install on Mac (using MacPorts):
port search tesseract
port install tesseract
Install on CentOS:
yum install tesseract
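After installing, a quick check along the following lines shows whether tesseract is reachable and which language files it can find. This is a sketch: `--list-langs` is a standard tesseract option in version 3.02 and later, and the variable name `tesseract_status` is made up for this illustration.

```shell
# Check whether tesseract is on the PATH and list its installed languages.
if command -v tesseract >/dev/null 2>&1; then
    # --list-langs prints the language data files tesseract can find.
    tesseract --list-langs 2>&1 || true
    tesseract_status="found"
else
    echo "tesseract is not on the PATH"
    tesseract_status="missing"
fi
```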