Installation

Prerequisites

PDF-toolbox is a Java program developed with Java version 8. Only a Java runtime (JRE) is needed to run it. Creating a searchable PDF file is optional; for that, the source is processed with the tesseract OCR engine, which then has to be installed as well (see Install Tesseract).
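Before installing, it can help to check which Java runtime is on the PATH. The bash sketch below is only an illustration; the helper name `java_major` is hypothetical. It parses the first line printed by `java -version`, which looks like `java version "1.8.0_191"` for Java 8 and like `openjdk version "11.0.2"` for later releases. Since the program was developed with Java 8, major version 8 is the safe choice.

```shell
#!/usr/bin/env bash
# Print the major Java version found in the first line of `java -version`,
# e.g. 'java version "1.8.0_191"' -> 8, 'openjdk version "11.0.2"' -> 11.
java_major() {
  local v
  # take the text between the quotes that follow the word "version"
  v=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case $v in
    1.*) v=${v#1.}; echo "${v%%.*}" ;;  # old numbering: 1.8.0_191 -> 8
    *)   echo "${v%%[.-]*}" ;;          # new numbering: 11.0.2 -> 11
  esac
}

if command -v java >/dev/null 2>&1; then
  # java -version writes its output to stderr
  first=$(java -version 2>&1 | head -n 1)
  echo "Java found: $first (major version $(java_major "$first"))"
else
  echo "No java command on the PATH: install a Java 8 runtime first." >&2
fi
```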

Download & Install

First download the code from the VirtOrg website http://www.virtorg.org. The main page links to the latest version of the program; clicking the link downloads a ZIP file. After downloading, unpack the ZIP file.

download and install (replace 0.1.? with the version number of the downloaded release):

wget http://www.virtorg.org/files/PDF-toolbox/vtgPDF-toolbox-0.1.?-bin.zip
mkdir PDF-toolbox
cd PDF-toolbox
unzip ../vtgPDF-toolbox-0.1.?-bin.zip
cd pdf-toolbox-0.1.?
java -jar target/vtgPDF-toolbox-0.1.?.jar --version

If the version is printed, the program is working. You may also see some logging messages:

log4j:WARN No appenders could be found for logger (com.virtorg.pdf.ocr.ServiceOCR).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

These messages can be ignored for now.
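The warnings come from log4j 1.2 finding no configuration. The Usage section lists a `-c,--createLogFile` option that creates a new log4j.properties, which may be the intended route. As a hedged alternative, the sketch below writes a minimal log4j 1.2 configuration by hand; the file location in the current directory is an assumption.

```shell
#!/usr/bin/env bash
# Write a minimal log4j 1.2 configuration: the root logger sends
# WARN and higher to the console. The file location is an assumption.
conf="$PWD/log4j.properties"
cat > "$conf" <<'EOF'
log4j.rootLogger=WARN, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%-5p %c - %m%n
EOF
echo "written: $conf"
```

log4j 1.2 reads the file named by the log4j.configuration system property, so the JVM option -Dlog4j.configuration=file:$PWD/log4j.properties can be passed to java, for example by adding it to the options line of the pdf-toolbox script.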

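Edit the pdf-toolbox script file and set the parameters that tesseract needs: the directory of the tesseract library (jna.library.path) and the location of the tesseract data (vtg.tessdata.path). The script contains the values for tesseract installed with MacPorts under /opt/local: options="-Djna.library.path=/opt/local/lib -Dvtg.tessdata.path=/opt/local/share"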

pdf-toolbox:

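#!/usr/bin/env bash
# jar location, relative to the directory that contains this script
jarfile=/target/vtgPDF-toolbox-0.1.4-SNAPSHOT.jar
# JVM options for tesseract: native library search path and tessdata location
options="-Djna.library.path=/opt/local/lib -Dvtg.tessdata.path=/opt/local/share"
if [[ $0 =~ ^/ ]] ; then
   # absolute path used
   program=$(dirname "$0")$jarfile
else
   # relative path used
   program=$(pwd)/$(dirname "$0")$jarfile
fi
# show the command, then run it; "$@" keeps arguments with spaces intact
echo java $options -jar "$program" "$@"
java $options -jar "$program" "$@"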
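To change the options line of the script without opening an editor, a sed-based sketch could look like this. The function name `set_tesseract_options` is hypothetical, and the directories passed to it must be the ones of your own tesseract installation.

```shell
#!/usr/bin/env bash
# Replace the options= line in a pdf-toolbox start script.
# Arguments: <script file> <tesseract library dir> <tessdata path>
# The paths must not contain '|' (sed delimiter) or '&' (sed match token).
set_tesseract_options() {
  local script=$1 lib_dir=$2 data_dir=$3
  # -i.bak edits in place and keeps a backup copy; this form is accepted
  # by both GNU sed (Linux) and BSD sed (Mac OS X)
  sed -i.bak \
    "s|^options=.*|options=\"-Djna.library.path=$lib_dir -Dvtg.tessdata.path=$data_dir\"|" \
    "$script"
}
```

For example, set_tesseract_options pdf-toolbox /opt/local/lib /opt/local/share writes the MacPorts values back and leaves the previous version in pdf-toolbox.bak.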

Usage

usage:

usage: PDF-toolbox
list of all options and commands
 -c,--createLogFile         create a new log4j.properties
 -D,--destfile <file>       The destination PDF
 -h,--help                  print this message
 -L,--overlayfile <file>    The overlay PDF
 -o,--overlay               Overlay the original PDF with a writingpaper
 -O,--originalfile <file>   The original PDF
 -r,--replace               replace the original file with the resultfile
 -s,--ocr                   OCR the origanal picture or PDF to searchable PDF
 -v,--version               print program version
 -V,--verbose               be extra verbose
Have a lot of fun with this VirtOrg program.

usage: PDF-toolbox [[options]] command [[parameters]]
usage: PDF-toolbox --overlay --originalfile <file> --overlayfile <file> --destfile <file>
usage: PDF-toolbox --overlay -O <file> -L <file> -D <file>
usage: commands
list of all commands
 -o,--overlay   Overlay the original PDF with a writingpaper
 -s,--ocr       OCR the origanal picture or PDF to searchable PDF
usage: parameters
list of all parameters
 -D,--destfile <file>       The destination PDF
 -L,--overlayfile <file>    The overlay PDF
 -O,--originalfile <file>   The original PDF
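The overlay command can be scripted for several files. The bash sketch below calls the documented `--overlay` command with its `-O`, `-L` and `-D` parameters for every PDF in a directory. The function name `overlay_all`, the `PDF_TOOLBOX` variable and the `-overlay` suffix of the result files are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Overlay every PDF in a directory with the same writing paper.
# Assumptions: the start script location (override with PDF_TOOLBOX)
# and the "-overlay" suffix used for the result files.
toolbox=${PDF_TOOLBOX:-./pdf-toolbox}

overlay_all() {
  local paper=$1 dir=$2 src
  for src in "$dir"/*.pdf; do
    [ -e "$src" ] || continue                      # directory has no PDF files
    case $src in *-overlay.pdf) continue ;; esac   # skip results of a previous run
    [ "$src" -ef "$paper" ] && continue            # skip the writing paper itself
    "$toolbox" --overlay -O "$src" -L "$paper" -D "${src%.pdf}-overlay.pdf"
  done
}
```

For example, overlay_all paper.pdf scans writes scans/letter-overlay.pdf next to scans/letter.pdf.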

Make executable on Windows

TODO

Make executable on Mac OS X

Use the shell script to start the program:

cd pdf-toolbox-0.1.3
chmod +x pdf-toolbox
./pdf-toolbox --version

If everything works, add the command to the system PATH.
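One way to add the command to the PATH is to append an export line to the shell profile; a sketch follows. The function name `add_to_path` and the install directory in the example call are hypothetical. ~/.bash_profile applies to bash; since Mac OS X 10.15 the default shell is zsh, which reads ~/.zprofile instead.

```shell
#!/usr/bin/env bash
# Append "export PATH=...:DIR" to a shell profile, unless that exact
# line is already present (so running it twice does no harm).
add_to_path() {
  local dir=$1 profile=$2
  local line="export PATH=\"\$PATH:$dir\""
  # -q quiet, -x whole line, -F fixed string; a missing profile counts as "not found"
  if ! grep -qxF "$line" "$profile" 2>/dev/null; then
    printf '%s\n' "$line" >> "$profile"
  fi
}

# Example call (hypothetical directory: where the ZIP file was unpacked):
# add_to_path "$HOME/PDF-toolbox/pdf-toolbox-0.1.3" "$HOME/.bash_profile"
```

Adding the directory is preferable to a symbolic link in /usr/local/bin: the pdf-toolbox script finds the jar relative to its own path ($0), so through a link it would look for /usr/local/bin/target/... and fail.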

Install Tesseract

Install on Mac (MacPorts):

port search tesseract
sudo port install tesseract

Install on CentOS:

sudo yum install tesseract
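After installing tesseract, the values for the options line of the pdf-toolbox script still have to be found. The sketch below searches for the tesseract library and the tessdata directory. The search roots (/opt/local for MacPorts, /usr for CentOS packages) and the function names are assumptions, and so is the reading of vtg.tessdata.path as the directory that contains tessdata, which is what the MacPorts value /opt/local/share suggests.

```shell
#!/usr/bin/env bash
# Print the directories that contain a libtesseract library file.
find_tesseract_lib_dirs() {
  find "$@" -name 'libtesseract*' -exec dirname {} \; 2>/dev/null | sort -u
}

# Print the parent directories of tessdata folders
# (assumed to be the value for -Dvtg.tessdata.path).
find_tessdata_parents() {
  find "$@" -type d -name tessdata -exec dirname {} \; 2>/dev/null | sort -u
}

if command -v tesseract >/dev/null 2>&1; then
  # older tesseract versions print the version to stderr
  tesseract --version 2>&1 | head -n 1
  echo "library dirs (jna.library.path):"
  find_tesseract_lib_dirs /opt/local /usr
  echo "tessdata parents (vtg.tessdata.path):"
  find_tessdata_parents /opt/local /usr
else
  echo "tesseract not found on the PATH"
fi
```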