Validate
Examines PDFs at a selectable level of detail and reports errors. Use it to check an archive quickly for invalid or damaged PDFs, or to validate a freshly downloaded batch of PDFs.
Options
java tool.pdf.Validate [options] PDF-file(s)
- validation level (choose one)
  - -fast -- Validates only the structure of a PDF; content such as images and content streams is not read. Because structure and content are intermixed in the file, this is usually sufficient to confirm a successful network transmission. Useful for checking thousands of PDFs quickly.
  - -full -- Reads the contents of every object in a PDF. (Default.)
  - -obj -- Tests semantic integrity of objects.
    - Links (annotations and actions) - checks that link destinations (both internal and external) exist, and that the source boxes of links do not overlap one another.
      - actions: GoTo, GoToR, Launch, URI
- -verbose -- Report names of valid files as well as invalid ones.
- -password password -- Password to use if the PDF is encrypted.
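To give a feel for what a structure-only check involves, here is a minimal Java sketch that tests just two structural markers: the %PDF- header at the start of the file and the %%EOF marker near the end. It is not the tool's implementation. The class name QuickPdfCheck, the method check, and the 1024-byte search window for %%EOF are all assumptions made for illustration, and a real -fast validation examines far more of the file structure than this.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of tool.pdf: checks two structural markers only.
public class QuickPdfCheck {

    /** Returns null if both markers are present, otherwise a short problem description. */
    static String check(String path) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            long len = f.length();

            // A PDF is expected to begin with "%PDF-".
            byte[] head = new byte[(int) Math.min(5, len)];
            f.readFully(head);
            if (!new String(head, StandardCharsets.ISO_8859_1).equals("%PDF-")) {
                return "missing %PDF- header";
            }

            // Assumption: look for "%%EOF" within the last 1024 bytes of the file.
            int tailLen = (int) Math.min(1024, len);
            byte[] tail = new byte[tailLen];
            f.seek(len - tailLen);
            f.readFully(tail);
            if (!new String(tail, StandardCharsets.ISO_8859_1).contains("%%EOF")) {
                return "can't find %%EOF";
            }
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            String problem = check(path);
            System.out.println(path + ": " + (problem == null ? "markers present" : problem));
        }
    }
}
```

Because only a few bytes at each end of the file are read, a check of this kind stays cheap even across a large archive, which is the trade-off the -fast level makes.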
Examples
1. fast check finds files mislabeled with .pdf suffix
java tool.pdf.Validate -fast .
produces
/Users/phelps/data/pdfdb/000137.pdf: java.io.IOException: No document catalog.
invalid password /Users/phelps/data/pdflr/secure2.pdf
File: /Users/phelps/data/pdfdb/000055.pdf ERROR: invalid but repairable (with tool.pdf.Repair)
File: /Users/phelps/data/pdfdb/000109.pdf ERROR: can't find '%%EOF' @ byte 3187
File: /Users/phelps/data/pdfdb/000217.pdf ERROR: can't find '%%EOF' @ byte 20705
2. full read of objects, which is the default level of validation
java tool.pdf.Validate jdj
produces
File: jdj/4-06.pdf ERROR. object #154: java.io.IOException: incorrect data check @ 10730 #154: {Length=4114, Filter=FlateDecode, DATA=573053}
File: jdj/5-06.pdf ERROR. object #25: java.io.IOException: invalid bit length repeat @ 0 #25: {Length=13735, Filter=FlateDecode, DATA=1346607}
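Both errors above come from streams using Filter=FlateDecode, which holds zlib-compressed data. The following Java sketch shows, under stated assumptions, how reading such a stream to the end can expose damage: it compresses a small sample with java.util.zip.Deflater, flips one bit in the trailing checksum, and tries to decompress both copies with java.util.zip.Inflater. The class FlateCheck and its methods are hypothetical names for illustration, and this is not how tool.pdf.Validate is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo, not part of tool.pdf: detects a damaged zlib stream by fully decoding it.
public class FlateCheck {

    /** Compresses data into a zlib stream, the format a FlateDecode stream contains. */
    static byte[] deflate(byte[] data) {
        Deflater d = new Deflater();
        d.setInput(data);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished()) {
            out.write(buf, 0, d.deflate(buf));
        }
        d.end();
        return out.toByteArray();
    }

    /** Returns null if the stream decodes to the end, otherwise a description of the failure. */
    static String tryInflate(byte[] compressed) {
        Inflater inf = new Inflater();
        inf.setInput(compressed);
        byte[] buf = new byte[4096];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && (inf.needsInput() || inf.needsDictionary())) {
                    return "truncated or unusable stream";
                }
            }
            return null;
        } catch (DataFormatException e) {
            // The message text comes from the underlying zlib library.
            return e.getMessage() != null ? e.getMessage() : "data format error";
        } finally {
            inf.end();
        }
    }

    public static void main(String[] args) {
        byte[] good = deflate("BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII));
        byte[] bad = good.clone();
        bad[bad.length - 1] ^= 0x01; // damage the checksum at the end of the stream

        System.out.println("intact:  " + tryInflate(good));
        System.out.println("damaged: " + tryInflate(bad));
    }
}
```

Damage of this kind is invisible to a structure-only pass, because the object boundaries are still intact. It only surfaces once the stream contents are actually decoded, which is what the default -full level does.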