I'm trying to define an up-to-date method for converting any PDF into a PDF/A-1b that passes 3-Heights validation. I came up with this script, which uses Ghostscript and qpdf:

    #!/bin/bash
    # transforms input PDF into an optimized PDF/A-1b
    # usage: $0 input.pdf output.pdf
    gs -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dSAFER \
       -sColorConversionStrategy=UseDeviceIndependentColor \
       -dEmbedAllFonts=true -dPrinted=true -dPDFA \
       -sProcessColorModel=DeviceRGB -dPDFACompatibilityPolicy=1 \
       -dDetectDuplicateImages -r150 \
       -sOutputFile="$2" "$1"
    qpdf --linearize "$2" "$2.optimized"
    mv "$2.optimized" "$2"

This transforms any PDF into a web-optimized PDF/A-1b.
Everything works, except that Ghostscript does not seem to add the missing EOLs before 'endstream' keywords, so the processed document fails validation. This is the validation result I get:

    Validating file "document.pdf" for conformance level pdfa-1b
    The separator before 'endstream' must be an EOL. (5)
    The document does not conform to the requested standard.
    The file format (header, trailer, objects, xref, streams) is corrupted.
    Done.

Do you know of any way or tool to add these EOL separators?
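To narrow down which step of the pipeline introduces the problem, it can help to count the offending keywords after each step. The following is only a rough heuristic sketch, not a validator: the function name `count_missing_eol` is made up for illustration, and the check is a plain byte search, so it may also hit the byte sequence "endstream" if it happens to occur inside compressed stream data.

```shell
#!/bin/sh
# Heuristic: count 'endstream' keywords that are NOT directly preceded
# by an EOL byte (CR or LF). count_missing_eol is a hypothetical helper name.
count_missing_eol() {
    cr=$(printf '\r')
    # LC_ALL=C and -a make grep treat the PDF as raw bytes.
    # An 'endstream' preceded by LF starts a new grep line, so it can
    # never match the pattern; a preceding CR is excluded explicitly.
    LC_ALL=C grep -a -o "[^${cr}]endstream" "$1" | wc -l | tr -d ' '
}
```

Running `count_missing_eol output.pdf` once after the gs step and once after the qpdf step should show whether the count changes between the two.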
Up-to-date suggestions for alternative ways to convert PDF to PDF/A-1b are welcome too.
qpdf; see also github.com/qpdf/qpdf/issues/38 from 2014. Not sure if this ever got fixed.

qpdf! The issue is still not fixed then, as I can reproduce it with the script above. I took another look at Ghostscript and found that it is already capable of linearizing PDFs. I'll self-answer to show how.
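A minimal sketch of that Ghostscript-only idea, assuming the linearization is done with the pdfwrite option `-dFastWebView=true`. That is an assumption about which option is meant, and its availability may depend on the Ghostscript version. The function name `pdf_to_pdfa` and the `GS_BIN` override are hypothetical, added only so the command can be dry-run. Whether the result passes 3-Heights validation is not confirmed here.

```shell
#!/bin/sh
# Sketch: convert to PDF/A-1b and linearize in a single Ghostscript pass,
# without the qpdf step. Same options as the script in the question,
# plus -dFastWebView=true (assumed linearization switch).
# GS_BIN is a hypothetical override for the Ghostscript binary (default: gs).
pdf_to_pdfa() {
    "${GS_BIN:-gs}" -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dSAFER \
        -sColorConversionStrategy=UseDeviceIndependentColor \
        -dEmbedAllFonts=true -dPrinted=true -dPDFA \
        -sProcessColorModel=DeviceRGB -dPDFACompatibilityPolicy=1 \
        -dDetectDuplicateImages -dFastWebView=true -r150 \
        -sOutputFile="$2" "$1"
}
```

Calling `GS_BIN=echo pdf_to_pdfa input.pdf output.pdf` prints the argument list instead of running Ghostscript, which is a cheap way to inspect the command first.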