Wrapping FileOptimizer
FileOptimizer is a GNU LGPLv3-licensed tool for Windows that reduces file sizes for dozens of file types, PDF among them. It is actually a frontend to a plethora of other, highly specialized compression tools, which it uses as plugins.
If you want to squeeze your PDFs as much as possible, consider trying FileOptimizer.
For PDFs, FileOptimizer uses plugins for Ghostscript and smpdf. Compression results can be quite impressive: I often get size reductions of 30% to 50%, and I have seen 90%, too.
Here is the issue: smpdf is free software for personal use only. After running FileOptimizer on a PDF, you will find that both metadata fields, /Producer and /Creator, have been overwritten with the text "Coherent Lossless PDF Compressor. Not for commercial use. http://www.coherentpdf.com".
Annoying.
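To see which fields were affected, the metadata dictionaries from before and after optimization can be compared. This is only a minimal sketch: it assumes dictionaries shaped like the ones PyMuPDF's `doc.metadata` returns (with keys such as `producer` and `creator`), the helper name `changed_fields` is hypothetical, and the sample values are illustrative rather than taken from a real file.

```python
def changed_fields(before, after):
    """Return the sorted keys whose values differ between two metadata dicts."""
    keys = set(before) | set(after)
    return sorted(k for k in keys if before.get(k) != after.get(k))


# Illustrative values only (assumption), mimicking what doc.metadata might hold.
marker = ("Coherent Lossless PDF Compressor. Not for commercial use. "
          "http://www.coherentpdf.com")
before = {"title": "Report", "producer": "Some Producer", "creator": "Some Creator"}
after = {"title": "Report", "producer": marker, "creator": marker}

print(changed_fields(before, after))  # ['creator', 'producer']
```

With real files, `before` and `after` would be the `doc.metadata` values read before and after running the optimizer.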
The following script (a wrapper-wrapper) restores a PDF's original metadata after optimization.
```python
import os
import subprocess
import sys
import tempfile
import time

import fitz  # PyMuPDF

'''
Optimizes a PDF with FileOptimizer. But as "/Producer" and "/Creator" get
spoiled by this, we first save the metadata and restore it afterwards.
We accept the cost of non-compressed object definitions (as created by
FileOptimizer).
'''

assert len(sys.argv) == 2, "need filename parameter"
fn = sys.argv[1]
assert fn.lower().endswith(".pdf"), "must be a PDF file"

fullname = os.path.abspath(fn)                    # get the full path & name
t0 = time.perf_counter()                          # save current time
doc = fitz.open(fullname)                         # open PDF to save metadata
meta = doc.metadata
doc.close()

t1 = time.perf_counter()                          # save current time again
subprocess.call(["fileoptimizer64", fullname])    # now invoke optimizer
t2 = time.perf_counter()                          # save current time again

cdir = os.path.split(fullname)[0]                 # split dir from filename
fnout = tempfile.mkstemp(suffix=".pdf", dir=cdir) # create temp pdf name
doc = fitz.open(fullname)                         # open optimized PDF
doc.set_metadata(meta)                            # restore old metadata
doc.save(fnout[1], garbage=4)                     # save temp PDF with it
doc.close()                                       # close it

os.remove(fn)                                     # remove super optimized file
os.close(fnout[0])                                # close temp file
os.rename(fnout[1], fn)                           # and rename it to original filename
t3 = time.perf_counter()                          # save current time again

# put out runtime statistics
print("Timings:")
print(str(round(t1 - t0, 4)).rjust(10), "save old metadata")
print(str(round(t2 - t1, 4)).rjust(10), "execute FileOptimizer")
print(str(round(t3 - t2, 4)).rjust(10), "restore old metadata")
```

- Beware, however, that this treatment does not lift the restriction to non-commercial use.
- FileOptimizer has been reported to run under Wine on platforms other than Windows.
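
To put a number on the compression results mentioned earlier, the file size can be measured before and after optimization. The following is a minimal sketch, not part of the original script: the names `reduction_percent` and `measure` are hypothetical, and `optimize` stands for any callable that rewrites the file in place (for example, a function wrapping the FileOptimizer call).

```python
import os


def reduction_percent(size_before, size_after):
    """Return the size reduction as a percentage of the original size."""
    if size_before <= 0:
        raise ValueError("size_before must be positive")
    return 100.0 * (size_before - size_after) / size_before


def measure(path, optimize):
    """Run optimize(path) and return the achieved reduction in percent."""
    size_before = os.path.getsize(path)
    optimize(path)  # assumed to rewrite the file at 'path' in place
    return reduction_percent(size_before, os.path.getsize(path))


# A 1,000,000 byte file shrunk to 600,000 bytes:
print(round(reduction_percent(1000000, 600000), 1))  # 40.0
```

A negative result would mean the file grew, which can happen when metadata is written back without compressing object definitions.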