
Normally, all images are written as PBM (for monochrome images), PGM (for grayscale images), or PPM (for color images) files. With this option, images in DCT format are saved as JPEG files. All non-DCT images are saved in PBM/PGM/PPM format as usual. (Inline images are always saved in PBM/PGM/PPM format.)

man pdfimages from Xpdf

The default output format is PBM (for monochrome images) or PPM for non-monochrome. The -png or -tiff options change to default output to PNG or TIFF respectively. If both -png and -tiff are specified, CMYK images will be written as TIFF and all other images will be written as PNG. In addition the -j, -jp2, and -jbig2 options will cause JPEG, JPEG2000, and JBIG2, respectively, images in the PDF file to be written in their native format.

man pdfimages from poppler

Why do both implementations of pdfimages extract images in the "mysterious" PBM/PGM/PPM formats (I call them "mysterious" because I had never heard of them before; collectively they are known as Netpbm or PNM, https://en.wikipedia.org/wiki/Netpbm) instead of PNG, JPEG or maybe GIF, which are (I might be wrong, of course) the de facto standards in the casual-user world these days (and, if I recall correctly, were the same de facto standards 10 and 20 years ago as well)?

2 Answers


The netpbm tools are decades old (from 1988). The formats are not typically a final target format (they're not efficient in size), so you normally won't see these files used in place of GIF/JPG/PNG.

The idea, instead, is to have a neutral lossless format that can be used as an intermediary in file type conversions.

So instead of writing a PNG->JPEG and a JPEG->PNG converter, you would write a converter for PNG to/from the neutral format and one for JPEG to/from the neutral format. So far this sounds bad: 4 programs instead of 2.

But now we add GIF; all we need is GIF to/from the format and we now automatically get GIF<->PNG and GIF<->JPEG: 4 conversions for the cost of 2 programs. Then we add BMP to/from the format and we get BMP<->GIF, BMP<->PNG, BMP<->JPEG: 6 conversions for the cost of 2 programs. Let's add in PDF and we get 8 conversions for 2 programs.

We can see that the more formats we can convert to/from the neutral format, the more conversions we get as a result. Each pair of programs only needs to know its own special format (e.g. JPG) and the neutral intermediate one, which is a lot easier for authors to deal with!
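To make the arithmetic above concrete, here is a minimal sketch that counts one-way converter programs under both designs. The function name `converter_counts` is made up for this illustration; it assumes every format should be convertible to every other format.

```python
def converter_counts(n_formats):
    """Return (direct, via_hub): one-way converter programs needed for n formats."""
    # Direct design: one program per ordered pair of distinct formats.
    direct = n_formats * (n_formats - 1)
    # Neutral-format design: one reader and one writer per format.
    via_hub = 2 * n_formats
    return direct, via_hub


for n in (2, 3, 4, 5, 10):
    direct, via_hub = converter_counts(n)
    print(f"{n:2d} formats: {direct:3d} direct converters vs {via_hub:2d} via a neutral format")
```

With 2 formats the neutral design costs more (4 programs instead of 2), the two designs break even at 3 formats, and from 4 formats onward the neutral design wins: each new format costs 2 programs instead of 2 per already-supported format.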


The GIF format actually predates netpbm, and JPEG dates from the same era, but both are very complicated formats that require specialized libraries to successfully read and write them. At the time netpbm was created, there was not even a formal specification for the GIF format outside of existing code, so GIFs could not be read without a GIF library or a lot of trial and error to find special cases.

The PBM/PGM/PPM formats are basically raw data with a simple header, so it is trivial to write code to read and write them without using any complicated libraries. You could even create images with shell scripts that generate the raw data.
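As a rough illustration of how little code this takes, here is a sketch that writes and reads the plain-text PPM variant (magic number `P3`) using nothing but string handling. The helper names `encode_ppm` and `decode_ppm` are invented for this example, and the reader deliberately ignores `#` comment lines, which a real PPM file may contain.

```python
def encode_ppm(width, height, pixels):
    """Encode row-major (r, g, b) tuples in 0..255 as a plain-text (P3) PPM string."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    lines = ["P3", f"{width} {height}", "255"]  # magic number, size, max sample value
    lines += [f"{r} {g} {b}" for r, g, b in pixels]
    return "\n".join(lines) + "\n"


def decode_ppm(text):
    """Decode a P3 PPM string (assumes no '#' comments) into (width, height, maxval, pixels)."""
    tokens = text.split()
    if tokens[0] != "P3":
        raise ValueError("not a plain-text PPM")
    width, height, maxval = (int(t) for t in tokens[1:4])
    values = [int(t) for t in tokens[4:]]
    pixels = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return width, height, maxval, pixels


# A 2x2 image: red, green, blue, white.
image = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
text = encode_ppm(2, 2, image)
print(text, end="")
```

Saving `text` to a file with a `.ppm` extension should give something most Netpbm-aware viewers can open. The binary variants (`P4`/`P5`/`P6`) use the same style of header followed by raw sample bytes instead of decimal text.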

In addition to the advantages and flexibility of using a neutral format (as described well in the other answer), these formats were created so that simple code could read and write images without needing to bother with complicated format-specific libraries, and do so in the most portable way possible.
