Hopefully this is simple.

I am using PDFBox to extract images from a PDF and want to write them to a folder. No output files appear, even though the folder has read and write permissions.

I suspect I am not writing to the output stream properly.

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;

public final class JavaImgExtactor {

    public static void main(String[] args) throws IOException {
        Stuff();
    }

    @SuppressWarnings("resource")
    public static void Stuff() throws IOException {
        File inFile = new File("/Users/sebastianzeki/Documents/Images Captured with Proc Data Audit.pdf");
        PDDocument document = new PDDocument();
        //document=null;
        try {
            document = PDDocument.load(inFile);
        } catch (Exception e1) {
            // TODO Auto-generated catch block
            e1.printStackTrace();
        }
        List pages = document.getDocumentCatalog().getAllPages();
        Iterator iter = pages.iterator();
        while (iter.hasNext()) {
            PDPage page = (PDPage) iter.next();
            System.out.println("page" + page);
            PDResources resources = page.getResources();
            Map pageImages = resources.getImages();
            if (pageImages != null) {
                Iterator imageIter = pageImages.keySet().iterator();
                System.out.println("Success" + imageIter);
                while (imageIter.hasNext()) {
                    String key = (String) imageIter.next();
                    PDXObjectImage image = (PDXObjectImage) pageImages.get(key);
                    FileOutputStream out = new FileOutputStream("/Users/sebastianzeki/Documents/ImgPDF.jpg");
                    try {
                        image.write2OutputStream(out);
                    } catch (Exception e) {
                        // TODO Auto-generated catch block
                        e.printStackTrace();
                    }
                }
            }
        }
    }
}

2 Answers

You are not closing the output stream, and the file name is always the same.

try (FileOutputStream out = new FileOutputStream("/Users/sebastianzeki/Documents/ImgPDF" + key + ".jpg")) {
    image.write2OutputStream(out);
} catch (Exception e) {
    e.printStackTrace();
}

The try-with-resources statement closes the stream out automatically when the block exits. I am not sure whether key is usable as part of a file name.
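As a rough illustration of both points (closing the stream and using a distinct file name per image), here is a small sketch that uses only the JDK. The byte arrays stand in for the PDFBox image objects, and the class UniqueImageFiles, the helper safeName, and the "ImgPDF-index-key" naming scheme are hypothetical choices made for this example, not part of PDFBox or the original code. The running index is there on the assumption that the same key may occur on more than one page.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

final class UniqueImageFiles {

    // Hypothetical helper: replace every character outside [A-Za-z0-9._-]
    // with '_' so the key can be used safely inside a file name.
    static String safeName(String key) {
        return key.replaceAll("[^A-Za-z0-9._-]", "_");
    }

    // Writes each entry to its own file and returns the number of files written.
    // The byte arrays are placeholders for real image data.
    static int writeAll(Map<String, byte[]> images, Path dir) throws IOException {
        int index = 0;
        for (Map.Entry<String, byte[]> entry : images.entrySet()) {
            String name = "ImgPDF-" + index + "-" + safeName(entry.getKey()) + ".jpg";
            Path target = dir.resolve(name);
            // try-with-resources closes the stream even if write() throws.
            try (OutputStream out = new FileOutputStream(target.toFile())) {
                out.write(entry.getValue());
            }
            index++;
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> images = new LinkedHashMap<>();
        images.put("Im0", new byte[] {1, 2, 3});
        images.put("Im/1", new byte[] {4, 5});
        Path dir = Files.createTempDirectory("img-sketch");
        int written = writeAll(images, dir);
        System.out.println(written + " files written to " + dir);
    }
}
```

In the real loop, the out.write(...) call would be replaced by image.write2OutputStream(out) from the question.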

3 Comments

I think it's more fundamental than that, as I don't get any System.out.println output after while (imageIter.hasNext()). Am I not adding the images to a collection?
@SebastianZeki Maybe the images are not at that level, or they are inline images; you should share the PDF. Better: use the current version, 2.0.6, and use the ExtractImages.java source code from the source code download.
I recommend using ExtractImages from 2.0.6 rather than the one from 1.8.*, because the newer one catches more images.

image.write2OutputStream(out); writes the bytes from the image object to the FileOutputStream out, but it doesn't flush the buffer of out.

Adding this should do the job:

out.flush(); 
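To show what flush() does in general, here is a minimal sketch using only java.io. It wraps an in-memory ByteArrayOutputStream in a BufferedOutputStream; both are stand-ins chosen for this illustration and do not appear in the question, and the class name FlushSketch is hypothetical. A plain FileOutputStream keeps no buffer of its own in Java, so the effect is most visible when a buffering wrapper like this one is involved.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

final class FlushSketch {

    // Returns {bytes visible in the target before flush, bytes visible after flush}.
    // Assumes data is smaller than the default 8192-byte buffer, so it stays buffered.
    static int[] sizesAroundFlush(byte[] data) throws IOException {
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        try (BufferedOutputStream buffered = new BufferedOutputStream(target)) {
            buffered.write(data);        // held in the wrapper's buffer
            int before = target.size();  // nothing has reached the target yet
            buffered.flush();            // pushes the buffered bytes through
            int after = target.size();
            return new int[] {before, after};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sizesAroundFlush(new byte[] {1, 2, 3});
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```

Closing a stream also flushes it, so a try-with-resources block covers both steps.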
