PDF to byte array and vice versa

Question

I need to convert a PDF to a byte array and vice versa.

Can anyone help me?

This is how I am converting it to a byte array:

    public static byte[] convertDocToByteArray(String sourcePath) {
        byte[] byteArray = null;
        try {
            InputStream inputStream = new FileInputStream(sourcePath);
            String inputStreamToString = inputStream.toString();
            byteArray = inputStreamToString.getBytes();
            inputStream.close();
        } catch (FileNotFoundException e) {
            System.out.println("File Not found" + e);
        } catch (IOException e) {
            System.out.println("IO Ex" + e);
        }
        return byteArray;
    }

If I use the following code to convert it back to a document, a PDF file is created, but opening it fails with 'Bad Format. Not a pdf'.

    public static void convertByteArrayToDoc(byte[] b) {
        OutputStream out;
        try {
            out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
            out.close();
            System.out.println("write success");
        } catch (Exception e) {
            System.out.println(e);
        }
    }

I have a similar question (unable to open the PDF): stackoverflow.com/questions/77823549/…
– dhS, Commented Jan 16, 2024 at 5:19

Chris Clark · Accepted Answer · 2016-08-09 21:12:32Z

Java 7 introduced Files.readAllBytes(), which can read a PDF into a byte[] like so:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    Path pdfPath = Paths.get("/path/to/file.pdf");
    byte[] pdf = Files.readAllBytes(pdfPath);

EDIT:

Thanks Farooque for pointing out: this will work for reading any kind of file, not just PDFs. All files are ultimately just a bunch of bytes, and as such can be read into a byte[].
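For the "vice versa" half of the question, Files.write() is the natural counterpart to Files.readAllBytes(). The following is only a minimal sketch: the class name PdfRoundTrip, the helper names, and the temp files with stand-in content are assumptions for illustration, not part of the original answer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class PdfRoundTrip {

    // Reads the whole file into memory; fine for modest files, not for huge ones.
    static byte[] toBytes(Path source) throws IOException {
        return Files.readAllBytes(source);
    }

    // Writes the bytes back out unchanged, creating or overwriting the target file.
    static void toFile(byte[] data, Path target) throws IOException {
        Files.write(target, data);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in content (assumption) so the sketch runs without a real PDF on disk.
        byte[] fake = {'%', 'P', 'D', 'F', '-', '1', '.', '4', (byte) 0xE2, (byte) 0xE3, 0x0A};
        Path source = Files.createTempFile("source", ".pdf");
        Path copy = Files.createTempFile("copy", ".pdf");
        Files.write(source, fake);

        byte[] loaded = toBytes(source);
        toFile(loaded, copy);

        // The copy should be byte-for-byte identical to the source.
        System.out.println(Arrays.equals(fake, Files.readAllBytes(copy)));

        Files.delete(source);
        Files.delete(copy);
    }
}
```

No String is involved at any point, which is what keeps the bytes intact.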

Thanks for the import edit @Farooque! What do you mean by "In general it can read a any given file into a byte[]"?
I tested pdf, jpg, gif, png and txt files, and they all work perfectly. Since it supports all types of files, the note "In general it can read a any given file into a byte[]" will be helpful to anyone who needs other file types.

Jon Skeet · Accepted Answer · 2012-02-27 07:31:49Z

You basically need a helper method to read a stream into memory. This works pretty well:

    public static byte[] readFully(InputStream stream) throws IOException {
        byte[] buffer = new byte[8192];
        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        int bytesRead;
        while ((bytesRead = stream.read(buffer)) != -1) {
            baos.write(buffer, 0, bytesRead);
        }
        return baos.toByteArray();
    }

Then you'd call it with:

    public static byte[] loadFile(String sourcePath) throws IOException {
        InputStream inputStream = null;
        try {
            inputStream = new FileInputStream(sourcePath);
            return readFully(inputStream);
        } finally {
            if (inputStream != null) {
                inputStream.close();
            }
        }
    }

Don't mix up text and binary data - it only leads to tears.
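To make the text-versus-binary point concrete, here is a small sketch of what happens when binary bytes take a detour through a String. The class name TextVersusBinary and the sample bytes are made up for illustration; UTF-8 is chosen explicitly here, whereas the code in the question relies on the platform default charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class TextVersusBinary {

    // Decodes the bytes as UTF-8 text and re-encodes them, as a text-based copy would.
    static byte[] throughString(byte[] original) {
        String asText = new String(original, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "%PDF" followed by four bytes that do not form valid UTF-8 sequences.
        byte[] binary = {'%', 'P', 'D', 'F', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        byte[] mangled = throughString(binary);

        // Invalid sequences are replaced during decoding, so the bytes no longer match.
        System.out.println(Arrays.equals(binary, mangled)); // prints false
    }
}
```

Pure ASCII content survives this detour, which is why the problem tends to show up only with real binary files.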

I guess there needs to be an extra bracket in readFully while statement .. like while ((bytesRead = stream.read(buffer)) != -1)
@JonSkeet - the size you initialise to is 8192 - how large of a PDF file would this work with? I know this is like asking "How long is a piece of String", but maybe a generic guideline if you know? My PDFs will be up to 20 pages long at a guess.
@notyou: That's just a buffer size that isn't enormous, but is large enough to avoid "system call for each byte". It's a reasonable default, basically.
Please see: unable to open the PDF, stackoverflow.com/questions/77823549/…

YvesR · Accepted Answer · 2014-06-25 08:29:32Z

You can do it by using Apache Commons IO without worrying about internal details.

Use org.apache.commons.io.FileUtils.readFileToByteArray(File file), which returns the file's contents as a byte[].


Aseem Savio · Accepted Answer · 2019-10-23 19:39:58Z

This worked for me. I haven't used any third-party libraries. Just the ones that are shipped with Java.

    import java.io.*;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class PDFUtility {

        public static void main(String[] args) throws IOException {
            /**
             * Converts byte stream into PDF.
             */
            PDFUtility pdfUtility = new PDFUtility();
            byte[] byteStreamPDF = pdfUtility.convertPDFtoByteStream();
            FileOutputStream fileOutputStream = new FileOutputStream("C:\\Users\\aseem\\Desktop\\BlaFolder\\BlaFolder2\\aseempdf.pdf");
            fileOutputStream.write(byteStreamPDF);
            fileOutputStream.close();
            System.out.println("File written successfully");
        }

        /**
         * Creates PDF to Byte Stream
         *
         * @return
         * @throws IOException
         */
        protected byte[] convertPDFtoByteStream() throws IOException {
            Path path = Paths.get("C:\\Users\\aseem\\aaa.pdf");
            return Files.readAllBytes(path);
        }
    }

Bacteria · Accepted Answer · 2015-08-16 19:17:42Z

    public static void main(String[] args) throws FileNotFoundException, IOException {
        File file = new File("java.pdf");

        FileInputStream fis = new FileInputStream(file);
        //System.out.println(file.exists() + "!!");
        //InputStream in = resource.openStream();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            for (int readNum; (readNum = fis.read(buf)) != -1;) {
                bos.write(buf, 0, readNum); //no doubt here is 0
                //Writes len bytes from the specified byte array starting at offset off to this byte array output stream.
                System.out.println("read " + readNum + " bytes,");
            }
        } catch (IOException ex) {
            Logger.getLogger(genJpeg.class.getName()).log(Level.SEVERE, null, ex);
        }
        byte[] bytes = bos.toByteArray();

        //below is the different part
        File someFile = new File("java2.pdf");
        FileOutputStream fos = new FileOutputStream(someFile);
        fos.write(bytes);
        fos.flush();
        fos.close();
    }

Please check: unable to open the PDF, stackoverflow.com/questions/77823549/…

David · Accepted Answer · 2009-07-15 12:45:37Z

Aren't you creating the PDF file but never actually writing the byte array to it? That is why the PDF cannot be opened.

    out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
    out.write(b, 0, b.length);
    out.close();

This is in addition to correctly reading in the PDF to byte array.
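A slightly fuller sketch of the write-back step, using try-with-resources so the stream is closed even if the write fails. The class and method names (ByteArrayToFile, write) are hypothetical, and the target path is a parameter rather than the hard-coded one from the question.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class ByteArrayToFile {

    // Writes every byte of data to targetPath, creating or overwriting the file.
    // try-with-resources closes the stream whether or not write() succeeds.
    static void write(byte[] data, String targetPath) throws IOException {
        try (OutputStream out = new FileOutputStream(targetPath)) {
            out.write(data);
        }
    }
}
```

Letting the IOException propagate, instead of printing it and carrying on, also avoids a misleading "write success" message when nothing was written.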

this may not have been useful as you are saving it to file but I ran into issues where I was putting the byte array into a MemoryStream object and downloading it to the client. I had to set the Position back to 0 for this to work.

Sufian · Accepted Answer · 2014-06-25 08:29:25Z

Calling toString() on an InputStream doesn't do what you think it does. Even if it did, a PDF contains binary data, so you wouldn't want to convert it to a string first.

What you need to do is read from the stream, write the results into a ByteArrayOutputStream, then convert the ByteArrayOutputStream into an actual byte array by calling toByteArray():

    InputStream inputStream = new FileInputStream(sourcePath);
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

    int data;
    while ((data = inputStream.read()) >= 0) {
        outputStream.write(data);
    }

    inputStream.close();
    return outputStream.toByteArray();

Reading a single byte at a time isn't terribly efficient. Better to copy a block at a time.
@Jon - true, but I was trying to keep it simple. Also, doesn't FileInputStream do buffering internally anyway, which would mitigate that?
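On the buffering question: as far as I know, FileInputStream does not buffer on its own, so each single-byte read() goes to the operating system. If the byte-at-a-time loop is kept for simplicity, wrapping the stream in a BufferedInputStream is one way to soften the cost. A rough sketch, with the hypothetical names BufferedByteReader and readAll:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class BufferedByteReader {

    // Same byte-at-a-time loop as in the answer, but BufferedInputStream fetches the
    // file in larger chunks internally, so most read() calls are served from memory.
    static byte[] readAll(String sourcePath) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(sourcePath))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int data;
            while ((data = in.read()) >= 0) {
                out.write(data);
            }
            return out.toByteArray();
        }
    }
}
```

Copying a block at a time, as suggested in the comment, is still likely to be faster; this only removes the per-byte trip to the file.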

Riddhi Gohil · Accepted Answer · 2016-04-26 09:17:17Z

To convert a PDF to a byte array:

    public byte[] pdfToByte(String filePath) throws JRException {
        File file = new File(filePath);
        FileInputStream fileInputStream;
        byte[] data = null;
        byte[] finalData = null;
        ByteArrayOutputStream byteArrayOutputStream = null;

        try {
            fileInputStream = new FileInputStream(file);
            data = new byte[(int) file.length()];
            finalData = new byte[(int) file.length()];
            byteArrayOutputStream = new ByteArrayOutputStream();

            // Note: a single read() call is not guaranteed to fill the whole array.
            fileInputStream.read(data);
            byteArrayOutputStream.write(data);
            finalData = byteArrayOutputStream.toByteArray();

            fileInputStream.close();
        } catch (FileNotFoundException e) {
            LOGGER.info("File not found" + e);
        } catch (IOException e) {
            LOGGER.info("IO exception" + e);
        }

        return finalData;
    }

Sridhar · Accepted Answer · 2012-04-29 17:59:43Z

This works for me:

    try (InputStream pdfin = new FileInputStream("input.pdf");
         OutputStream pdfout = new FileOutputStream("output.pdf")) {
        byte[] buffer = new byte[1024];
        int bytesRead;
        while ((bytesRead = pdfin.read(buffer)) != -1) {
            pdfout.write(buffer, 0, bytesRead);
        }
    }

But Jon's answer doesn't work for me if used in the following way:

    try (InputStream pdfin = new FileInputStream("input.pdf");
         OutputStream pdfout = new FileOutputStream("output.pdf")) {
        int k = readFully(pdfin).length;
        System.out.println(k);
    }

This outputs zero as the length. Why is that?

Sufian · Accepted Answer · 2014-06-25 09:45:15Z

None of these worked for us, possibly because our input stream came from a REST call rather than from a local PDF file. What worked was using RestAssured to read the PDF as an input stream, then parsing it with the Tika PDF parser and calling toString() on the content handler.

    import com.jayway.restassured.RestAssured;
    import com.jayway.restassured.response.Response;
    import com.jayway.restassured.response.ResponseBody;
    import org.apache.tika.exception.TikaException;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.sax.BodyContentHandler;
    import org.apache.tika.parser.Parser;
    import org.xml.sax.ContentHandler;
    import org.xml.sax.SAXException;

    InputStream stream = response.asInputStream();
    Parser parser = new AutoDetectParser(); // Should auto-detect!
    ContentHandler handler = new BodyContentHandler();
    Metadata metadata = new Metadata();
    ParseContext context = new ParseContext();

    try {
        parser.parse(stream, handler, metadata, context);
    } finally {
        stream.close();
    }
    for (int i = 0; i < metadata.names().length; i++) {
        String item = metadata.names()[i];
        System.out.println(item + " -- " + metadata.get(item));
    }

    System.out.println("!!Printing pdf content: \n" + handler.toString());
    System.out.println("content type: " + metadata.get(Metadata.CONTENT_TYPE));

Hi, I have a similar question (unable to open the PDF): stackoverflow.com/questions/77823549/…

Akash Roy · Accepted Answer · 2018-05-31 07:51:39Z

I have implemented similar behaviour in my application without problems. Below is my version of the code, and it works.

    byte[] getFileInBytes(String filename) {
        File file = new File(filename);
        int length = (int) file.length();
        byte[] bytes = new byte[length];
        try {
            BufferedInputStream reader = new BufferedInputStream(new FileInputStream(file));
            reader.read(bytes, 0, length);
            System.out.println(reader);
            // setFile(bytes);
        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return bytes;
    }

Hi, I have a similar question: stackoverflow.com/questions/77823549/…

Jennifer CevallosY · Accepted Answer · 2021-11-09 23:15:19Z

    public String encodeFileToBase64Binary(String fileName) throws IOException {
        System.out.println("encodeFileToBase64Binary: " + fileName);
        File file = new File(fileName);
        byte[] bytes = loadFile(file);
        byte[] encoded = Base64.encodeBase64(bytes);
        String encodedString = new String(encoded);
        System.out.println("ARCHIVO B64: " + encodedString);
        return encodedString;
    }

    @SuppressWarnings("resource")
    public static byte[] loadFile(File file) throws IOException {
        InputStream is = new FileInputStream(file);

        long length = file.length();
        if (length > Integer.MAX_VALUE) {
            // File is too large
        }
        byte[] bytes = new byte[(int) length];

        int offset = 0;
        int numRead = 0;
        while (offset < bytes.length
                && (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0) {
            offset += numRead;
        }

        if (offset < bytes.length) {
            throw new IOException("Could not completely read file " + file.getName());
        }

        is.close();
        return bytes;
    }
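The Base64.encodeBase64 call above looks like it comes from Apache Commons Codec, though that is an assumption since the answer does not show its imports. If a Base64 string really is needed (for example to embed the PDF in JSON), the JDK has shipped java.util.Base64 since Java 8. A minimal sketch with the hypothetical class name Base64Sketch:

```java
import java.util.Base64;

final class Base64Sketch {

    // Encodes raw file bytes as a Base64 string using only the JDK.
    static String encode(byte[] fileBytes) {
        return Base64.getEncoder().encodeToString(fileBytes);
    }

    // Decodes the Base64 string back into the original bytes.
    static byte[] decode(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        byte[] header = {'%', 'P', 'D', 'F'};
        String text = encode(header);
        System.out.println(text);                // prints JVBERg==
        System.out.println(decode(text).length); // prints 4
    }
}
```

Unlike the toString()/getBytes() detour in the question, Base64 is designed to be reversible for arbitrary bytes.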

I don't think the questioner needs a Base64 conversion. They used toString() only because they didn't know how to read the file into bytes.
I have the same question (unable to open the PDF): stackoverflow.com/questions/77823549/…

plinth · Accepted Answer · 2009-07-15 12:35:49Z


PDFs may contain binary data, and chances are it is getting mangled when you call toString(). It seems to me that you want this:

    FileInputStream inputStream = new FileInputStream(sourcePath);
    int numberBytes = inputStream.available();
    byte bytearray[] = new byte[numberBytes];
    inputStream.read(bytearray);


Jon Skeet Over a year ago

That's a horrible way of reading data - please don't assume that available() will contain all of the data in a stream.

Eric Petroelje Over a year ago

@Jon - seconded. available() will (usually) return the number of bytes that can be read immediately without blocking. It has little to do with how much data is actually in the file.
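The difference is easy to see with a stream that only hands out a few bytes at a time. The sketch below is purely illustrative: trickle() is a made-up stand-in for a slow source such as a network response, and the 3-byte limit is arbitrary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class AvailableDemo {

    // Hypothetical stream that yields at most 3 bytes per call, like a slow source might.
    static InputStream trickle(byte[] data) {
        ByteArrayInputStream source = new ByteArrayInputStream(data);
        return new InputStream() {
            @Override public int read() { return source.read(); }
            @Override public int read(byte[] b, int off, int len) {
                return source.read(b, off, Math.min(len, 3));
            }
            @Override public int available() { return Math.min(source.available(), 3); }
        };
    }

    // The approach criticised above: size the array from available() and read once.
    static byte[] readUsingAvailable(InputStream in) throws IOException {
        byte[] bytes = new byte[in.available()];
        in.read(bytes);
        return bytes;
    }

    // Keeps reading until read() reports end of stream (-1).
    static byte[] readUntilEof(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10];
        System.out.println(readUsingAvailable(trickle(data)).length); // prints 3
        System.out.println(readUntilEof(trickle(data)).length);       // prints 10
    }
}
```

With a local file the available() version often appears to work, which is probably why the problem goes unnoticed until the source changes.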
