41

I need to convert a PDF to a byte array and vice versa.

Can anyone help me?

This is how I am converting to byte array

    public static byte[] convertDocToByteArray(String sourcePath) {
        byte[] byteArray = null;
        try {
            InputStream inputStream = new FileInputStream(sourcePath);
            String inputStreamToString = inputStream.toString();
            byteArray = inputStreamToString.getBytes();
            inputStream.close();
        } catch (FileNotFoundException e) {
            System.out.println("File Not found" + e);
        } catch (IOException e) {
            System.out.println("IO Ex" + e);
        }
        return byteArray;
    }

If I use the following code to convert it back to a document, a PDF file is created, but opening it fails with 'Bad Format. Not a pdf'.

    public static void convertByteArrayToDoc(byte[] b) {
        OutputStream out;
        try {
            out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
            out.close();
            System.out.println("write success");
        } catch (Exception e) {
            System.out.println(e);
        }
    }

13 Answers

47

Java 7 introduced Files.readAllBytes(), which can read a PDF into a byte[] like so:

    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.Files;

    Path pdfPath = Paths.get("/path/to/file.pdf");
    byte[] pdf = Files.readAllBytes(pdfPath);

EDIT:

Thanks Farooque for pointing out: this will work for reading any kind of file, not just PDFs. All files are ultimately just a bunch of bytes, and as such can be read into a byte[].
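The question also asks for the reverse direction. A minimal sketch of the full round trip, assuming Java 7 or later: `Files.write()` is the counterpart of `Files.readAllBytes()`. The class name `PdfRoundTrip`, its method names, and the temp files are made up for illustration; temp files stand in for real PDFs so the sketch runs anywhere.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper class, named only for this illustration.
class PdfRoundTrip {

    // Read the whole file into memory as raw bytes.
    static byte[] toBytes(Path source) throws IOException {
        return Files.readAllBytes(source);
    }

    // Write the bytes back out unchanged; creates or overwrites the target file.
    static void toFile(byte[] bytes, Path target) throws IOException {
        Files.write(target, bytes);
    }

    public static void main(String[] args) throws IOException {
        // Temp files stand in for a real input PDF and its copy.
        Path source = Files.createTempFile("source", ".pdf");
        Path copy = Files.createTempFile("copy", ".pdf");
        Files.write(source, new byte[] {'%', 'P', 'D', 'F', (byte) 0xE2, (byte) 0xE3});

        byte[] bytes = toBytes(source);
        toFile(bytes, copy);

        // Compare what was read with what ended up in the copy.
        System.out.println("identical: " + Arrays.equals(bytes, Files.readAllBytes(copy)));

        Files.delete(source);
        Files.delete(copy);
    }
}
```

Because nothing here ever converts the bytes to text, the copy should be byte-for-byte identical to the source.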

3 Comments

Thanks for the import edit @Farooque! What do you mean by "In general it can read a any given file into a byte[]"?
I tested pdf, jpg, gif, png and txt files, and they all work perfectly. Since it supports all types of files, the note "In general it can read any given file into a byte[]" will be helpful for anyone who needs that.
34

You basically need a helper method to read a stream into memory. This works pretty well:

    public static byte[] readFully(InputStream stream) throws IOException {
        byte[] buffer = new byte[8192];
        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        int bytesRead;
        while ((bytesRead = stream.read(buffer)) != -1) {
            baos.write(buffer, 0, bytesRead);
        }
        return baos.toByteArray();
    }

Then you'd call it with:

    public static byte[] loadFile(String sourcePath) throws IOException {
        InputStream inputStream = null;
        try {
            inputStream = new FileInputStream(sourcePath);
            return readFully(inputStream);
        } finally {
            if (inputStream != null) {
                inputStream.close();
            }
        }
    }

Don't mix up text and binary data - it only leads to tears.
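To make that warning concrete, here is a small sketch (not part of the original answer) that pushes a few bytes through a `String` and back. The class name `TextVsBinary` and the sample bytes are invented for illustration, and UTF-8 is just one charset chosen for the demonstration. Bytes that are not valid in the charset get replaced while decoding, so the result no longer matches the input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, named only for this illustration.
class TextVsBinary {

    // Pushes raw bytes through a String and back, the way text-based code would.
    static byte[] throughString(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF and 0xFE never occur in valid UTF-8, so decoding substitutes them.
        byte[] raw = {'%', 'P', 'D', 'F', (byte) 0xFF, (byte) 0xFE};
        byte[] mangled = throughString(raw);

        System.out.println("same bytes after String round trip? " + Arrays.equals(raw, mangled));
    }
}
```

Pure ASCII content would survive this round trip, which is why the problem tends to show up only once a file contains binary sections, as PDFs usually do.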

4 Comments

I guess the while statement in readFully needs an extra pair of parentheses, like: while ((bytesRead = stream.read(buffer)) != -1)
@JonSkeet - the size you initialise to is 8192 - how large of a PDF file would this work with? I know this is like asking "How long is a piece of String", but maybe a generic guideline if you know? My PDFs will be up to 20 pages long at a guess.
@notyou: That's just a buffer size that isn't enormous, but is large enough to avoid "system call for each byte". It's a reasonable default, basically.
please see .. unable to open the pdf stackoverflow.com/questions/77823549/…
6

You can do it by using Apache Commons IO without worrying about internal details.

Use org.apache.commons.io.FileUtils.readFileToByteArray(File file), which returns the file's contents as a byte[].

4

This worked for me. I haven't used any third-party libraries. Just the ones that are shipped with Java.

    import java.io.*;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class PDFUtility {

        public static void main(String[] args) throws IOException {
            /**
             * Converts byte stream into PDF.
             */
            PDFUtility pdfUtility = new PDFUtility();
            byte[] byteStreamPDF = pdfUtility.convertPDFtoByteStream();
            FileOutputStream fileOutputStream = new FileOutputStream("C:\\Users\\aseem\\Desktop\\BlaFolder\\BlaFolder2\\aseempdf.pdf");
            fileOutputStream.write(byteStreamPDF);
            fileOutputStream.close();
            System.out.println("File written successfully");
        }

        /**
         * Creates PDF to Byte Stream
         *
         * @return
         * @throws IOException
         */
        protected byte[] convertPDFtoByteStream() throws IOException {
            Path path = Paths.get("C:\\Users\\aseem\\aaa.pdf");
            return Files.readAllBytes(path);
        }
    }

2
    public static void main(String[] args) throws FileNotFoundException, IOException {
        File file = new File("java.pdf");
        FileInputStream fis = new FileInputStream(file);
        //System.out.println(file.exists() + "!!");
        //InputStream in = resource.openStream();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            for (int readNum; (readNum = fis.read(buf)) != -1;) {
                bos.write(buf, 0, readNum); //no doubt here is 0
                //Writes len bytes from the specified byte array starting at offset off to this byte array output stream.
                System.out.println("read " + readNum + " bytes,");
            }
        } catch (IOException ex) {
            Logger.getLogger(genJpeg.class.getName()).log(Level.SEVERE, null, ex);
        }
        byte[] bytes = bos.toByteArray();

        //below is the different part
        File someFile = new File("java2.pdf");
        FileOutputStream fos = new FileOutputStream(someFile);
        fos.write(bytes);
        fos.flush();
        fos.close();
    }

1 Comment

please check .. unable to open the pdf stackoverflow.com/questions/77823549/…
1

Aren't you creating the PDF file without actually writing the byte array back to it? That would explain why you cannot open the PDF.

    out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
    out.Write(b, 0, b.Length);
    out.Position = 0;
    out.Close();

This is in addition to correctly reading in the PDF to byte array.
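The snippet above uses .NET-style members (`Write`, `Length`, `Position`). A rough Java equivalent of the same idea might look like the sketch below. The class name `ByteArrayToFile` and the `targetPath` parameter are invented for illustration, and no position reset appears because `java.io.FileOutputStream` has no such property.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper class, named only for this illustration.
class ByteArrayToFile {

    // Writes the whole array to targetPath; try-with-resources closes the stream.
    static void write(byte[] b, String targetPath) throws IOException {
        try (OutputStream out = new FileOutputStream(targetPath)) {
            out.write(b);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temp file stands in for the real output path.
        File target = File.createTempFile("out", ".pdf");
        byte[] sample = {'%', 'P', 'D', 'F'};

        write(sample, target.getPath());
        System.out.println("bytes on disk: " + target.length());

        target.delete();
    }
}
```

The key difference from the question's code is the `out.write(b)` call; without it the file is created but stays empty.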

2 Comments

out.Position = 0? I didn't get it.
This may not be useful here since you are saving to a file, but I ran into issues when I put the byte array into a MemoryStream object and downloaded it to the client. I had to set the Position back to 0 for that to work.
1

Calling toString() on an InputStream doesn't do what you think it does. Even if it did, a PDF contains binary data, so you wouldn't want to convert it to a string first.
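A quick way to see this for yourself, as a sketch: the stream classes in `java.io` generally inherit `Object.toString()`, which returns the class name followed by `@` and a hash code rather than the stream's contents. The class name `StreamToStringDemo` is made up, and a `ByteArrayInputStream` stands in for the `FileInputStream` so no file is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical demo class, named only for this illustration.
class StreamToStringDemo {

    // Returns whatever toString() gives for the stream; no data is read.
    static String describe(InputStream in) {
        return in.toString();
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3});

        // Prints the class name, an '@' and a hash code, not the three bytes.
        System.out.println(describe(in));
    }
}
```

So the question's `getBytes()` call encodes that short description, not the PDF.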

What you need to do is read from the stream, write the results into a ByteArrayOutputStream, then convert the ByteArrayOutputStream into an actual byte array by calling toByteArray():

    InputStream inputStream = new FileInputStream(sourcePath);
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

    int data;
    while ((data = inputStream.read()) >= 0) {
        outputStream.write(data);
    }

    inputStream.close();
    return outputStream.toByteArray();

2 Comments

Reading a single byte at a time isn't terribly efficient. Better to copy a block at a time.
@Jon - true, but I was trying to keep it simple. Also, doesn't FileInputStream do buffering internally anyway, which would mitigate that?
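On the buffering question: as far as I know, `FileInputStream` does not buffer by itself, so each single-byte `read()` goes to the operating system. One way to keep the simple byte-at-a-time loop is to wrap the stream in a `BufferedInputStream`. This is a sketch under that assumption; the class name `BufferedByteRead` is invented and a temp file stands in for the PDF.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class, named only for this illustration.
class BufferedByteRead {

    // Same loop as in the answer, but the reads are served from an in-memory buffer.
    static byte[] read(String sourcePath) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(sourcePath))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int data;
            while ((data = in.read()) >= 0) {
                out.write(data);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Create a small temp file to read back.
        File source = File.createTempFile("in", ".pdf");
        try (FileOutputStream fos = new FileOutputStream(source)) {
            fos.write(new byte[] {'%', 'P', 'D', 'F'});
        }

        System.out.println("bytes read: " + read(source.getPath()).length);
        source.delete();
    }
}
```

Copying a block at a time, as suggested in the comment above, avoids the per-byte method call as well.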
1

To convert a PDF to a byte array:

    public byte[] pdfToByte(String filePath) {
        File file = new File(filePath);
        byte[] data = new byte[(int) file.length()];
        // readFully keeps reading until the array is full;
        // a single read(data) call may return before that.
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            in.readFully(data);
        } catch (FileNotFoundException e) {
            LOGGER.info("File not found" + e);
            return null;
        } catch (IOException e) {
            LOGGER.info("IO exception" + e);
            return null;
        }
        return data;
    }

0

This works for me:

    try (InputStream pdfin = new FileInputStream("input.pdf");
         OutputStream pdfout = new FileOutputStream("output.pdf")) {
        byte[] buffer = new byte[1024];
        int bytesRead;
        while ((bytesRead = pdfin.read(buffer)) != -1) {
            pdfout.write(buffer, 0, bytesRead);
        }
    }

But Jon's answer doesn't work for me if used in the following way:

    try (InputStream pdfin = new FileInputStream("input.pdf");
         OutputStream pdfout = new FileOutputStream("output.pdf")) {
        int k = readFully(pdfin).length;
        System.out.println(k);
    }

It outputs zero as the length. Why is that?

0

None of these worked for us, possibly because our InputStream came from a REST call rather than from a locally hosted PDF file. What worked was using RestAssured to read the PDF as an input stream, parsing it with the Tika PDF parser, and then calling toString() on the content handler.

    import com.jayway.restassured.RestAssured;
    import com.jayway.restassured.response.Response;
    import com.jayway.restassured.response.ResponseBody;
    import org.apache.tika.exception.TikaException;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.sax.BodyContentHandler;
    import org.apache.tika.parser.Parser;
    import org.xml.sax.ContentHandler;
    import org.xml.sax.SAXException;

    InputStream stream = response.asInputStream();
    Parser parser = new AutoDetectParser(); // Should auto-detect!
    ContentHandler handler = new BodyContentHandler();
    Metadata metadata = new Metadata();
    ParseContext context = new ParseContext();

    try {
        parser.parse(stream, handler, metadata, context);
    } finally {
        stream.close();
    }
    for (int i = 0; i < metadata.names().length; i++) {
        String item = metadata.names()[i];
        System.out.println(item + " -- " + metadata.get(item));
    }

    System.out.println("!!Printing pdf content: \n" + handler.toString());
    System.out.println("content type: " + metadata.get(Metadata.CONTENT_TYPE));

1 Comment

Hi, I have a similar question: unable to open the PDF. stackoverflow.com/questions/77823549/…
0

I have implemented similar behaviour in my application without problems. Below is my version of the code, which works.

    byte[] getFileInBytes(String filename) {
        File file = new File(filename);
        int length = (int) file.length();
        byte[] bytes = new byte[length];
        // try-with-resources closes the stream even if reading fails
        try (BufferedInputStream reader = new BufferedInputStream(new FileInputStream(file))) {
            int offset = 0;
            // read() may return fewer bytes than requested, so loop until the array is full
            while (offset < length) {
                int numRead = reader.read(bytes, offset, length - offset);
                if (numRead < 0) {
                    break; // end of stream reached early
                }
                offset += numRead;
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return bytes;
    }

1 Comment

Hi, I have a similar question: stackoverflow.com/questions/77823549/…
0
    public String encodeFileToBase64Binary(String fileName) throws IOException {
        System.out.println("encodeFileToBase64Binary: " + fileName);
        File file = new File(fileName);
        byte[] bytes = loadFile(file);
        // Base64 here is presumably org.apache.commons.codec.binary.Base64 (Apache Commons Codec)
        byte[] encoded = Base64.encodeBase64(bytes);
        String encodedString = new String(encoded);
        System.out.println("ARCHIVO B64: " + encodedString);
        return encodedString;
    }

    public static byte[] loadFile(File file) throws IOException {
        long length = file.length();
        if (length > Integer.MAX_VALUE) {
            // A Java array cannot hold more than Integer.MAX_VALUE elements
            throw new IOException("File is too large: " + file.getName());
        }
        byte[] bytes = new byte[(int) length];

        // try-with-resources closes the stream even if reading fails
        try (InputStream is = new FileInputStream(file)) {
            int offset = 0;
            int numRead = 0;
            // read() may return fewer bytes than requested, so keep reading until the array is full
            while (offset < bytes.length
                    && (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0) {
                offset += numRead;
            }
            if (offset < bytes.length) {
                throw new IOException("Could not completely read file " + file.getName());
            }
        }
        return bytes;
    }

2 Comments

I don't think the questioner needs a Base64 conversion. They used toString() only because they didn't know how to read the file into bytes.
I have the same question, unable to open the PDF: stackoverflow.com/questions/77823549/…
-2

PDFs may contain binary data, and chances are it's getting mangled when you call toString(). It seems to me that you want this:

    FileInputStream inputStream = new FileInputStream(sourcePath);
    int numberBytes = inputStream.available();
    byte bytearray[] = new byte[numberBytes];
    inputStream.read(bytearray);

2 Comments

That's a horrible way of reading data - please don't assume that available() will contain all of the data in a stream.
@Jon - seconded. available() will (usually) return the number of bytes that can be read immediately without blocking. It has little to do with how much data is actually in the file.
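If the goal is to size the array up front, one alternative that does not rely on available() is to take the size from `File.length()` and fill the array with `DataInputStream.readFully()`, which keeps reading until the array is full or throws an `EOFException`. This is a sketch rather than the answerer's code; the class name `ReadWholeFile` is invented and a temp file stands in for the PDF.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper class, named only for this illustration.
class ReadWholeFile {

    // Sizes the array from the file length, then reads until it is completely filled.
    static byte[] read(File file) throws IOException {
        long length = file.length();
        if (length > Integer.MAX_VALUE) {
            throw new IOException("File too large for a single array: " + file);
        }
        byte[] bytes = new byte[(int) length];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            in.readFully(bytes); // throws EOFException if the stream ends early
        }
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        // Create a small temp file to read back.
        File source = File.createTempFile("whole", ".pdf");
        try (FileOutputStream fos = new FileOutputStream(source)) {
            fos.write(new byte[] {'%', 'P', 'D', 'F', '-', '1'});
        }

        System.out.println("bytes read: " + read(source).length);
        source.delete();
    }
}
```

This only suits sources with a known length, such as local files; for sockets or HTTP responses a growing buffer like the readFully helper from the earlier answer is the safer choice.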