Fetch Attached (Embedded) PDF from Excel in either PHPExcel or PHPSpreadsheet

Question

I have an Excel file that contains an embedded (attached) PDF.

I am trying to use PHPExcel and PHPSpreadsheet to fetch the data. I can fetch the images, but other embedded objects such as PDFs are not accessible.

My first choice is PHP, but I am also fine with a Python solution if that is possible.

K J · Accepted Answer · 2023-10-16 00:27:15Z

An XLSX file is a Zip container of Excel components, so we can open it as a zip file and work with its contents.
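As a minimal sketch of that idea in Python, the standard zipfile module can open the workbook directly and list what is inside. The function name list_embeddings is just a name I made up for illustration, and the "xl/embeddings/" prefix is where such parts usually sit in an XLSX, so treat it as an assumption and check your own file's listing.

```python
import zipfile


def list_embeddings(xlsx_path):
    """Return the names of archive members stored under xl/embeddings/.

    xlsx_path may be a file path or a binary file-like object.
    The "xl/embeddings/" prefix is an assumption about the usual layout.
    """
    with zipfile.ZipFile(xlsx_path) as archive:
        return [name for name in archive.namelist()
                if name.startswith("xl/embeddings/")]
```

Calling list_embeddings("workbook.xlsx") on a workbook with one attachment would typically return a single name ending in oleObject1.bin.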

The objects of interest are in the "embeddings" folder. If there is only one embedding, it is easy to extract as oleObject1.bin: one line to extract it, and one line to open it in an editor or to run your own customised Python find-and-save.
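If you would rather do the extraction step in Python than on the command line, something along these lines should work. It is only a sketch: extract_embeddings and its parameters are hypothetical names, and it assumes the embedded objects are .bin members under xl/embeddings/.

```python
import zipfile
from pathlib import Path


def extract_embeddings(xlsx_path, out_dir):
    """Copy every xl/embeddings/*.bin member into out_dir.

    Returns the list of files written. The folder prefix and the .bin
    suffix are assumptions about how the workbook stores attachments.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(xlsx_path) as archive:
        for name in archive.namelist():
            if name.startswith("xl/embeddings/") and name.endswith(".bin"):
                # Keep only the base name, e.g. oleObject1.bin
                target = out_dir / Path(name).name
                target.write_bytes(archive.read(name))
                saved.append(target)
    return saved
```

With a single attachment, extract_embeddings("workbook.xlsx", "out") would leave out/oleObject1.bin ready for the next step.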

In that BIN file we can seek the address of the PDF header %PDF-, here at 00002240.

Likewise, seek the address of its end-of-file marker %%EOF\x0A, here at 00004794.

Now, using any method such as heads and tails, splice out that PDF (in this case 2554 bytes) and save it as BINary.pdf.
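The same seek-and-splice can be sketched in a few lines of Python. This is not a full OLE parser: carve_pdf is a hypothetical helper that assumes the PDF bytes sit contiguously inside the .bin file, takes the first %PDF- as the start and the last %%EOF as the end, and keeps one trailing line ending if there is one.

```python
def carve_pdf(blob):
    """Return the bytes from the first %PDF- to the last %%EOF, or None.

    Assumes the PDF is stored as one contiguous run inside the blob.
    """
    start = blob.find(b"%PDF-")
    if start == -1:
        return None
    eof = blob.rfind(b"%%EOF")
    if eof < start:  # also covers "not found" (-1)
        return None
    end = eof + len(b"%%EOF")
    # Keep the line ending that normally follows the marker, if present.
    if blob[end:end + 2] == b"\r\n":
        end += 2
    elif blob[end:end + 1] in (b"\n", b"\r"):
        end += 1
    return blob[start:end]
```

Typical use would be to read oleObject1.bin with open(..., "rb"), pass the bytes to carve_pdf, and write the result to a .pdf file in binary mode if it is not None.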

I wrote a script to extract a PDF from an Office bin file on Windows, so after un-tarring the XLSX, Windows users can run this script. Note that it has two small .exe dependencies that you need to download, and you need to specify their path, so see and edit the start of the file. In PHP or Python you should be able to emulate that, so for starters see https://stackoverflow.com/a/56742848/10802527

@echo off
REM dependencies are
REM Didier Stevens middle.exe from https://blog.didierstevens.com/programs/binary-tools/
REM Mark Russinovich strings.exe from https://learn.microsoft.com/en-us/sysinternals/downloads/strings
REM both above to be placed on path or folder e.g.
set "utils=C:\Downloads\Apps\utils"
setlocal enableDelayedExpansion
if not exist "%~dpn1.bin" echo %0 requires a bin file to work on & pause & exit /b
"%utils%\strings.exe" -o "%~1"|Findstr "%PDF-">AcroHEAD.txt
set /p HEAD=<AcroHEAD.txt
if [%HEAD%]==[] echo %PDF- Header not found & del Acro????.txt & pause & exit /b
echo !HEAD! >AcroHEAD.txt
for /f "tokens=1 delims=:" %%f in (AcroHEAD.txt) do set START=%%f
"%utils%\strings.exe" -o "%~1"|Findstr "%%EOF">AcroTAIL.txt
for /f "tokens=1 delims=:" %%f in (AcroTAIL.txt) do set TAIL=%%f
set /a LEN=%TAIL%+6-%START%
del Acro????.txt
"%utils%\middle.exe" "%~1" %START% %LEN% "%~dpn1.pdf"
