
I have an Excel file that contains an embedded (attached) PDF.

I am trying to use PHPExcel and PhpSpreadsheet to fetch the data. I can fetch the images, but other embedded objects such as PDFs are not accessible.

I would prefer PHP, but a Python solution is also fine.


1 Answer


An XLSX file is a ZIP container of Excel components, so we can open it as a ZIP archive and work with its contents.


The objects of interest are in the "embeddings" folder. If there is only one embedding, it is easy to extract as oleObject1.bin: one line to extract it and one line to open it in an editor, or your own Python code to find and save the PDF.
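As a rough Python equivalent of the extraction step, here is a minimal sketch using only the standard-library zipfile module. The function name extract_embeddings and the output folder are hypothetical choices for illustration, and it assumes the embedded objects sit under xl/embeddings/ inside the workbook.

```python
import zipfile
from pathlib import Path


def extract_embeddings(xlsx_path, out_dir):
    """Copy every file under xl/embeddings/ out of the workbook into out_dir.

    Hypothetical helper for illustration; returns the list of saved paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(xlsx_path) as archive:
        for name in archive.namelist():
            # Assumption: embedded objects live in xl/embeddings/
            # (for example xl/embeddings/oleObject1.bin).
            if name.startswith("xl/embeddings/") and not name.endswith("/"):
                target = out_dir / Path(name).name
                target.write_bytes(archive.read(name))
                saved.append(target)
    return saved
```

Calling extract_embeddings("workbook.xlsx", "embedded") would then leave oleObject1.bin (and any other embeddings) in the embedded folder, ready for the header and EOF search described next.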


In that BIN file, seek the offset of the PDF header %PDF-, here at 00002240.

Also seek the offset of its end-of-file marker, %%EOF\x0A, here at 00004794.


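Now, using any method such as head and tail, splice out that PDF and save it as BINary.pdf. In this case the two offsets differ by 2554 (4794 - 2240, in whatever base the viewer displays). If the second offset marks the start of %%EOF, also include the marker's own 6 bytes (%%EOF plus the line feed) so the file ends cleanly.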
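The same header-to-EOF splice can be sketched in Python without external tools. This is a minimal sketch, not a full OLE parser: the function name carve_pdf is hypothetical, and it assumes the PDF is stored contiguously inside the .bin file, as in the example here. It takes the first %PDF- header and the last %%EOF marker, since a PDF may contain more than one %%EOF.

```python
def carve_pdf(blob):
    """Return the bytes from the first %PDF- header through the last %%EOF marker.

    Hypothetical helper; assumes the PDF bytes are contiguous inside blob.
    """
    start = blob.find(b"%PDF-")
    if start == -1:
        raise ValueError("no %PDF- header found")
    end = blob.rfind(b"%%EOF")
    if end < start:  # also covers rfind returning -1
        raise ValueError("no %%EOF marker found after the header")
    end += len(b"%%EOF")
    # Keep the line ending that follows the marker, if there is one.
    if blob[end:end + 2] == b"\r\n":
        end += 2
    elif blob[end:end + 1] in (b"\n", b"\r"):
        end += 1
    return blob[start:end]
```

A typical use would be to read oleObject1.bin in binary mode, pass the bytes to carve_pdf, and write the result to BINary.pdf, also in binary mode.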


I wrote a script to extract a PDF from an Office .bin file on Windows, so after unpacking the workbook with tar, Windows users can run this script. Note that it has two small .exe dependencies that you need to download; set their path at the start of the file. The same steps can be emulated in PHP or Python; for a starting point see https://stackoverflow.com/a/56742848/10802527

    @echo off
    REM dependencies are
    REM Didier Stevens middle.exe from https://blog.didierstevens.com/programs/binary-tools/
    REM Mark Russinovich strings.exe from https://learn.microsoft.com/en-us/sysinternals/downloads/strings
    REM both above to be placed on path or folder e.g.
    set "utils=C:\Downloads\Apps\utils"
    setlocal enableDelayedExpansion
    if not exist "%~dpn1.bin" echo %0 requires a bin file to work on & pause & exit /b
    "%utils%\strings.exe" -o "%~1"|Findstr "%PDF-">AcroHEAD.txt
    set /p HEAD=<AcroHEAD.txt
    if [%HEAD%]==[] echo %PDF- Header not found & del Acro????.txt & pause & exit /b
    echo !HEAD! >AcroHEAD.txt
    for /f "tokens=1 delims=:" %%f in (AcroHEAD.txt) do set START=%%f
    "%utils%\strings.exe" -o "%~1"|Findstr "%%EOF">AcroTAIL.txt
    for /f "tokens=1 delims=:" %%f in (AcroTAIL.txt) do set TAIL=%%f
    set /a LEN=%TAIL%+6-%START%
    del Acro????.txt
    "%utils%\middle.exe" "%~1" %START% %LEN% "%~dpn1.pdf"