An XLSX file is a ZIP container of Excel components, so we can open it as a zip file and manipulate the contents.

The objects of interest are in the "embeddings" folder. If there is only one embedding, it is easy to extract as oleObject1.bin: one line to extract it and one line to start an editor, or your own customised Python find-and-save.
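As a rough Python sketch of that extraction step: it assumes the embedded objects sit in a folder named embeddings inside the archive (in a typical workbook that is xl/embeddings/). The function name extract_embeddings and the output folder are made up for illustration.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_embeddings(xlsx_path, out_dir):
    """Copy every .bin member found under an 'embeddings' folder into out_dir.

    Assumption: embedded OLE objects are stored as .bin files in a folder
    called 'embeddings' somewhere inside the zip container.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(xlsx_path) as zf:
        for name in zf.namelist():
            member = PurePosixPath(name)  # zip member names use forward slashes
            in_embeddings = "embeddings" in member.parts[:-1]
            if in_embeddings and member.suffix.lower() == ".bin":
                target = out_dir / member.name
                target.write_bytes(zf.read(name))
                saved.append(target)
    return saved


# Example call (hypothetical file names):
# extract_embeddings("workbook.xlsx", "extracted")
```

With a single embedding this should leave you with one oleObject1.bin in the output folder; with several, each .bin is saved under its own name.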

In that BIN file we can seek the address of the PDF header, %PDF-, here at 00002240.
Also seek the address of its EOF marker, %%EOF\x0A, here at 00004794.

Now, using any method such as head and tail, splice out that PDF (in this case 2554 bytes from the header address to the EOF address, plus the 6 bytes of the %%EOF\x0A marker itself) and save it as BINary.pdf
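The seek-and-splice steps above can be sketched in Python roughly as follows. This is only a sketch under assumptions: it takes the first %PDF- header and the last %%EOF marker in the file, which suits a BIN holding a single PDF, and carve_pdf is a name invented here.

```python
PDF_HEADER = b"%PDF-"
PDF_EOF = b"%%EOF"


def carve_pdf(blob):
    """Return the bytes from the first %PDF- header through the last %%EOF.

    Assumption: the blob contains exactly one PDF. The last %%EOF is used
    because a PDF with incremental updates carries more than one marker.
    """
    start = blob.find(PDF_HEADER)
    if start == -1:
        raise ValueError("no %PDF- header found")
    eof = blob.rfind(PDF_EOF)
    if eof < start:  # also covers rfind returning -1
        raise ValueError("no %%EOF marker after the header")
    end = eof + len(PDF_EOF)
    # Keep the line ending that follows the marker, if there is one.
    if blob[end:end + 2] == b"\r\n":
        end += 2
    elif blob[end:end + 1] in (b"\n", b"\r"):
        end += 1
    return blob[start:end]


# Example use (hypothetical file names):
# from pathlib import Path
# data = Path("oleObject1.bin").read_bytes()
# Path("BINary.pdf").write_bytes(carve_pdf(data))
```

Working on the whole file in memory keeps the offsets simple; for very large BIN files a chunked or memory-mapped search would be the safer choice.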


I wrote a script to extract a PDF from an Office BIN file on Windows, so after un-TAR Windows users can run this script. NOTE: it has 2 small .exe dependencies that you need to download, and you must specify their path, so see and edit the start of the file. In PHP or Python you should be able to emulate the same approach, so for starters see https://stackoverflow.com/a/56742848/10802527
@echo off
REM dependencies are
REM Didier Stevens middle.exe from https://blog.didierstevens.com/programs/binary-tools/
REM Mark Russinovich strings.exe from https://learn.microsoft.com/en-us/sysinternals/downloads/strings
REM both above to be placed on path or folder e.g.
set "utils=C:\Downloads\Apps\utils"

setlocal enableDelayedExpansion
if not exist "%~dpn1.bin" echo %0 requires a bin file to work on & pause & exit /b

"%utils%\strings.exe" -o "%~1"|Findstr "%PDF-">AcroHEAD.txt
set /p HEAD=<AcroHEAD.txt
if [%HEAD%]==[] echo %PDF- Header not found & del Acro????.txt & pause & exit /b
echo !HEAD! >AcroHEAD.txt
for /f "tokens=1 delims=:" %%f in (AcroHEAD.txt) do set START=%%f

"%utils%\strings.exe" -o "%~1"|Findstr "%%EOF">AcroTAIL.txt
for /f "tokens=1 delims=:" %%f in (AcroTAIL.txt) do set TAIL=%%f
set /a LEN=%TAIL%+6-%START%
del Acro????.txt

"%utils%\middle.exe" "%~1" %START% %LEN% "%~dpn1.pdf"