Extract text from binary files, images, and other text formats
The Docker image uses the /data folder as a volume where documents are read and written. You therefore need to provide a host folder to be mapped to /data.
For example, download the BookReporter.pdf file to the Downloads folder of your home directory (~/Downloads).
To extract the text from BookReporter.pdf and save it to BookReporter.txt, change to that folder (it is mapped to /data through `pwd`) and run
    docker run \
      --rm \
      -v "`pwd`:/data" \
      kunalshah/textract:latest \
      -o BookReporter.txt \
      BookReporter.pdf

See the converted text file

    cat BookReporter.txt

Read here
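The docker command can also be wrapped in a small shell function so that the host folder is passed explicitly instead of being taken from the current directory. This is only a sketch: the function name `textract_docker` and the `DRY_RUN` variable are hypothetical and not part of the image. It assumes, as in the command above, that the image accepts `-o <output>` followed by an input file name, both relative to /data.

```shell
#!/bin/sh
# textract_docker HOST_DIR INPUT OUTPUT
# Hypothetical helper (not shipped with the image): maps HOST_DIR to /data
# and converts INPUT to OUTPUT, both given relative to HOST_DIR.
# Set DRY_RUN=1 to print the docker command instead of running it.
textract_docker() {
    host_dir=$1
    input=$2
    output=$3

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command that would be executed, then stop.
        echo docker run --rm -v "${host_dir}:/data" \
            kunalshah/textract:latest -o "$output" "$input"
        return 0
    fi

    # Run the conversion; --rm removes the container once it exits.
    docker run --rm -v "${host_dir}:/data" \
        kunalshah/textract:latest -o "$output" "$input"
}
```

For the example above, `textract_docker "$HOME/Downloads" BookReporter.pdf BookReporter.txt` would run the conversion, and prefixing the call with `DRY_RUN=1` only prints the command, which is a cheap way to check the folder mapping first.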