Revisions to Make existing PDF searchable ( OCR ) via command line / script

replaced http://stackoverflow.com/ with https://stackoverflow.com/

edited May 23, 2017 at 12:40


Stack Overflow has related questions under the PDF-parsing tag, covering tools such as PDFBox and Apache Tika (which uses PDFBox). The Ruby examples linked below extract text from a PDF. Tools of this kind need sufficiently high resolution to work robustly, so scan at a high enough resolution first and then see which of the programs work.
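To make "good enough resolution" a bit more concrete, here is a minimal sketch that computes the effective DPI of a scanned image placed on a PDF page. The function names are my own, and the 300 DPI threshold is only a commonly quoted rule of thumb for OCR, not something taken from the tools above, so adjust it to whatever your OCR software recommends.

```python
# Assumption: 300 DPI is a rule-of-thumb minimum for OCR, not a hard limit.
MIN_OCR_DPI = 300


def effective_dpi(pixel_width, pixel_height, page_width_pt, page_height_pt):
    """Return (horizontal, vertical) DPI of an image that fills a PDF page."""
    # PDF page sizes are given in points; 1 point is 1/72 inch.
    width_in = page_width_pt / 72.0
    height_in = page_height_pt / 72.0
    return pixel_width / width_in, pixel_height / height_in


def is_sharp_enough(pixel_width, pixel_height, page_width_pt, page_height_pt,
                    min_dpi=MIN_OCR_DPI):
    """True if the lower of the two DPI values reaches min_dpi."""
    x_dpi, y_dpi = effective_dpi(pixel_width, pixel_height,
                                 page_width_pt, page_height_pt)
    return min(x_dpi, y_dpi) >= min_dpi


# A US Letter page is 612 x 792 points (8.5 x 11 inches).
print(effective_dpi(2550, 3300, 612, 792))
print(is_sharp_enough(1275, 1650, 612, 792))
```

If the check fails, rescanning at a higher setting is likely to help more than trying a different OCR program.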

Examples

https://github.com/yob/pdf-reader/tree/master/examples

SO threads

[Edit]

I am not sure I understood your problem. Do you want to add an OCR layer to different kinds of material, such as random photos, screenshots, and PDFs without an OCR layer? I don't know the solution, but I am sure someone does, so I asked a specific question about how to do it with Automator and some OCR software:

Automator-script with an OCR-software to automatically add OCR to material?
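Whatever OCR software ends up being used, a script that receives mixed material first has to sort it. The sketch below only does that sorting and picks an output name; it does not run any OCR. The suffix list, the `-ocr.pdf` naming, and both function names are assumptions of mine, and a file extension cannot tell you whether a PDF already has a text layer.

```python
from pathlib import Path

# Assumption: these suffixes cover the photos and screenshots in question.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def sort_for_ocr(paths):
    """Group input files into images, PDFs, and everything else."""
    groups = {"image": [], "pdf": [], "skipped": []}
    for path in map(Path, paths):
        suffix = path.suffix.lower()
        if suffix in IMAGE_SUFFIXES:
            groups["image"].append(path)
        elif suffix == ".pdf":
            groups["pdf"].append(path)
        else:
            groups["skipped"].append(path)
    return groups


def searchable_name(path):
    """Hypothetical naming scheme: write the result next to the input file."""
    path = Path(path)
    return path.with_name(path.stem + "-ocr.pdf")


groups = sort_for_ocr(["scan.PDF", "shot.png", "notes.txt"])
for kind, files in groups.items():
    print(kind, [f.name for f in files])
```

In an Automator workflow, the selected files could be passed to a script like this as arguments, with the actual OCR step plugged in per group.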

replaced http://apple.stackexchange.com/ with https://apple.stackexchange.com/


edited Apr 13, 2017 at 12:45

Community Bot


added 457 characters in body


edited Mar 11, 2013 at 4:29

hhh



answered Mar 10, 2013 at 18:57

hhh
