Get orientation pytesseract Python3

Question

I want to get the orientation of a scanned document. I saw the post "Pytesseract OCR multiple config options" and tried to use --psm 0 to get the orientation:

target = pytesseract.image_to_string(text, lang='eng', boxes=False, \
    config='--psm 0 tessedit_char_whitelist=0123456789abcdefghijklmnopqrstuvwxyz')

But I get an error:

FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/jy/np7p4twj4bx_k396hyc_bnxw0000gn/T/tess_dzgtpadd_out.txt'

lads · Accepted Answer · 2018-08-14 14:28:41Z

I found another way to get the orientation using pytesseract:

print(pytesseract.image_to_osd(Image.open(file_name)))

This is the output:

Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 21.27
Script: Latin
Script confidence: 4.14

2 Comments

alyssaeliyah Over a year ago

Can it detect the script or the font? What if the document contains different fonts?

arun Over a year ago

This is a good solution, but I found that it's not very accurate. In a small experiment on 9 rotated (right, left, down) PNG document pages, it detected the rotation correctly on only 6.

Mahesh Kumaran · Accepted Answer · 2020-02-12 12:57:11Z

Instead of writing a regex to extract values from the output string, pass output_type=Output.DICT to get the result as a dict:

import cv2
import pytesseract
from pytesseract import Output

im = cv2.imread(str(imPath), cv2.IMREAD_COLOR)
newdata = pytesseract.image_to_osd(im, output_type=Output.DICT)

The sample output looks as follows. Use the dict keys to access the values:

{ 'page_num': 0, 'orientation': 90, 'rotate': 270, 'orientation_conf': 1.2, 'script': 'Latin', 'script_conf': 1.11 }
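Because OSD can misjudge a page, one option is to act on the result only when orientation_conf is high enough. The following is a minimal sketch under that assumption: rotation_to_apply is a hypothetical helper, and the min_conf default of 2.0 is an illustrative value, not a threshold recommended by Tesseract or pytesseract.

```python
def rotation_to_apply(osd, min_conf=2.0):
    """Return the 'rotate' angle from an OSD dict, or None if confidence is low.

    osd is a dict shaped like the sample output above.
    min_conf is an assumed, illustrative threshold; tune it on your own pages.
    """
    if osd.get("orientation_conf", 0.0) < min_conf:
        return None  # too uncertain to rotate automatically
    return int(osd["rotate"]) % 360


# The sample dict from this answer has a very low orientation confidence.
osd = {
    "page_num": 0, "orientation": 90, "rotate": 270,
    "orientation_conf": 1.2, "script": "Latin", "script_conf": 1.11,
}
print(rotation_to_apply(osd))  # None
```

Whether the returned angle is applied clockwise or counter-clockwise depends on the imaging library used for the rotation, so it is worth checking the direction on a page whose orientation is known.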

Mousam Singh · Accepted Answer · 2019-03-12 13:36:50Z

@lads has already mentioned the method which can find the orientation. I have just used re to extract the number of degrees by which the image needs to be rotated.

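import re
import cv2
import pytesseract

imPath = 'path_to_image'
im = cv2.imread(str(imPath), cv2.IMREAD_COLOR)
newdata = pytesseract.image_to_osd(im)
# Digits that follow "Rotate: ", returned as a string
rotate = re.search(r'(?<=Rotate: )\d+', newdata).group(0)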
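If more than one field is needed from the text report, the same idea can be extended to split every "Key: value" line into a dict. This is a rough sketch rather than part of pytesseract: parse_osd is a hypothetical helper, and it assumes the report has one field per line, as in the output shown in the first answer.

```python
def parse_osd(osd_text):
    """Split 'Key: value' lines of an OSD text report into a dict."""
    fields = {}
    for line in osd_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that have no "Key: value" shape
        value = value.strip()
        # Convert numeric values; leave anything else (e.g. the script) as text.
        for cast in (int, float):
            try:
                value = cast(value)
                break
            except ValueError:
                pass
        fields[key.strip()] = value
    return fields


# Sample report copied from the first answer.
sample = """Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 21.27
Script: Latin
Script confidence: 4.14"""

info = parse_osd(sample)
print(info["Rotate"])  # 90
```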
