How to extract text or numbers from images using python

Question

I want to extract text (mainly numbers) from images like this

I tried this code

import pytesseract from PIL import Image pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe' img = Image.open('1.jpg') text = pytesseract.image_to_string(img, lang='eng') print(text)

but all i get is this (hE PPAR)

nathancy · Accepted Answer · 2019-12-03 03:35:17Z

When performing OCR, it is important to preprocess the image so the desired text to detect is in black with the background in white. To do this, here's a simple approach using OpenCV to Otsu's threshold the image which will result in a binary image. Here's the image after preprocessing:

We use the --psm 6 configuration setting to treat the image as a uniform block of text. Here's other configuration options you can try. Result from Pytesseract

01153521976

Code

import cv2 import pytesseract pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe" image = cv2.imread('1.png', 0) thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1] data = pytesseract.image_to_string(thresh, lang='eng',config='--psm 6') print(data) cv2.imshow('thresh', thresh) cv2.waitKey()

Collectives™ on Stack Overflow

How to extract text or numbers from images using python

1 Answer 1

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Linked

Related