4

I am trying to detect some numbers with tesseract in python. Below you will find my starting image and what I can get it down to. Here is the code I used to get it there.

import pytesseract
import cv2
import numpy as np
pytesseract.pytesseract.tesseract_cmd = "C:\\Users\\choll\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe"
image = cv2.imread(r'64normalwart.png')
lower = np.array([254, 254, 254])
upper = np.array([255, 255, 255])
image = cv2.inRange(image, lower, upper)
image = cv2.bitwise_not(image)
# Uses a language that should work with Minecraft text; I have tried with and without, no luck
text = pytesseract.image_to_string(image, lang='mc')
print(text)
cv2.imwrite("Wartthreshnew.jpg", image)
cv2.imshow("Image", image)
cv2.waitKey(0)

I end up with black numbers on a white background, which seems pretty good, but Tesseract still cannot detect the numbers. I also noticed the numbers are pretty jagged, but I don't know how to fix that. Does anyone have recommendations for how I could get Tesseract to recognize these numbers?

Starting Image

What I end up with

4
  • You could try cv2.blur() to smooth the rough edges of the numbers. It will make the image fuzzier overall but tesseract might have an easier time recognizing digits. Commented Jul 28, 2021 at 14:24
  • Thanks for the suggestion. The image might be too small, but Tesseract still can't read it. Commented Jul 28, 2021 at 14:33
  • Try to add config psm 6 or 7 like this: pytesseract.image_to_string(img, config='--psm 6') Commented Jul 29, 2021 at 1:42
  • Good idea. The solution I found was to use --psm 8 and treat it as a word along with limiting it to numbers. stackoverflow.com/questions/44619077/… Was a useful resource for anyone in the future who sees this. Commented Jul 29, 2021 at 14:20
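
The combination described in the last comment (--psm 8 plus a digits-only restriction) could be sketched roughly as follows. This is a minimal sketch, not the commenter's actual code: build_config and read_number are hypothetical helper names, and the digit restriction assumes the tessedit_char_whitelist variable is honored by the installed Tesseract version, which may not hold for every release or engine mode.

```python
def build_config(psm, whitelist=None):
    """Build a Tesseract config string: a page segmentation mode
    plus an optional character whitelist."""
    parts = [f"--psm {psm}"]
    if whitelist:
        # -c sets a Tesseract config variable from the command line
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def read_number(image, psm=8):
    """OCR `image` as a single word restricted to digits.

    Hypothetical helper; `image` is assumed to be a NumPy array or PIL image
    that is already thresholded (dark digits on a white background).
    """
    # Imported lazily so build_config can be used without Tesseract installed
    import pytesseract

    config = build_config(psm, whitelist="0123456789")
    return pytesseract.image_to_string(image, config=config).strip()
```

With the thresholded image from the question, a call might look like read_number(cv2.imread("Wartthreshnew.jpg")). Whether it returns the expected digits still depends on the image size and quality.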

3 Answers

4

Your problem is with the page segmentation mode. Tesseract segments every image in a different way. When you don't choose an appropriate PSM, it goes for mode 3, which is automatic and might not be suitable for your case. I've just tried your image and it works perfectly with PSM 6.

df = pytesseract.image_to_string(np.array(image),lang='eng', config='--psm 6') 

These are all the PSMs available at the moment:

  0  Orientation and script detection (OSD) only.
  1  Automatic page segmentation with OSD.
  2  Automatic page segmentation, but no OSD, or OCR.
  3  Fully automatic page segmentation, but no OSD. (Default)
  4  Assume a single column of text of variable sizes.
  5  Assume a single uniform block of vertically aligned text.
  6  Assume a single uniform block of text.
  7  Treat the image as a single text line.
  8  Treat the image as a single word.
  9  Treat the image as a single word in a circle.
 10  Treat the image as a single character.
 11  Sparse text. Find as much text as possible in no particular order.
 12  Sparse text with OSD.
 13  Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
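
Since the right mode depends on the image, one way to compare modes is to run the same image through several of them and inspect the results. The following is a minimal sketch under stated assumptions: try_psms is a hypothetical helper name, the default list of modes (those aimed at short text) is my own choice, and the ocr parameter exists only so the function can be exercised without Tesseract installed.

```python
def try_psms(image, modes=(6, 7, 8, 10, 13), ocr=None):
    """Run OCR once per page segmentation mode.

    Returns a dict mapping each mode to the recognized text,
    or to an error message if that mode failed.
    """
    if ocr is None:
        # Imported lazily; by default use pytesseract's image_to_string
        import pytesseract
        ocr = pytesseract.image_to_string

    results = {}
    for psm in modes:
        try:
            results[psm] = ocr(image, config=f"--psm {psm}").strip()
        except Exception as exc:  # e.g. a mode that needs missing OSD data
            results[psm] = f"<error: {exc}>"
    return results
```

Printing the returned dict for the thresholded image makes it easy to see which modes return the digits and which return an empty string.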

0

Use pytesseract.image_to_string(img, config='--psm 8'), or try different configs to see whether the image gets recognized. Useful link: Pytesseract OCR multiple config options


0

I think Tesseract blacklists digits by default, so I tried tessedit_char_whitelist to whitelist the characters I want, but it didn't work. I then tried un-blacklisting the digits with the config option tessedit_char_unblacklist=0123456789:

pytesseract.image_to_string(img, lang='eng', config='--psm 6 --oem 3 -c tessedit_char_unblacklist=0123456789') 

1 Comment

Remember that Stack Overflow isn't just intended to solve the immediate problem, but also to help future readers find solutions to similar problems, which requires understanding the underlying code. This is especially important for members of our community who are beginners, and not familiar with the syntax. Given that, can you edit your answer to include an explanation of what you're doing and why you believe it is the best approach?
