I'm having some difficulty detecting text on the following type of image:

Image without preprocessing

It seems that Tesseract has difficulty distinguishing the numbers from the diagrams. My goal is to find every digit and its location.

On this image I run the following code, which is supposed to draw rectangles around the text it finds:

    import cv2
    import pytesseract
    from pytesseract import Output
    import numpy as np

    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'

    img = cv2.imread('Temp/VE_cropped.png')
    kernel = np.ones((2,2),np.uint8)

    img_processed = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img_processed = cv2.medianBlur(img_processed,3)
    img_processed = cv2.threshold(img_processed, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    img_processed = cv2.dilate(img_processed, kernel, iterations = 1)

    dict_wordsDetected = pytesseract.image_to_data(img_processed, output_type=Output.DICT)

    img_processed = cv2.cvtColor(img_processed, cv2.COLOR_GRAY2RGB)
    n_boxes = len(dict_wordsDetected['text'])
    for i in range(n_boxes):
        (x, y, w, h) = (dict_wordsDetected['left'][i],
                        dict_wordsDetected['top'][i],
                        dict_wordsDetected['width'][i],
                        dict_wordsDetected['height'][i])
        img_processed = cv2.rectangle(img_processed, (x - 10, y - 10), (x + w + 10, y + h + 10), (0, 0, 255), 2)

    cv2.imshow("processed", img_processed)
    cv2.waitKey(0)

This gives the following result: Result

  • It does, but even with black on white. You just have to add: img_processed = cv2.threshold(img_processed, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1] AFTER img_processed = cv2.dilate(img_processed, kernel, iterations = 1) Commented Nov 17, 2021 at 15:44
  • It won't have any effect except moving rectangles around found text... Commented Nov 17, 2021 at 17:34
  • As I said, my problem is that Tesseract doesn't recognize digits such as 0455, 0435 or 0453. The command you suggest only resizes the red rectangles, but my problem occurs before the rectangles are drawn. Commented Nov 17, 2021 at 17:49
  • Sorry, but my post is very clear and needed these illustrations. This is the kind of image I have to work with, so why not show it? Commented Nov 18, 2021 at 8:22
  • The boxes are supposed to be around the numbers. This is just to show that Tesseract doesn't find the numbers. That is why I'm asking for help here: to make Tesseract find the numbers, so that I can then get their coordinates. Commented Nov 18, 2021 at 8:49

1 Answer

I think I understand what you want. Tesseract works well for many problems, especially on images that are easy to OCR, that is, images without a complex background. Your image is not simple enough to be handled with Tesseract and thresholding alone; it needs more preprocessing before OCR. To solve the problem, you have to clean the image so that, ideally, only the numbers remain. That can be hard work.

Recently, I was looking for code to apply OCR to an image with a complex background, and I found some solutions. The code that I'll show you is based on this solution.

To extract the numbers (or at least try to), follow these steps:

  • convert the image to grayscale
  • threshold the image using Otsu's method with the inverse operation
  • apply a distance transform
  • apply a morphological operation to clean up small points in the image
  • apply a dilate operation to enlarge the numbers
  • find the contours and filter them according to the width and height of each contour
  • create a list of hull objects, one for each contour
  • draw the hull objects
  • apply a dilate operation to the mask
  • apply a bitwise operation to retrieve the segmented areas
  • OCR the pre-processed image
  • print out the results
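To illustrate the contour-filtering step in isolation, here is a minimal sketch that works on plain (x, y, w, h) tuples such as those returned by cv2.boundingRect. The function names are hypothetical, and the default limits simply mirror the thresholds used in the answer's code; they are assumptions that would need tuning for a different image resolution.

```python
def is_digit_sized(box, min_w=5, max_w=75, min_h=15, max_h=35):
    """Return True if an (x, y, w, h) box has a plausible size for a number.

    The default limits are illustrative assumptions, not universal values.
    """
    _, _, w, h = box
    return min_w <= w < max_w and min_h < h <= max_h


def filter_boxes(boxes, **limits):
    """Keep only the boxes whose width and height pass is_digit_sized."""
    return [b for b in boxes if is_digit_sized(b, **limits)]


# Made-up bounding boxes: two number-like boxes, one long thin line
# (typical of a diagram stroke) and one speck of noise.
boxes = [(10, 10, 40, 20), (0, 0, 300, 4), (50, 60, 3, 3), (100, 100, 60, 30)]
kept = filter_boxes(boxes)
print(kept)
```

Filtering on size alone is crude: a diagram fragment that happens to be the size of a number will pass, which is one reason the later steps still produce some false detections.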

The code I present here is not perfect, and I think it can be improved, but it should give you a starting point for solving your problem.

    import cv2
    import pytesseract
    from pytesseract import Output
    import numpy as np
    import imutils

    # loading and resizing image
    img = cv2.imread('ABV5H.png')
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = imutils.resize(img, width=900)

    # gray scale
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    cv2.imshow("Gray", gray)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    # thresholding with Otsu method and inverse operation
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
    cv2.imshow("Threshold", thresh)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    # distance transform
    dist = cv2.distanceTransform(thresh, cv2.DIST_L2, 5)
    dist = cv2.normalize(dist, dist, 0, 1.0, cv2.NORM_MINMAX)
    dist = (dist*255).astype('uint8')
    dist = cv2.threshold(dist, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    cv2.imshow("Distance Transformation", dist)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    # Morphological operation kernel (2,2) and OPEN method
    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (2,2))
    opening = cv2.morphologyEx(dist, cv2.MORPH_OPEN, kernel)
    cv2.imshow("Morphology", opening)
    cv2.imwrite("morphology.jpg", opening)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    # dilate operation to enlarge the numbers
    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3,3))
    dilation = cv2.dilate(opening, kernel, iterations = 1)
    cv2.imshow("dilated", dilation)
    cv2.imwrite("dilated.jpg", dilation)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    # finding and grabbing the contours
    cnts = cv2.findContours(dilation.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)
    output = img.copy()
    for i in cnts:
        cv2.drawContours(output, [i], -1, (0, 0, 255), 3)
    cv2.imshow("Contours", output)
    cv2.imwrite("contours.jpg", output)  # save the image with the contours drawn on it
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    # filtering the contours by bounding-box width and height
    nums = []
    output2 = img.copy()
    for c in cnts:
        (x, y, w, h) = cv2.boundingRect(c)
        if w >= 5 and w < 75 and h > 15 and h <= 35:
            nums.append(c)

    for i in nums:
        cv2.drawContours(output2, [i], -1, (0, 0, 255), 2)
    cv2.imshow("Filter", output2)
    cv2.imwrite("filter.jpg", output2)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    # making a list with the hull points
    hull = []
    # calculate points for each contour
    for i in range(len(nums)):
        # creating convex hull object for each contour
        hull.append(cv2.convexHull(nums[i], False))

    # create an empty black image
    mask = np.zeros(dilation.shape[:2], dtype='uint8')

    # draw contours and hull points
    for i in range(len(nums)):
        color = (255, 0, 0)  # the mask has one channel, so only the first value (255, white) is used
        # draw ith convex hull object
        cv2.drawContours(mask, hull, i, color, 1, 8)

    # dilating the mask to have a proper image for bitwise
    mask = cv2.dilate(mask, kernel, iterations = 15)
    cv2.imshow("Dilated Mask", mask)
    cv2.imwrite("dilated-mask.jpg", mask)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    # bitwise operation
    final = cv2.bitwise_and(dilation, dilation, mask=mask)
    cv2.imshow("Pre-processed Image", final)
    cv2.imwrite("pre-processed.jpg", final)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    # page segmentation mode and white list
    config = '--psm 12 -c tessedit_char_whitelist=0123456789'

    # OCR'ing the image
    dict_wordsDetected = pytesseract.image_to_data(final, config = config, output_type=Output.DICT)

    # filtering the detections and making a list of index
    index = []
    for idx, txt in enumerate(dict_wordsDetected['text']):
        if len(txt) >= 1:
            dict_wordsDetected['text'][idx] = txt.replace(" ", "")
            index.append(idx)

    for i in index:
        (x, y, w, h) = (dict_wordsDetected['left'][i],
                        dict_wordsDetected['top'][i],
                        dict_wordsDetected['width'][i],
                        dict_wordsDetected['height'][i])
        img_processed = cv2.rectangle(img, (x - 10, y - 10), (x + w + 10, y + h + 10), (0, 0, 255), 2)
        text = "{}".format(dict_wordsDetected['text'][i])
        cv2.putText(img, text, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)

    cv2.imshow("Result", img)
    cv2.imwrite('result.jpg', img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

Visualizing some operations

(I cannot upload my images for the moment. There are some hyperlinks with images. These images correspond to some image pre-processing steps)

Output image after dilation: [image]

Filtered contours: [image]

Mask after the hull operation and dilation: [image]

Pre-processed image (the image that will be OCR'ed): [image]

The results: [image]

As you can see, we can find numbers in the input image, and many detections are good. On the other hand, some outputs are inaccurate. The main reason is the image preprocessing: the image is still noisy, even after the transformations. The key to your problem is image preprocessing. Keep in mind, too, that Tesseract is not perfect; it requires good images to work well. Beyond that, you should know the --psm modes (page segmentation modes) to improve your OCR, and use white lists to avoid undesirable detections. As I said, the results are good, but I guess you can improve them if your task requires just OpenCV and Tesseract. Otherwise, there are other approaches that are far less complicated than this one.
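As a small illustration of the --psm and white-list point, the sketch below builds the config string and post-filters a result dictionary shaped like the one returned by pytesseract.image_to_data with output_type=Output.DICT. The helper names, the min_conf threshold and the sample data are assumptions made for this example (the original code does not filter on confidence); the sample dictionary is hand-written, not real Tesseract output.

```python
def build_digit_config(psm=12):
    """Build a Tesseract config string that restricts output to digits.

    PSM 12 is "sparse text with OSD"; other modes may suit other layouts.
    """
    return f"--psm {psm} -c tessedit_char_whitelist=0123456789"


def digit_boxes(data, min_conf=0.0):
    """Return (text, x, y, w, h) tuples for digit-only detections.

    `data` is assumed to have the 'text', 'conf', 'left', 'top', 'width'
    and 'height' lists produced by image_to_data(..., Output.DICT).
    """
    results = []
    for i, raw in enumerate(data["text"]):
        text = raw.replace(" ", "")
        if not text.isdigit():
            continue  # skips empty strings and anything that is not all digits
        if float(data["conf"][i]) < min_conf:
            continue  # conf may be a string or a number depending on the version
        results.append((text, data["left"][i], data["top"][i],
                        data["width"][i], data["height"][i]))
    return results


# Hand-written sample in the shape of image_to_data output (not real results).
sample = {
    "text": ["", "0455", " ", "04 35", "a1"],
    "conf": ["-1", "91", "-1", "62", "30"],
    "left": [0, 10, 0, 120, 200],
    "top": [0, 20, 0, 40, 60],
    "width": [900, 50, 5, 55, 20],
    "height": [600, 22, 5, 24, 18],
}

config = build_digit_config()
all_digits = digit_boxes(sample)
confident = digit_boxes(sample, min_conf=70)
print(config)
print(all_digits)
print(confident)
```

With real data, the string from build_digit_config would be passed as the config argument of pytesseract.image_to_data, and a confidence cutoff like this is one possible way to drop some of the inaccurate detections; a suitable value has to be found by experiment.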

If you need help, you can contact me; I prefer speaking French to English.

2 Comments

Thank you very much for your help! I would indeed like to be able to discuss these techniques with you, but it's impossible to send direct messages from Stack Overflow. Would you like me to send you my LinkedIn?
You're welcome. I'm moving into artificial intelligence, and knowing this kind of technique is fundamental for working with Computer Vision. So we can get in touch to exchange ideas; it's always important to talk, and you learn a lot that way.