Clean image for OCR

Question

I've been trying to clean this image for OCR, but I'm getting mixed results:

Best I achieved:

    import cv2
    import numpy as np


    def image_smoothening(img):
        # Fixed threshold, then Otsu, then blur and Otsu again
        ret1, th1 = cv2.threshold(img, 180, 255, cv2.THRESH_BINARY)
        ret2, th2 = cv2.threshold(th1, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        blur = cv2.GaussianBlur(th2, (1, 1), 0)
        ret3, th3 = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        return th3


    def remove_noise_and_smooth(img):
        # Adaptive mean threshold, followed by morphological opening and closing
        filtered = cv2.adaptiveThreshold(img.astype(np.uint8), 255,
                                         cv2.ADAPTIVE_THRESH_MEAN_C,
                                         cv2.THRESH_BINARY, 45, 3)
        kernel = np.ones((1, 1), np.uint8)
        opening = cv2.morphologyEx(filtered, cv2.MORPH_OPEN, kernel)
        closing = cv2.morphologyEx(opening, cv2.MORPH_CLOSE, kernel)
        # Combine the globally thresholded image with the adaptive result
        img = image_smoothening(img)
        or_image = cv2.bitwise_or(img, closing)
        return or_image

Any clue as to what I'm missing?

You could binarize the image with a very low threshold. Then perform region labelling on the black areas. Remove the largest labelled region. The remaining mask should be all the characters.
– Karson, Commented Jul 26, 2020 at 13:46
@Karson Thanks, I tried a few things and came to the same conclusion. How would I go about labeling the black areas? Finding contours is no trouble, but how do you check their color?
– L14n, Commented Jul 29, 2020 at 19:33
You could implement your own region-growing algorithm, or you could use a library. A quick search for Python examples shows scikit-image.org/docs/dev/api/… which I believe would do the trick. You need to make sure you binarize the image before trying to label it.
– Karson, Commented Jul 30, 2020 at 21:31

Karson · Accepted Answer · 2020-07-30 23:00:09Z

Here is my MATLAB code to solve it. I know you are writing in Python, so you'll have to translate it.

    % Read in the image
    im = imread('DuQy7.png');

    % Convert to grayscale and rescale to [0, 1]
    img = rgb2gray(im);
    img = rescale(img);

    % Binarize with a threshold of 0.7 (on a 0-1 scale)
    imbw = imbinarize(img, 0.7);

    % Flip blacks and whites
    imbw = imcomplement(imbw);

    % Label regions: L is the labelled image, n is the number of labels
    [L, n] = bwlabeln(imbw);

    count = zeros(n, 1);
    [y, x] = size(L);

    % Count the pixels belonging to each label
    for j = 1:y
        for i = 1:x
            if L(j,i) ~= 0
                count(L(j,i)) = count(L(j,i)) + 1;
            end
        end
    end

    % Find the label with the most pixels in the image
    maxCount = 0;
    maxi = 1;
    for index = 1:n
        if maxCount < count(index)
            maxCount = count(index);
            maxi = index;
        end
    end

    % Remove the largest region and color all other labels white
    for j = 1:y
        for i = 1:x
            if L(j,i) == maxi
                L(j,i) = 0;
            elseif L(j,i) ~= 0
                L(j,i) = 255;   % 255 is the uint8 maximum (white)
            end
        end
    end

    % Convert to uint8 only now, so labels above 255 are not saturated while counting
    L = uint8(L);

    % View and save
    imshow(L)
    imwrite(L, 'outputTXT.bmp');

You could probably tune the threshold to cut out more of the background regions that got included. You could also look for labelled regions that are very small and remove them, since they were probably included by mistake.

Some parts of the background are going to be impossible to get rid of, since they are indistinguishable from the actual symbols. For example, between the symbols at x2,y1 and x2,y2 there is a black background region, enclosed by the white outlines, that has the same value as the symbols. It would therefore be very difficult to separate.

@L14n the labelling step is set up to remove the background gradient.

fmw42 · Accepted Answer · 2020-07-25 17:13:07Z

You can do "division normalization" in Python/OpenCV to remove the background. But that will not help with the outline font issue.

Input:

    import cv2
    import numpy as np

    # read the image
    img = cv2.imread('img.png')

    # convert to gray
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # apply morphology
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    smooth = cv2.morphologyEx(gray, cv2.MORPH_DILATE, kernel)

    # alternate blur in place of morphology
    #smooth = cv2.GaussianBlur(gray, (15,15), 0)

    # divide gray by morphology image
    division = cv2.divide(gray, smooth, scale=255)

    # threshold
    result = cv2.threshold(division, 0, 255, cv2.THRESH_OTSU)[1]

    # save results
    cv2.imwrite('img_thresh.png', result)

    # show results
    cv2.imshow('smooth', smooth)
    cv2.imshow('division', division)
    cv2.imshow('result', result)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

Result:
