Before passing the image to Tesseract OCR, it's important to preprocess it to remove noise and smooth the text. Here's a simple approach using OpenCV:
- Convert image to grayscale
- Otsu's threshold to obtain binary image
- Gaussian blur and invert image
After converting to grayscale, we apply Otsu's threshold to get a binary image.
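
To make it clearer what Otsu's threshold is doing, here is a rough pure-Python sketch of the idea: try every cutoff from 0 to 255 and keep the one that best separates dark pixels from light ones (the largest between-class variance). The function name `otsu_threshold` and the sample pixel values are made up for illustration. This is not OpenCV's implementation, and in practice you would just call `cv2.threshold` with `cv2.THRESH_OTSU`.

```python
def otsu_threshold(pixels):
    """Return the cutoff t (0..255) that maximizes between-class variance.

    Pixels <= t form the "dark" class, pixels > t form the "light" class.
    """
    # Build a 256-bin histogram of the intensities
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of intensities in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # dark class still empty
        if w1 == 0:
            break     # light class is empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance (up to a constant factor)
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Hypothetical data: dark "ink" pixels and light "paper" pixels
pixels = [20] * 50 + [30] * 50 + [200] * 40 + [220] * 60
t = otsu_threshold(pixels)
binary = [255 if p > t else 0 for p in pixels]
```

The cutoff lands in the gap between the two clusters, which is why no hand-picked threshold value is needed for clean scans like this one.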

From here, we give it a slight blur and invert the image to get our result.
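
If it helps to see what the blur-and-invert step does to individual pixels, here is a small pure-Python sketch on a tiny image. It assumes a 3x3 kernel built from the weights (1, 2, 1) / 4 and handles the borders by repeating the nearest pixel, which is a simplification and may differ from how `cv2.GaussianBlur` treats edges. The helper names `blur3x3` and `invert` are hypothetical.

```python
def blur3x3(img):
    """Smooth a 2D list of 0..255 ints with the separable kernel (1, 2, 1) / 4."""
    h, w = len(img), len(img[0])
    k = (1, 2, 1)

    def at(y, x):
        # Clamp coordinates so border pixels reuse their nearest neighbor
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = sum(k[dy + 1] * k[dx + 1] * at(y + dy, x + dx)
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            row.append((acc + 8) // 16)  # weights sum to 16; round to nearest
        out.append(row)
    return out


def invert(img):
    """Flip white-on-black to black-on-white (same as 255 - image)."""
    return [[255 - p for p in row] for row in img]


# A single white "text" pixel on a black background, as produced by
# an inverted binary threshold
thresh = [[0] * 5 for _ in range(5)]
thresh[2][2] = 255

blur = blur3x3(thresh)
result = invert(blur)
```

The blur spreads the hard white pixel into softer gray neighbors, and the inversion turns it back into dark text on a white background, which is the form Tesseract generally handles best.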

Results from Pytesseract:
Certificate No. : IN-KA047969602415880
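    import cv2
    import pytesseract

    # Path to the Tesseract executable (Windows install location)
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

    # Load the image directly as grayscale (flag 0)
    image = cv2.imread('1.png', 0)

    # Otsu's threshold, inverted so the text is white on black
    thresh = cv2.threshold(image, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY_INV)[1]

    # Slight blur to smooth the text, then invert back to black text on white
    blur = cv2.GaussianBlur(thresh, (3,3), 0)
    result = 255 - blur

    # Run OCR; --psm 6 treats the image as a single uniform block of text
    data = pytesseract.image_to_string(result, lang='eng', config='--psm 6')
    print(data)

    cv2.imshow('thresh', thresh)
    cv2.imshow('result', result)
    cv2.waitKey()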