pytesseract image_to_string function not accurate at all

Question

My code

    import base64
    from io import BytesIO

    import pytesseract
    from PIL import Image

    for index, img in enumerate(data):  # data is a list of base64-encoded strings
        # Skip the first 22 characters of each string, then decode the rest
        b64 = base64.b64decode(bytes(img[22:], encoding='utf-8'))
        raw = BytesIO(b64)
        im = Image.open(raw).convert('LA')
        pixels = im.load()
        width, height = im.size
        # Binarize: luminance above 100 becomes white, everything else black
        for x in range(width):
            for y in range(height):
                if pixels[x, y][0] > 100:
                    pixels[x, y] = (255, 255)
                else:
                    pixels[x, y] = (0, 255)
        print(pytesseract.image_to_string(im, config='tessedit_char_whitelist=1234567890plus?'))

My Image:

Output:
Te Ys
What can I do to make this better? I tried every psm from 0 to 13 and the -c flag in config.

help is still needed

– Ozballer31, Commented Aug 11, 2020 at 5:05
give a little padding to the image.

– Tarun Chakitha, Commented Aug 14, 2020 at 9:05
have you tried simple thresholding?

– Tarun Chakitha, Commented Aug 14, 2020 at 9:14
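
The two suggestions in the comments above (padding and simple thresholding) could be sketched with Pillow roughly as follows. This is only an illustration: the function name `preprocess_for_ocr`, the default border of 20 pixels, and the `invert` switch are assumptions, not something from the thread. The threshold of 100 is the value used in the question. The `invert` option is there because Tesseract generally does better with dark text on a light background.

```python
from PIL import Image, ImageOps


def preprocess_for_ocr(im, threshold=100, border=20, invert=False):
    """Binarize an image and surround it with a white margin.

    Hypothetical helper: name and defaults are illustrative assumptions.
    """
    gray = im.convert("L")  # single-channel grayscale
    if invert:
        # Flip light-on-dark text to dark-on-light
        gray = ImageOps.invert(gray)
    # Simple global threshold: above `threshold` -> white, else black
    bw = gray.point(lambda p: 255 if p > threshold else 0)
    # Add `border` white pixels on every side so glyphs do not touch the edge
    return ImageOps.expand(bw, border=border, fill=255)
```

The result can be passed to `pytesseract.image_to_string` in place of the original image. Whether this helps depends on the image, so the threshold and border are worth tuning.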

Tarun Chakitha · Accepted Answer · 2020-08-14 09:25:12Z

This code worked fine for me but spaces were not detected.

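    import cv2
    import pytesseract

    # Read the image as grayscale (flag 0) and invert its pixel values with ~
    img = ~cv2.imread("18.png", 0)
    rows, cols = img.shape[:2]
    # Optional padding by translation (would also need `import numpy as np`):
    # M = np.float32([[1,0,25],[0,1,15]])
    # img = cv2.warpAffine(img,M,(cols*2,rows*2),borderValue=(255,255,255))
    custom_oem_psm_config = r'--oem 3 --psm 3 -c tessedit_char_whitelist="1234567890plus?"'  # -c preserve_interword_spaces=1'
    print(pytesseract.image_to_string(img, config=custom_oem_psm_config))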

Output:

18plus16?
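
One difference from the question's code is that the whitelist here is passed with `-c`. Without `-c`, Tesseract most likely treats the bare `tessedit_char_whitelist=...` argument as a config file name rather than a variable assignment. A small helper for assembling the config string might look like the sketch below. The name `build_tesseract_config` and its defaults are assumptions for illustration; the flags themselves (`--oem`, `--psm`, `-c`) are the ones used in the code above.

```python
import shlex


def build_tesseract_config(psm=3, oem=3, whitelist=None, preserve_spaces=False):
    """Assemble a config string for pytesseract (hypothetical helper)."""
    parts = [f"--oem {oem}", f"--psm {psm}"]
    if whitelist:
        # Quote the value so characters such as '?' survive shell-style splitting
        parts.append(f"-c tessedit_char_whitelist={shlex.quote(whitelist)}")
    if preserve_spaces:
        # The option left commented out in the code above
        parts.append("-c preserve_interword_spaces=1")
    return " ".join(parts)


config = build_tesseract_config(whitelist="1234567890plus?")
# Show the individual arguments after shell-style splitting
print(shlex.split(config))
```

The resulting string can be passed as `config=` to `pytesseract.image_to_string`. Since the image holds a single line of text, `psm=7` (treat the image as a single text line) may also be worth a try, although the question reports that changing the psm alone did not help.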
