Tesseract

Open-source software library for optical character recognition (OCR).

Originally developed by Hewlett-Packard and later by Google, it is distributed under the Apache License 2.0.

Clients: Google, Intel, Mail.ru Group (ok.ru, youla.ru, city-mobil.ru, myTarget, ICQ, etc.).

Digital platforms: Cross-platform software

Versions: Cloud/On-Premise 

Use cases

  • Text recognition for anti-spam protection of Mail.ru services

Solution: a Tesseract-based algorithm that recognizes spam in text and images.

Result: a low-cost anti-spam solution for Mail.ru Group services that spares users a large volume of unwanted ads.
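
The use case above can be illustrated with a minimal sketch. It is not Mail.ru's actual implementation, which is not described here. It assumes the `tesseract` command-line binary is installed and on the PATH, and it relies on the engine's standard `tesseract <image> stdout -l <lang>` invocation. The keyword list and the function names (`build_tesseract_command`, `extract_text`, `looks_like_spam`) are hypothetical and chosen only for illustration; a production filter would likely use a trained classifier rather than keyword matching.

```python
import shutil
import subprocess

# Hypothetical keyword list, for illustration only.
SPAM_KEYWORDS = ("free money", "click here", "winner")


def build_tesseract_command(image_path, lang="eng"):
    """Build a CLI call that prints the recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def extract_text(image_path, lang="eng"):
    """Run the tesseract binary on an image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def looks_like_spam(text, keywords=SPAM_KEYWORDS):
    """Return True if any keyword occurs in the lowercased text.

    Whitespace is collapsed first, because OCR output often contains
    stray line breaks and repeated spaces.
    """
    normalized = " ".join(text.lower().split())
    return any(keyword in normalized for keyword in keywords)
```

With an image on disk, `looks_like_spam(extract_text("banner.png"))` would flag it if the recognized text contains one of the keywords; the file name is a placeholder.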

Benefits

Efficient optical character recognition (OCR) engine that runs on a variety of operating systems and reads TIFF, PNG, JPEG, JP2, WebP, and other image formats.