Optical Character Recognition Services

Convert scanned pages of text into electronic text that can be searched, indexed, and retrieved within DocuXplorer.

Optical Character Recognition (OCR) is the process by which DocuXplorer converts scanned images of text (in over 100 languages) into electronic text that can be searched, indexed, and retrieved within the software. Due to the demand for cutting-edge OCR functionality, DocuXplorer now offers customized OCR training.

DocuXplorer’s OCR trainers will work with you to develop the language-specific OCR features that best fit your organization’s needs. Training includes customizing the OCR process for the highest level of accuracy and a layout analysis that identifies structural features such as text orientation, headings, images, tables, captions, and paragraphs.

To use DocuXplorer’s OCR text extraction feature, the document being added must be scanned or digitally photographed and saved in either TIFF or PDF format. Using iFilters (available as free downloads), DocuXplorer converts the patterns of light and dark in a digital image of a page of text into saved, searchable, indexed text in over 100 languages.

Scanning considerations that affect the accuracy of OCR include:

  • The recommended scanning resolution for OCR accuracy is 300 dpi. Higher resolutions do not necessarily improve accuracy and can slow down OCR processing. Resolutions below 300 dpi may reduce the quality and accuracy of OCR results.
  • Brightness settings that are too high or too low may adversely affect OCR accuracy. A medium brightness value of 50% will be suitable in most cases.
  • Straightness of the initial scan can affect OCR quality; crooked lines of text can produce poor results.
  • Older and discolored documents must be scanned in RGB mode to capture all the image data and maximize OCR accuracy.
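
The scanning considerations above can be sketched as a simple pre-scan check. This is a minimal illustration in Python, not part of DocuXplorer: the names ScanSettings and check_scan_settings are hypothetical, the color mode labels are illustrative, and the brightness tolerance of 15 points is an assumption, since the guidance gives no numeric range. Skew is left out because no threshold is stated for it.

```python
from dataclasses import dataclass
from typing import List

RECOMMENDED_DPI = 300       # from the scanning guidance
MEDIUM_BRIGHTNESS = 50      # percent, from the scanning guidance
BRIGHTNESS_TOLERANCE = 15   # assumption: the guidance gives no numeric range


@dataclass
class ScanSettings:
    """Hypothetical container for the settings chosen before scanning."""
    dpi: int
    brightness_percent: int
    color_mode: str                    # e.g. "RGB", "GRAY", "BW" (illustrative labels)
    aged_or_discolored: bool = False   # True for older or discolored originals


def check_scan_settings(settings: ScanSettings) -> List[str]:
    """Return a warning for each setting that departs from the guidance."""
    warnings = []
    if settings.dpi < RECOMMENDED_DPI:
        warnings.append("resolution below 300 dpi may reduce OCR accuracy")
    elif settings.dpi > RECOMMENDED_DPI:
        warnings.append("resolution above 300 dpi may slow OCR without improving accuracy")
    if abs(settings.brightness_percent - MEDIUM_BRIGHTNESS) > BRIGHTNESS_TOLERANCE:
        warnings.append("brightness far from 50% may reduce OCR accuracy")
    if settings.aged_or_discolored and settings.color_mode.upper() != "RGB":
        warnings.append("older or discolored documents should be scanned in RGB mode")
    return warnings


if __name__ == "__main__":
    planned = ScanSettings(dpi=200, brightness_percent=80,
                           color_mode="GRAY", aged_or_discolored=True)
    for message in check_scan_settings(planned):
        print("warning:", message)
```

Settings that match the guidance (300 dpi, 50% brightness, RGB for aged originals) produce an empty list, so a check like this could run before a batch scan to catch problems that would otherwise only show up as poor OCR results.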

