Optical character recognition (OCR) is the process by which DocuXplorer converts scanned images of text (in a number of languages) into electronic text so that the digitized text can be searched, indexed, and retrieved within the software. Due to the demand for leading-edge OCR functionality, DocuXplorer now offers customized OCR training.
To use DocuXplorer’s OCR text extraction feature, the document being added must be scanned or digitally photographed and saved in either TIF or PDF format. Using iFilters (available as free downloads), DocuXplorer converts the patterns of light and dark found in a digital image of a page of text into text characters and saves them in a format that users can search or index in 40+ languages.
Scanning considerations that affect the accuracy of OCR include:
◾ The recommended scanning resolution for OCR accuracy is 300 dpi. Higher resolutions do not necessarily improve accuracy and can slow down OCR processing. Resolutions below 300 dpi may reduce the quality and accuracy of OCR results.
◾ Brightness settings that are too high or too low may adversely affect OCR accuracy. A medium brightness value of 50% will be suitable in most cases.
◾ Straightness of the initial scan can affect OCR quality; crooked lines of text produce poor results.
◾ Older and discolored documents must be scanned in RGB mode to capture all the image data and maximize OCR accuracy.
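The considerations above can be turned into a simple pre-scan checklist. The following Python sketch is illustrative only and is not part of DocuXplorer: the ScanSettings class, the check_scan_settings function, the brightness tolerance band, and the skew threshold are all assumptions made for this example. Only the 300 dpi target, the 50% brightness midpoint, and the RGB recommendation come from the guidance above.

```python
from dataclasses import dataclass
from typing import List

RECOMMENDED_DPI = 300        # from the guidance above
MEDIUM_BRIGHTNESS = 50       # percent, from the guidance above
BRIGHTNESS_TOLERANCE = 15    # assumption: illustrative band around 50%
MAX_SKEW_DEGREES = 1.0       # assumption: illustrative limit for "straight"


@dataclass
class ScanSettings:
    """Hypothetical description of one scan job."""
    dpi: int
    brightness_pct: int
    color_mode: str                  # e.g. "RGB", "GRAY", "BW"
    skew_degrees: float = 0.0
    aged_or_discolored: bool = False


def check_scan_settings(s: ScanSettings) -> List[str]:
    """Return a list of warnings; an empty list means no issue was found."""
    warnings = []
    if s.dpi < RECOMMENDED_DPI:
        warnings.append("Resolution below 300 dpi may reduce OCR accuracy.")
    elif s.dpi > RECOMMENDED_DPI:
        warnings.append("Resolution above 300 dpi may slow OCR without improving accuracy.")
    if abs(s.brightness_pct - MEDIUM_BRIGHTNESS) > BRIGHTNESS_TOLERANCE:
        warnings.append("Brightness is far from the medium value of 50%.")
    if abs(s.skew_degrees) > MAX_SKEW_DEGREES:
        warnings.append("Page appears crooked; rescan with the page straight.")
    if s.aged_or_discolored and s.color_mode.upper() != "RGB":
        warnings.append("Older or discolored documents should be scanned in RGB mode.")
    return warnings


if __name__ == "__main__":
    job = ScanSettings(dpi=150, brightness_pct=90, color_mode="GRAY",
                       skew_degrees=3.0, aged_or_discolored=True)
    for message in check_scan_settings(job):
        print(message)
```

A scan at 300 dpi, 50% brightness, and RGB mode produces no warnings under these assumed thresholds; the tolerance values should be adjusted to suit the scanner and documents in use.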
DocuXplorer’s OCR trainers will work with you to develop the language-specific OCR features that best fit your organization’s needs. Training includes customizing the OCR process to ensure the highest level of accuracy and an analysis to identify structural features of the documents being added to DocuXplorer, such as text orientation, headings, images, tables, captions, and paragraphs.