Configuring Advanced OCR Options

All PDF files imported into Continia Document Capture are OCR-processed according to the settings that apply at the time of processing. If you make no adjustments, the default settings apply, but you can customize some of the more advanced settings to suit your needs. The sections below explain how.

To configure OCR settings

You can configure the way incoming documents are OCR-processed by following these steps:

  1. Choose the Search icon, enter Document Categories, and then choose the related link.
  2. Open the relevant document category. For example, to open the purchase document category, select the PURCHASE line (not the PURCHASE code itself), and then select Edit in the action bar.
  3. On the OCR Processing FastTab, configure the settings as needed. For details and recommendations, see the description of each field below.

The following overview gives details and recommendations for each field you can customize using the steps above:
Image Resolution
In this field, you enter the number of dots per inch (DPI) that Document Capture uses when it stores OCR-processed files as image files. The value must be at least 150 DPI; a lower value returns an error.

A higher value gives a sharper image. However, very high values produce correspondingly large image files that take longer to load in the user interface. For this reason, we recommend 300 DPI, which provides good image quality at an acceptable file size.
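The following Python sketch shows why file size grows quickly as the resolution increases. It computes the pixel dimensions of a single page at several DPI values. It is a hypothetical illustration, not Document Capture code. The function name `page_pixels` and the A4 page size (8.27 × 11.69 inches) are assumptions. Only the 150 DPI minimum comes from this article.

```python
# Hypothetical sketch: how the Image Resolution value affects the pixel
# dimensions of a stored page image. Not Document Capture code.

MIN_DPI = 150  # documented minimum; lower values return an error

# Assumption: an A4 page, 8.27 x 11.69 inches.
A4_WIDTH_IN = 8.27
A4_HEIGHT_IN = 11.69


def page_pixels(dpi: int,
                width_in: float = A4_WIDTH_IN,
                height_in: float = A4_HEIGHT_IN) -> tuple[int, int]:
    """Return the (width, height) in pixels of a page rendered at `dpi`."""
    if dpi < MIN_DPI:
        raise ValueError(f"Image Resolution must be at least {MIN_DPI} DPI, got {dpi}")
    return round(width_in * dpi), round(height_in * dpi)


for dpi in (150, 300, 600):
    w, h = page_pixels(dpi)
    print(f"{dpi} DPI: {w} x {h} = {w * h:,} pixels")
```

The pixel count grows with the square of the resolution. Doubling the value from 300 to 600 DPI therefore produces four times as many pixels per page, and the stored files grow accordingly.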
Image Colour Mode
In this field, you specify the color mode of the image files that imported PDF files are converted into. You can choose between the following options:
  • Black & White: Image files containing only black and white are created and stored.
  • Gray: Image files containing only grayscale tones are created and stored.
  • Colour: Image files containing all the original colors are created and stored.
We advise against selecting the Colour option, as it increases the size of the stored image files and consequently slows down image rendering.
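The following hypothetical Python sketch estimates the raw (uncompressed) size of one page image for each color mode. It makes three assumptions: 1 bit per pixel for Black & White, 8 bits for Gray, and 24 bits for Colour. The page is assumed to be A4 at 300 DPI (2481 × 3507 pixels). The function name `uncompressed_bytes` is also illustrative. This article does not describe Document Capture's actual image format or compression, so real stored files will be smaller. Only the relative difference between the modes is meaningful here.

```python
# Hypothetical sketch: estimates the *uncompressed* size of one page image
# per colour mode. Stored files are normally compressed, so real sizes are
# smaller; the relative ordering is what matters.

# Assumed bit depths; Document Capture's actual encoding is not documented here.
BITS_PER_PIXEL = {
    "Black & White": 1,
    "Gray": 8,
    "Colour": 24,
}


def uncompressed_bytes(width_px: int, height_px: int, mode: str) -> int:
    """Raw bitmap size in bytes, rounding each pixel row up to whole bytes."""
    bits_per_row = width_px * BITS_PER_PIXEL[mode]
    bytes_per_row = (bits_per_row + 7) // 8
    return bytes_per_row * height_px


# Assumed A4 page at the recommended 300 DPI.
for mode in BITS_PER_PIXEL:
    size = uncompressed_bytes(2481, 3507, mode)
    print(f"{mode}: {size / 1_000_000:.1f} MB")
```

Under these assumptions, a Colour page needs three times the raw data of a Gray page. It also needs roughly 24 times the raw data of a Black & White page.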
Max. number of pages to process per file
In this field, you specify how many pages of each imported file are OCR-processed. Limiting the number of pages reduces import time and so optimizes the import process. The last three pages of any imported file are always processed, as they typically contain essential information.

Note that Document Capture imposes an overall limit of 500 pages per imported document. Documents longer than 500 pages cannot be imported, regardless of the value entered in this field.
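The following Python sketch models this page-selection rule as a hypothetical `pages_to_process` function. This article states only two rules: the last three pages are always processed, and files are limited to 500 pages. The sketch also assumes that the remaining pages are taken from the start of the file. That assumption is made for illustration only and may not match the product's actual behavior.

```python
# Hypothetical sketch of the page-selection rule for
# "Max. number of pages to process per file".
# Assumption: the configured pages are taken from the start of the file.
# Documented: the last three pages are always processed, and files longer
# than 500 pages cannot be imported.

IMPORT_PAGE_LIMIT = 500    # documented overall import limit
ALWAYS_PROCESSED_TAIL = 3  # last three pages are always OCR-processed


def pages_to_process(page_count: int, max_pages: int) -> list[int]:
    """Return the 1-based page numbers that would be OCR-processed."""
    if page_count > IMPORT_PAGE_LIMIT:
        raise ValueError(f"Files longer than {IMPORT_PAGE_LIMIT} pages cannot be imported")
    # Pages from the start of the file, capped at the file length.
    head = set(range(1, min(max_pages, page_count) + 1))
    # The last three pages (or fewer, for very short files).
    tail = set(range(max(1, page_count - ALWAYS_PROCESSED_TAIL + 1), page_count + 1))
    return sorted(head | tail)


print(pages_to_process(20, 5))  # first five pages plus pages 18-20
print(pages_to_process(2, 5))   # short file: every page
```

The sketch uses a set union so that the start and end ranges never process the same page twice when they overlap on short files.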
OCR Languages
In this field, you can add all the languages whose character sets should be recognized by Document Capture when OCR-processing incoming documents.

We recommend that you limit the number of activated languages to the ones generally used in the documents you import (typically only your own native language and, if relevant, English), as enabling too many languages is likely to lower the overall quality of character recognition.
Process PDF files with XML files
With this toggle, you can enable the import of XML files embedded in PDFs (such as ZUGFeRD, Factur-X, and XRechnung). For more information, see Enabling the Import of PDF Files with Embedded XML Files (ZUGFeRD, XRechnung).

For information on the remaining customizable fields on the OCR Processing FastTab, which all relate to the automatic splitting of documents, see Splitting documents automatically.

See also

Splitting documents automatically