Working with Paper and PDF Documents

Once a document has been imported into Continia Document Capture, whether as a PDF file or as a scanned paper document, you can capture textual information in it and start working with it. The information you need to capture depends on the specific type of document, but no matter what types of documents you’re dealing with, you handle and process them in the same place – the document journal.

The main purpose of the document journal is to enable you to work with imported documents and capture textual information contained in these documents. Once the information has been captured in the document journal, you must register the document. By registering a document, you convert all captured information into a real business entity in Microsoft Dynamics 365 Business Central. For example, when you register a purchase invoice received from a vendor, you create a Business Central purchase invoice.

Important

You can configure Document Capture to work with many types of documents that are related to different records or entities in Business Central. Typically, Document Capture is used to process purchase invoices and credit memos, but it can also be used for processing sales orders, contracts, item certificates, and many other types of documents. When you process purchase invoices and credit memos, the documents are linked to vendors. For illustrative purposes, this article focuses exclusively on vendors, but this serves only as an example: the same principles apply to documents linked to customers, items, fixed assets, or other records.

Overview of the document journal

To access the document journal, follow these steps:

  1. Choose the Search icon, enter Document Categories, and then choose the related link.
  2. Select the code of the relevant document category – in this case PURCHASE – to open the document journal.

The document journal consists of four main sections:

Document list (left-hand side): A table that lists all documents currently included in the selected journal. For each document, the table provides useful information such as the document number, the vendor name, the template used, and the number of pages. In addition, the OK field indicates whether all captured document values are valid and, consequently, whether the document is ready to be registered.

Document fields (left-hand side): In this section, all fields that have been identified in the currently selected document are displayed along with their corresponding values. The list of fields is determined by the document template, and for each field Document Capture indicates if the value is considered to be valid according to the configuration of each template field.

Comments (left-hand side): This section displays comments relating to the selected document. There are three different types of comments: Information, Warning, and Error. To learn more, see Configuring Comment Types and Importance.

Document image (right-hand side): The image on the right shows a visual representation of the actual scanned document (either PDF or XML) after it has been OCR-processed. For PDF documents, the document image is interactive, meaning that you can manually select text anywhere in the image to capture it as a value associated with a certain field. To learn more, see Capturing fields and Working with field captions and values.

Note

During the OCR-processing of a document, a visual representation of the document is created for display purposes, which may result in fewer colors and a lower resolution than the original document. This is done to optimize performance and usability for you when you work with the document in Business Central. What you see in the document image section is the OCR-processed version of the document – it’s not the original document, which may look slightly different. The actual OCR and recognition process is still performed on the original document at the highest level of detail. To learn more, see OCR-Processing a Document.

Changing a document’s associated vendor

By default, Document Capture links all documents to existing records in Business Central. Purchase documents are linked to vendors, meaning that each purchase document is linked to one specific vendor. In the document journal, some of the first columns of the document list – the columns Vendor and Name – show the number and the name of the vendor that each document is linked to. When importing documents, Document Capture identifies the vendor automatically based on different parameters. It’s useful to understand these parameters, so we recommend that you read the article Finding the Document Source and Template.

If Document Capture can't identify a vendor, or if you want to change the identified vendor, you can assign a different vendor from the document journal. To do this, follow these steps:

  1. Choose the Search icon, enter Document Categories, and then choose the related link.
  2. Select the code of the relevant document category – in this case PURCHASE – to open the document journal.
  3. In the document list, go to the line of the imported document whose vendor you want to change, and select the Vendor field. Then select the three dots that appear in order to open the Vendor window.
  4. In the Vendor window, choose the vendor that you want to assign by selecting the relevant number in the No. column.

Note

You can also set up search texts to help Document Capture identify the correct vendor. To learn more, see Finding the Document Source and Template.

Capturing fields

When capturing fields, Document Capture uses captions and values to search for and identify textual information in your imported documents. Both captions and values are identifiable text strings that are visible as textual elements in the imported documents, but they differ in function: You can think of a field caption as a sort of label that helps Document Capture to identify the correct value for a template field. Each caption is associated with a corresponding value, which is then the actual text that you want to capture and use when registering a document. In the document image on the right side of the document journal, field captions are highlighted using orange boxes, whereas field values are highlighted using blue boxes.

Once a purchase document is linked to a vendor in Business Central, Document Capture will automatically check if there’s a template associated with that vendor. If so, this template will be applied to the document, and all fields will be captured according to the rules and configuration of that template. All document fields must be captured and have valid values in order for you to be able to register the document.

If it’s the first time you receive a document from a vendor, there will be no templates associated with that vendor. To assign a template to the vendor, you’ll have to activate field recognition in the document journal (see step 5 below). This will automatically create a new template, link it to the vendor, and capture all fields for that document.

Note

If a vendor sends you a document for the first time but already has an associated template in another one of your companies, Document Capture will copy the template from that company to the current company. This means that you won’t have to repeat any configuration you’ve already carried out in another company for the same vendor.
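The order in which a template is found for a vendor, as described above, can be summarized in a short sketch. This is purely illustrative and is not Document Capture's actual implementation: the Template class, the dictionary used as a template store, the resolve_template function, and the company names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Template:
    """Hypothetical stand-in for a vendor template."""
    vendor_no: str
    company: str
    fields: Tuple[str, ...] = ()


# Hypothetical store: templates keyed by (company, vendor number).
TemplateStore = Dict[Tuple[str, str], Template]


def resolve_template(
    store: TemplateStore, company: str, vendor_no: str
) -> Tuple[Optional[Template], str]:
    """Return the template to apply and how it was obtained."""
    key = (company, vendor_no)

    # 1. The vendor already has a template in the current company.
    if key in store:
        return store[key], "existing"

    # 2. The vendor has a template in another company: copy it over.
    for (other_company, other_vendor), tpl in list(store.items()):
        if other_vendor == vendor_no and other_company != company:
            copied = Template(vendor_no, company, tpl.fields)
            store[key] = copied
            return copied, "copied"

    # 3. No template anywhere: the user must run Recognize Fields,
    #    which creates a new template and links it to the vendor.
    return None, "recognize_fields_needed"


store = {("COMPANY A", "V100"): Template("V100", "COMPANY A", ("Invoice No.",))}
template, how = resolve_template(store, "COMPANY B", "V100")
print(how, template.fields)
```

In this sketch, the second lookup for the same vendor in COMPANY B would return the copied template as an existing one, which mirrors the point that configuration done in one company doesn't have to be repeated in another.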

To capture the fields of a document, follow these steps:

  1. Choose the Search icon, enter Document Categories, and then choose the related link.
  2. Select the code of the relevant document category – in this case PURCHASE – to open the document journal.
  3. If Document Capture has automatically identified a vendor for the document whose fields you want to capture, the number of the identified vendor is displayed in the Vendor field of the document list. If no vendor has been identified, assign a vendor manually as described above under Changing a document’s associated vendor.
  4. Provided that you've previously received and imported documents from the assigned vendor, the template that's associated with that vendor is used for capturing field captions and values. The identified captions and values will be highlighted using orange and blue boxes in the document image on the right.
  5. However, if this is the first document you receive and import from the assigned vendor, there's no associated template, so you must assign one by activating field recognition: In the action bar, select Process and then Recognize Fields. Field captions and values will then be captured and highlighted using orange and blue boxes in the document image.

You can always change the identified field values and captions. To learn more, see the section Working with field captions and values below.

Working with field captions and values

As mentioned above, Document Capture searches for and identifies textual information in imported documents using field captions and corresponding values. For example, consider the following texts which could be found in any invoice:

Text sample | Explanation
“Invoice number: 12345678” | In this example, the caption is Invoice number, and the corresponding value is 12345678.
“Invoice date: 01/01/2021” | Here, the caption is Invoice date, whereas the value is 01/01/2021.

As is evident in the above, captions and values always work in pairs. Usually, captions don’t change from document to document when the documents have been sent by the same vendor, unless the vendor changes invoice layout. Values, on the other hand, will change from document to document. For example, the invoice number will typically increase incrementally for each invoice, and the same usually goes for the invoice date.
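To make the caption/value pairing concrete, the following minimal sketch splits the two sample texts above into a caption and a value. It assumes the simple "caption: value" form used in these examples; the split_caption_value function is hypothetical and says nothing about how Document Capture itself separates the two.

```python
from typing import Tuple


def split_caption_value(text: str) -> Tuple[str, str]:
    """Split 'caption: value' at the first colon (illustrative only)."""
    caption, separator, value = text.partition(":")
    if not separator:
        raise ValueError(f"No caption separator found in {text!r}")
    return caption.strip(), value.strip()


print(split_caption_value("Invoice number: 12345678"))
# ('Invoice number', '12345678')
print(split_caption_value("Invoice date: 01/01/2021"))
# ('Invoice date', '01/01/2021')
```

The captions returned here stay the same across documents from the same vendor, while the values differ from one document to the next.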

Captions are defined for each template field, whereas values are captured and stored in the document itself. For captions and values to be captured correctly, the distance between a caption and its value must remain the same in all documents from the same vendor. In the examples above, the caption immediately precedes its corresponding value on the same line, and if the two elements keep the same position and distance between them in all subsequent documents, they will all be processed correctly.

A caption and its value can also be placed on separate lines, for example when an Invoice number caption is on one line and its value follows immediately below it on the next line (a common invoice format). This works equally well, as long as the distance between the caption and the value is the same for all documents from the same vendor. Even captions that are placed far away from their related values are acceptable, provided that the distance between the two remains the same in all documents.

When Document Capture identifies the values of fields in a document, it searches for captions that are defined for each template field and then attempts to capture their associated values. For example, if Document Capture searches for the text string “Invoice number” in an invoice and manages to find this, it can then locate the actual invoice number as well based on the template configuration.
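The idea of locating a value at a constant distance from its caption can be sketched as follows. This is a simplified model under stated assumptions, not a description of Document Capture's recognition engine: it assumes that OCR yields words with x/y coordinates, that the distance is stored as a fixed horizontal and vertical offset from the first word of the caption, and that a small tolerance is allowed. The Word and FieldRule classes, the function names, and all coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Word:
    """A recognized word and the position of its top-left corner."""
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical template field: a caption plus the value's offset."""
    caption: str
    dx: float  # horizontal distance from caption start to value
    dy: float  # vertical distance from caption start to value
    tolerance: float = 5.0


def find_caption(words: List[Word], caption: str) -> Optional[Word]:
    """Return the first word of the caption if the caption is found."""
    parts = caption.split()
    n = len(parts)
    for i in range(len(words) - n + 1):
        candidate = [w.text.rstrip(":") for w in words[i:i + n]]
        if candidate == parts:
            return words[i]
    return None


def capture_value(words: List[Word], rule: FieldRule) -> Optional[str]:
    """Find the caption, then look for a word at the expected offset."""
    anchor = find_caption(words, rule.caption)
    if anchor is None:
        return None
    target_x, target_y = anchor.x + rule.dx, anchor.y + rule.dy
    for w in words:
        if (abs(w.x - target_x) <= rule.tolerance
                and abs(w.y - target_y) <= rule.tolerance):
            return w.text
    return None


rule = FieldRule(caption="Invoice number", dx=150, dy=0)
page = [Word("Invoice", 50, 100), Word("number:", 110, 100),
        Word("12345678", 200, 100)]
print(capture_value(page, rule))  # 12345678
```

In this model, a value placed on the line below its caption is simply a rule with a non-zero dy. If the vendor changes the invoice layout so that the offset no longer matches, capture_value returns None, which corresponds to the situation where the field has to be corrected manually.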

Document Capture sometimes fails to capture the value of a field in a document, or it may identify an incorrect element as the field’s associated value. If this happens, you can easily train Document Capture by changing the caption and/or showing it where exactly to locate the correct value in the document. To do this, follow these steps:

  1. Choose the Search icon, enter Document Categories, and then choose the related link.
  2. Select the code of the relevant document category – in this case PURCHASE – to open the document journal.
  3. In the document fields section (under Document Header), select the template field that you want to correct – for example, Posting Description.
  4. On a computer, you can use your mouse to select exactly what text in the document should be captured for the chosen template field: To set the caption, right-click and hold the button to draw an orange box around the relevant text in the document image.
  5. Similarly, to set the corresponding value for the chosen template field, left-click and hold the button to draw a blue box around the relevant text in the document image.
  6. The selected text is added to the template field in the document fields section. Verify that it has been captured correctly.
  7. To confirm that the text is consistently captured correctly, go to the action bar and select Process and then Recognize Fields. If the text somehow isn't captured correctly this second time, carry out steps 4-7 again until you get the desired result.

Adding and removing template fields

You can easily capture more information than what’s captured by the default templates. To do so, either add one of the additional fields that are included in the standard configuration, or create your own custom field.

To add an additional field from the standard configuration, follow these steps:

  1. Choose the Search icon, enter Document Categories, and then choose the related link.
  2. Select the code of the relevant document category – in this case PURCHASE – to open the document journal.
  3. In the action bar, select Template and then Add Template Field to open the Template Field List.
  4. In the list, under Field Name, select the field that you want to add to the template.

The Template Field List closes, and the selected field is added to the list of template fields in the document fields section (under Document Header).

Note

By following the instructions above, you’re only adding the selected field to the template that’s assigned to the current document. If there are fields that you’d like to be captured in all or most of your documents, you should add these fields to the master template. If you do this, newly created templates will automatically have these fields available. To learn more, see Working with Templates.

If you need other fields than the ones included in the standard configuration, you can create a custom field. To learn more, see Setting up New Template Fields.

Just as you can add new fields, you can easily remove fields from a template. To remove a field from a template, follow these steps:

  1. Choose the Search icon, enter Document Categories, and then choose the related link.
  2. Select the code of the relevant document category – in this case PURCHASE – to open the document journal.
  3. In the action bar, select Template and then Remove Template Field to open the Template Field List.
  4. In the list, under Field Name, select the field that you want to remove from the template.
  5. A dialog box appears, asking you if you want to remove the chosen field from the template. Select Yes.

The dialog box is closed, and the selected field is removed from the list of template fields in the document fields section (under Document Header).

Showing registered and rejected documents

When you open the document journal, it will be filtered to show only open documents. However, you can also easily filter it to display documents that have been registered or rejected. To do this, follow these steps:

  1. Choose the Search icon, enter Document Categories, and then choose the related link.
  2. Select the code of the relevant document category – in this case PURCHASE – to open the document journal.
  3. In the upper-right corner, above the action bar, go to Status Filter and select the box to display the filter options. The default option is Open, meaning that only documents with the status "Open" are displayed in the document list. Change this as needed by selecting your preferred filter option.

The document list will be updated to display only documents with the status you selected. If you selected All, all documents are displayed in the list, regardless of status.
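The effect of the status filter can be pictured with a small sketch. It is illustrative only: the Document class and the apply_status_filter function are hypothetical, and the status names Registered and Rejected are assumed from the description above rather than taken from the actual filter options.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Document:
    """Hypothetical journal line with a document number and a status."""
    no: str
    status: str  # assumed values: "Open", "Registered", "Rejected"


def apply_status_filter(
    docs: List[Document], status_filter: str = "Open"
) -> List[Document]:
    """Return the documents shown for the chosen filter option."""
    if status_filter == "All":
        return list(docs)  # no filtering, regardless of status
    return [d for d in docs if d.status == status_filter]


journal = [
    Document("DOC-001", "Open"),
    Document("DOC-002", "Registered"),
    Document("DOC-003", "Rejected"),
]
print([d.no for d in apply_status_filter(journal)])         # ['DOC-001']
print([d.no for d in apply_status_filter(journal, "All")])  # all three
```

The default argument mirrors the journal's behavior when it is first opened: only open documents are listed until you choose another option.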

See also

Configuring Comment Types and Importance
OCR-Processing a Document
Finding the Document Source and Template
Registering Documents
Working with Templates
Setting up New Template Fields