Example to train a Document Extraction AI

1. Define your data model

First, draft a data model. It should look like the following:

img_10.png

You can use Markmap, a tool that renders Markdown outlines as mind maps, to create such charts. See the Appendix for the Data Model Financial Accounting.

The data model above is dense and consistent. It makes it possible to extract 26 fields from three document types (Invoices, Delivery Notes, and Payment Advices) with just 12 Labels. To make this possible, Konfuzio allows reusing Labels across Label Sets and Label Sets across Categories. In addition, defining only two Label Sets (Item and Invoice as a line item) is enough to extract multiple line items from Invoices or Delivery Notes and multiple invoices from Payment Advices.
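The counts above can be checked with a small Python sketch. It models the hierarchy from the Appendix as plain dictionaries; the names `LABEL_SETS`, `CATEGORIES`, `count_fields`, and `count_labels` are illustrative assumptions and not part of the Konfuzio API. Labels are referenced by their number in the Appendix (1 to 12).

```python
# Hypothetical in-memory model of the Appendix data model (not the Konfuzio API).
# Label Sets reference Labels by their number in the Appendix (1-12).
LABEL_SETS = {
    "Debtor": [1, 2, 3, 4, 5],
    "Vendor": [1, 2, 3, 4, 5],
    "Item": [6, 7, 8, 9],
    "Invoice as a line item": [10, 11, 12],
}

# Categories reference Label Sets by name, so one Label Set can serve several Categories.
CATEGORIES = {
    "Invoice": ["Debtor", "Item"],
    "Delivery Note": ["Vendor", "Item"],
    "Payment Advice": ["Debtor", "Invoice as a line item"],
}


def count_fields(categories, label_sets):
    """Count extractable fields: one per (Category, Label Set, Label) combination."""
    return sum(
        len(label_sets[set_name])
        for set_names in categories.values()
        for set_name in set_names
    )


def count_labels(label_sets):
    """Count the distinct Labels used across all Label Sets."""
    return len({label for labels in label_sets.values() for label in labels})


print(count_fields(CATEGORIES, LABEL_SETS))  # 26
print(count_labels(LABEL_SETS))              # 12
```

Because Debtor and Vendor share Labels 1 to 5, and Item and Debtor each appear in two Categories, 12 Labels are enough to cover all 26 fields.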

This data model will help to set up the project.

2. Project Set-up

  1. Register

  2. Create a Project

  3. Add the Document Categories as defined in the data model.

  4. Add the Label Sets as defined in the data model.

  5. Add the Labels as defined in the data model.

3. Add documents

  1. Upload new Documents

  2. Set the status of those Documents to Preparation.

  3. If your Project contains more than one Category, go to the list of Documents and assign the correct Category to each Document.

4. Annotate Documents

  1. Create Human annotations

  2. Set every 10th Document to Status: Test documents and all others to Status: Training documents.
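The split in the last step can be planned with a short Python sketch. The statuses themselves are set in Konfuzio; this sketch only decides which Documents go where. The function name `split_documents` and the file names are hypothetical and used for illustration only.

```python
def split_documents(documents, test_every=10):
    """Return (training, test); every `test_every`-th document goes to the test list."""
    training, test = [], []
    # Count positions from 1 so that the 10th, 20th, ... document is selected.
    for position, document in enumerate(documents, start=1):
        if position % test_every == 0:
            test.append(document)
        else:
            training.append(document)
    return training, test


# Hypothetical file names, for illustration only.
documents = [f"doc_{i:03d}.pdf" for i in range(1, 31)]
training, test = split_documents(documents)

print(len(training), len(test))  # 27 3
print(test)  # ['doc_010.pdf', 'doc_020.pdf', 'doc_030.pdf']
```

With this rule roughly 10% of the Documents are held back for testing, so the evaluation is measured on Documents the AI has not seen during training.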

5. Train Document AI

  1. Train one Extraction AI per Category and one Categorization AI for the Project.

  2. Wait for the emails to arrive.

  3. Find the evaluation measured on the Test Documents by clicking on the Extraction AI.

  4. Use the Integrations & API

  5. Continuously retrain the Extraction AI or the Categorization AI.

Appendix

Data Model Financial Accounting

Copy the following text into Markmap to replicate the chart.

img_9.png

# Project: Financial Accounting

## Category 1: Invoice

### Label Set 1: Debtor

- Label 1: Company Name
- Label 2: Street
- Label 3: House Number
- Label 4: ZIP Code
- Label 5: Place

### Label Set 2: Item

- Label 6: Description
- Label 7: Unit Price
- Label 8: Total Price
- Label 9: Quantity

## Category 2: Delivery Note

### Label Set 3: Vendor

- Label 1: Company Name
- Label 2: Street
- Label 3: House Number
- Label 4: ZIP Code
- Label 5: Place

### Label Set 2: Item

- Label 6: Description
- Label 7: Unit Price
- Label 8: Total Price
- Label 9: Quantity

## Category 3: Payment Advice

### Label Set 1: Debtor

- Label 1: Company Name
- Label 2: Street
- Label 3: House Number
- Label 4: ZIP Code
- Label 5: Place

### Label Set 4: Invoice as a line item

- Label 10: Number
- Label 11: Date
- Label 12: Total Amount