Multi Context Extraction

In this tutorial, we extract table-like structures. To do this, we have to teach the AI to group labels into sections. We use line items in invoices as an example.

Create project

We use the same project as in the first tutorial. If you want to create a new project, see the first tutorial for instructions.

Creating new Labels

graph LR
  subgraph Users in Project
    User --- Project
  end
  subgraph Category
    Project --- Invoice
  end
  subgraph Label Set
    Invoice --- Item[Line Items*]
  end
  subgraph Label
    Item --- Quantity
    Item --- Description
    Item --- Price[Unit price]
    Item --- Subtotal
    Item --- VAT[VAT code]
  end

* Check the box “Has multiple Sections” when creating the Label Set.
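To make the hierarchy in the diagram concrete, here is a minimal sketch that models it as plain Python data classes. The class and field names (`Project`, `Category`, `LabelSet`, `Label`, `has_multiple_sections`) are hypothetical and chosen for illustration only; they are not the platform's actual data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical model of Project -> Category -> Label Set -> Labels.
# Illustrative only; not part of any official API.

@dataclass
class Label:
    name: str

@dataclass
class LabelSet:
    name: str
    has_multiple_sections: bool = False  # the "*" option in the diagram
    labels: list[Label] = field(default_factory=list)

@dataclass
class Category:
    name: str
    label_sets: list[LabelSet] = field(default_factory=list)

@dataclass
class Project:
    name: str
    categories: list[Category] = field(default_factory=list)

# The example from this tutorial: one multi-section Label Set for line items.
line_items = LabelSet(
    name="Individual services",
    has_multiple_sections=True,
    labels=[Label(n) for n in ["Quantity", "Description", "Unit price", "Subtotal", "VAT code"]],
)
project = Project(name="Receipts", categories=[Category("Invoice", [line_items])])
```

The flag lives on the Label Set rather than on individual labels, because it is the group of labels that can repeat within a document.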

Click HOME > Labels > +Add to create each label. In our example, these are: “Quantity”, “Description”, “Unit price”, “Subtotal” and “VAT code”.

Creating a Label Set

A Label Set is a group of logically related labels. It is the abstract template for the sections, which are its concrete instances in a document. Click HOME > Label Sets > +Add to create a new Label Set. Name your Label Set (here: “Individual services”) and select the associated project (here: “Receipts”). Check the box “Has multiple Sections”. Then click “Save and continue editing” to get to the next step. Here, you can move the labels you just created into the Label Set using the arrow buttons. Click “Save” to save the Label Set.
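The order of these steps matters: labels must exist before they can be added to a Label Set. The following sketch captures that rule with a hypothetical `create_label_set` function that returns a plain dictionary; the function name and dictionary keys are assumptions for illustration, not a real API.

```python
def create_label_set(name, project, label_names, existing_labels, has_multiple_sections=True):
    """Hypothetical sketch: build a Label Set from labels that already exist.

    Mirrors the UI flow: labels are created first (HOME > Labels) and then
    attached to the Label Set (HOME > Label Sets).
    """
    missing = [n for n in label_names if n not in existing_labels]
    if missing:
        raise ValueError(f"Create these labels first: {missing}")
    return {
        "name": name,
        "project": project,
        "has_multiple_sections": has_multiple_sections,
        "labels": list(label_names),
    }

existing = {"Quantity", "Description", "Unit price", "Subtotal", "VAT code"}
label_set = create_label_set("Individual services", "Receipts", sorted(existing), existing)
```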

Create training data

Sections are groups of related information in a document. They are the concrete instances of a Label Set. In our example, the first section contains all the information about the first product, i.e. the top line or first individual service on the receipt.
To label the first section, we create an annotation that belongs to it. After clicking on the relevant entity, we define the properties of the annotation in the annotation bar on the right using two tabs: in the upper tab, we select the Label Set that corresponds to the section, and in the lower tab, we select the label to assign to the entity.
We select “Individual services (New)” at the top and “Quantity” at the bottom. We then label the rest of the section; from now on, the first section appears in the upper tab as “Individual services”. We repeat this for the following sections, which are listed in the tab and numbered from top to bottom. To create an additional section, select “Individual services (New)” again.
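The difference between choosing “(New)” and choosing an existing section can be sketched as follows. This is a simplified, hypothetical model: `Section` and `annotate` are illustrative names, and each section stores at most one value per label, which is a simplification of real annotations.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class Section:
    label_set: str
    number: int  # sections are numbered from top to bottom
    annotations: dict[str, str] = field(default_factory=dict)  # label -> text

def annotate(sections, label_set, label, text, section_number=None):
    """Hypothetical sketch of assigning an annotation to a section.

    section_number=None mimics selecting "<Label Set> (New)": a new section
    is created. Otherwise the annotation joins the existing section.
    """
    if section_number is None:
        section = Section(label_set, len(sections) + 1)
        sections.append(section)
    else:
        section = sections[section_number - 1]
    section.annotations[label] = text
    return section

sections = []
annotate(sections, "Individual services", "Quantity", "2")                      # "(New)" -> section 1
annotate(sections, "Individual services", "Description", "Widget", section_number=1)
annotate(sections, "Individual services", "Quantity", "5")                      # "(New)" -> section 2
```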

We repeat this process for all training documents. Create your training data following our example. Because use cases vary, your documents may differ; for example, a section does not always correspond to a row.

Reviewing the training data

You can check that the labels are correct because they are displayed above the annotations. However, for the AI to learn successfully, it is equally important that the labels are assigned to the correct sections. To check this:
In the upper right corner of the annotation bar, under Filter, select the first section in the “Sections” tab (here: “Individual services”). Now only the labels of the first section should be visible. Usually you can tell at a glance whether they are correct (here: whether all labels are in one row). If you see an error, use “Edit” in the annotation bar to fix it. (Tip: You should also use this method when checking the results of the AI.)
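When each section is expected to be one row, the visual check can also be expressed as a simple rule: all annotations of a section should share the same line. The sketch below assumes annotations are available as hypothetical `(section_number, label, line_number)` tuples; how you obtain such data is not covered here. Remember that sections do not always correspond to rows, so treat a flagged section as a hint to inspect, not as a definite error.

```python
from collections import defaultdict

def sections_spanning_multiple_rows(annotations):
    """Return the numbers of sections whose annotations lie on more than one line.

    annotations: iterable of (section_number, label, line_number) tuples.
    """
    lines_per_section = defaultdict(set)
    for section_number, _label, line_number in annotations:
        lines_per_section[section_number].add(line_number)
    return sorted(s for s, lines in lines_per_section.items() if len(lines) > 1)

annotations = [
    (1, "Quantity", 12), (1, "Description", 12), (1, "Subtotal", 12),
    (2, "Quantity", 13), (2, "Description", 14), (2, "Subtotal", 13),
]
print(sections_spanning_multiple_rows(annotations))  # [2]
```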

Evaluate results and give feedback

The first tutorial shows how to split your documents into a training and a test dataset and how to train the AI. It also explains how to give feedback to the AI.
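As a reminder of the idea behind the split, here is a minimal sketch of a reproducible random split at the document level. The function name, the 20 % test fraction, and the fixed seed are assumptions for illustration, not platform defaults. Splitting by document rather than by section keeps all line items of one invoice on the same side, so the test set does not contain rows from documents the AI was trained on.

```python
import random

def split_documents(document_ids, test_fraction=0.2, seed=42):
    """Hypothetical sketch: shuffle document IDs reproducibly and split them.

    Returns (train_ids, test_ids). Splits whole documents, never sections.
    """
    ids = list(document_ids)
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction)) if ids else 0
    return ids[n_test:], ids[:n_test]

train_ids, test_ids = split_documents(range(10))
```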

Export results

See Integrations & API
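Once the extracted sections are available, turning them into a table is straightforward, because each section becomes one row. The sketch below assumes you have already retrieved the results as a list of dictionaries mapping label names to extracted text (one dictionary per section); that input format is a hypothetical assumption, not the actual export format. It uses only Python's standard `csv` module.

```python
import csv
import io

COLUMNS = ["Quantity", "Description", "Unit price", "Subtotal", "VAT code"]

def sections_to_csv(sections):
    """Write one CSV row per section; labels missing from a section stay empty.

    sections: list of dicts mapping label name -> extracted text (assumed format).
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    for section in sections:
        # Ignore labels that are not table columns.
        writer.writerow({k: v for k, v in section.items() if k in COLUMNS})
    return buffer.getvalue()

print(sections_to_csv([{"Quantity": "2", "Description": "Widget", "Unit price": "4.50"}]))
```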