Annotations

An annotation refers to a character, word or paragraph extracted from a document.

Human annotations

Annotations assign text and visual information in a document to a business context. When you create an annotation in the SmartView, you assign the business context by using the label set and label.

You can create an annotation by clicking and dragging the cursor over a rectangular area you want to annotate. When you save the annotation, Konfuzio will recognize the text within the selected box. When you click edit again, you will see the red box which was used to select text, which you can move and resize. If you select an area without including any text, the red box represents the so-called bounding box, which is used for the AI training.
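The box selection described above can be pictured with a small sketch. This is not Konfuzio's implementation: the `Box` and `Word` classes, the coordinates, and the rule that a word counts as selected only when it lies fully inside the box are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned rectangle in page coordinates (hypothetical helper)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, other: "Box") -> bool:
        # True only if `other` lies completely inside this box.
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and self.x1 >= other.x1 and self.y1 >= other.y1)


@dataclass
class Word:
    """A recognized word and its position on the page (hypothetical helper)."""
    text: str
    box: Box


def text_in_selection(words, selection):
    """Join the text of all words that fall fully inside the selection box."""
    return " ".join(w.text for w in words if selection.contains(w.box))


# Made-up example page with three words on one line.
words = [
    Word("Invoice", Box(10, 10, 60, 20)),
    Word("date:", Box(65, 10, 95, 20)),
    Word("2021-03-01", Box(100, 10, 170, 20)),
]

# The rectangle a user might drag around the date.
selection = Box(98, 8, 175, 22)
selected_text = text_in_selection(words, selection)
```

A selection that covers no word yields an empty string, which mirrors the case in which only the bounding box itself is kept.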

If you prefer fine-grained control over the selection, you can also create an annotation by clicking the words you want to select one by one. This makes it possible to create annotations composed of text fragments that are not necessarily next to each other; however, editing this kind of annotation will not show the red bounding box.

Annotate PDF or Image Document

After the annotation is created you will see it on the annotations page:

Review Annotations

When you click on the annotation, you will be redirected to the document and the annotation you just created. Furthermore, you can click on the link to the label. In the following example, the label is configured so that every annotation of this label is normalized to a date value. After you save the label, you can preview the normalized result on the annotations page.

View Annotation and edit label
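To illustrate what normalizing an annotation to a date value can mean, here is a minimal sketch using only the Python standard library. The function name `normalize_date` and the list of accepted input formats are assumptions chosen for this example; they do not describe the formats Konfuzio actually supports.

```python
from datetime import datetime
from typing import Optional

# Assumed input formats for illustration only.
DATE_FORMATS = ("%d.%m.%Y", "%Y-%m-%d", "%d/%m/%Y")


def normalize_date(raw_text: str) -> Optional[str]:
    """Return the date in ISO format (YYYY-MM-DD), or None if it cannot be parsed."""
    cleaned = raw_text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            # Text does not match this format, try the next one.
            continue
    return None
```

In this sketch, a text that matches none of the formats returns `None`, which corresponds to an annotation that cannot be normalized.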

Automated Annotations

Humans create annotations as described above. However, as soon as an extraction AI is available, annotations can also be created automatically. There are two automated ways and one manual way to trigger this:

  1. Upload the document.

  2. Train an extraction AI: After an extraction AI has been trained and evaluated, it will create annotations in all documents that are assigned to the test and training dataset. This is especially helpful if you missed annotating information in one document but annotated it in others.

  3. Rerun extraction on the document page. This is handy if the document was uploaded before the extraction AI was available.

img.png

As soon as an annotation is created automatically, it has the status Feedback required. You can provide feedback by using the green tick box or the red cross, see 1. Within one document, you can use the filter to see all annotations which require feedback by humans, see 2.

img_1.png

You can also filter for all annotations in one project which require feedback on the annotation page across all documents.

img_2.png

To summarize, every annotation has one of four statuses:

Feedback required

When an extraction AI created this annotation. In the API and the SDK this state is represented by revised=False and is_correct=False.

Accepted

When a human accepts a feedback required annotation. In the API and the SDK this state is represented by revised=True and is_correct=True.

Declined

When a human declines a feedback required annotation. In the API and the SDK this state is represented by revised=True and is_correct=False.

Created by human

When a human creates a new annotation. In the API and the SDK this state is represented by revised=False and is_correct=True.
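The four combinations of revised and is_correct listed above can be summarized in a small helper. This is a sketch in plain Python; the function `annotation_status` is hypothetical and not part of the Konfuzio API or SDK. Only the two flag names and the status names come from the text.

```python
def annotation_status(revised: bool, is_correct: bool) -> str:
    """Map the two flags to the status name used in this document."""
    if revised:
        # A human has reviewed an annotation created by the extraction AI.
        return "Accepted" if is_correct else "Declined"
    # Not reviewed: either created by a human or still awaiting feedback.
    return "Created by human" if is_correct else "Feedback required"
```

For example, `annotation_status(revised=False, is_correct=False)` gives "Feedback required", the state of an annotation freshly created by an extraction AI.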

Annotation Filters

Within a document, you can use filters to select annotations you want to focus on.

  • Annotations which cannot be normalized, i.e. not machine-readable.

  • Annotations which require feedback by humans, i.e. feedback required.

  • Deduplicated annotations, i.e. top annotations.

  • Annotations which were created by the extraction AI but declined by a human during feedback, i.e. negative.
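Three of these filters can be approximated on annotation data you have already loaded, as the sketch below shows. It works on plain dictionaries with made-up values. The `revised` and `is_correct` keys follow the status definitions in this document, while the `normalized` key and the function names are assumptions for illustration. The top annotations filter is left out because its deduplication rule is not specified here.

```python
# Made-up annotations; "normalized" is a hypothetical field that is None
# when normalization failed.
annotations = [
    {"id": 1, "revised": False, "is_correct": False, "normalized": "2021-03-01"},
    {"id": 2, "revised": True, "is_correct": False, "normalized": None},
    {"id": 3, "revised": True, "is_correct": True, "normalized": "2021-04-15"},
    {"id": 4, "revised": False, "is_correct": True, "normalized": None},
]


def feedback_required(items):
    """Annotations created by the extraction AI that no human has reviewed."""
    return [a for a in items if not a["revised"] and not a["is_correct"]]


def negative(items):
    """Annotations created by the extraction AI and declined by a human."""
    return [a for a in items if a["revised"] and not a["is_correct"]]


def not_machine_readable(items):
    """Annotations for which no normalized value is available."""
    return [a for a in items if a["normalized"] is None]


feedback_ids = [a["id"] for a in feedback_required(annotations)]
negative_ids = [a["id"] for a in negative(annotations)]
unreadable_ids = [a["id"] for a in not_machine_readable(annotations)]
```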