Documents#

A document is a file uploaded to Konfuzio Server. See supported file types.

Upload new documents#

Drag and drop your files, or browse your local files, and wait until the link to the document appears.

Check the supported file types and languages.

You can also upload new documents via our REST API or by email.

Delete document#

Select the document(s) you want to delete. Choose “Delete documents”, press GO, and confirm your action on the next page.

Document Details#

Click on a document and then click on details.

img_2.png

You have several options to inspect and change the document:

img_4.png

File name#

Name of the archivable PDF file.

Dataset status#

See Dataset status

Assignee#

User assigned to work on this document.

Category#

See Categories

PDF file#

Link to the archivable PDF file.

Original file#

Link to the file originally uploaded. See Supported File Types

Original file producer#

Software used to create the original file.

Number of pages#

Number of pages of the original file.

Uploaded by#

User who uploaded the document.

Created at#

The time the Document was created.

Callback URL#

Webhook that is called when the REST API is used. Your service needs to accept incoming requests from the IP ranges of our provider. Please refer to the “EIP-pool” IP ranges here.

Callback status code#

Status code returned by the receiving service after the webhook was triggered.
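To illustrate how a callback URL and its status code relate, here is a minimal sketch of a receiving service using only the Python standard library. It assumes the callback arrives as an HTTP POST with a JSON body; the payload format, the handler name `CallbackHandler`, the helper `handle_callback`, and the port are hypothetical and not taken from the Konfuzio documentation. The status code this service answers with is what would show up as the callback status code.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_callback(body: bytes):
    """Return (status_code, payload) for a raw request body.

    Assumption: the callback body is JSON. Anything else is answered with 400.
    """
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, None
    return 200, payload


class CallbackHandler(BaseHTTPRequestHandler):
    """Hypothetical receiver: answers each POST with the status from handle_callback."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_callback(self.rfile.read(length))
        if payload is not None:
            # Replace with your own processing, e.g. storing the extraction result.
            print("callback received:", payload)
        self.send_response(status)
        self.end_headers()


def serve(host: str = "0.0.0.0", port: int = 8080) -> None:
    """Run the receiver until interrupted. Host and port are placeholders."""
    HTTPServer((host, port), CallbackHandler).serve_forever()
```

Calling `serve()` starts the receiver. The firewall in front of it would still need to allow the IP ranges mentioned under Callback URL.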

Fonts#

If the document is a PDF file, this shows whether all fonts are embedded in the PDF. If not all fonts are embedded, your Document may not display correctly.

img_7.png

Dataset Status#

img_3.png

Status: Set a status#

The default status of a document.

Status: Preparation#

Add any document you plan to include in the test or training set before you start labeling it. This step is crucial for checking whether the document is already in the training or test data.

Status: Training documents#

The training documents provide the data to train extraction AIs of a category.

Status: Test documents#

The test documents provide the data to evaluate extraction AIs of a category.

Status: Excluded#

If you find out that a preparation document should not be used for testing or training due to its quality, exclude the document.

Document views#

SmartView#

The SmartView is the default document view.

SmartView

Have a look at Annotations to find out more.

TextView#

The TextView allows you to edit Annotations via raw text.

img_6.png

Dashboard#

Preview the REST API result of the active extraction AI. The extraction AI version is shown in the first row of the table on the right.

img_5.png

Advanced Filters#

In addition to our regular document filters:

  • By Category

  • By Dataset status

  • By Human feedback required

  • By 100% machine-readable

we also provide advanced filtering at the Document level.

By Extraction AI:#

img_10.png

If Documents in your Project have been evaluated by an Extraction AI, this filter can be selected. The dropdown of the filter lists the AIs and versions that have evaluated the Documents. The number in brackets is the number of documents available for that Extraction AI.

Once the initial AI filter has been selected, a secondary filter called “By 10% F1-Score” becomes visible.

This filter offers two options. “Top documents” shows the best-performing documents for the selected AI. “Documents need improvements” shows the worst-performing documents by F1-score for the selected AI. The filter is only available for current evaluations: as soon as you alter annotations on a Document, that Document is no longer available in the AI filter or in the “By 10% F1-Score” filter. At that point, a new AI evaluation for the category has to be run.
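The two options can be pictured as taking both ends of a ranking by F1-score. The following is a minimal sketch of that idea, not the server's implementation. The function name `top_and_worst`, the mapping of file names to F1-scores, and the parameter `k` (how many documents each option returns) are assumptions made for illustration.

```python
def top_and_worst(f1_by_document, k):
    """Return (top, worst) lists of document names, each of length at most k.

    f1_by_document: hypothetical mapping of document name -> F1-score.
    """
    k = max(0, min(k, len(f1_by_document)))
    # Rank documents from best to worst F1-score.
    ranked = sorted(f1_by_document, key=f1_by_document.get, reverse=True)
    top = ranked[:k]
    worst = ranked[len(ranked) - k:]
    worst.reverse()  # list the weakest document first
    return top, worst


# Example with made-up scores.
scores = {"a.pdf": 0.95, "b.pdf": 0.40, "c.pdf": 0.72, "d.pdf": 0.10}
top, worst = top_and_worst(scores, 2)
```

With the made-up scores above, `top` holds the two best documents and `worst` the two weakest, which mirrors what “Top documents” and “Documents need improvements” show for a selected AI.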

Rounding:#

Please note that the results in this filter are rounded up. If there are fewer than 10 documents in the AI’s evaluation, we return the top-1 and worst-1 document. Between 10 and 100 documents, we round up to the next multiple of ten, meaning that if there are 16/26/36/… documents in the AI’s evaluation, we return 2/3/4/… results for the F1-score filter. “True” 10% results are only returned for AIs which have more than 100 documents available in their evaluation.
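The rounding rule above can be written down as a small function. This is a sketch under stated assumptions rather than the server's code: the name `f1_filter_result_count` is hypothetical, and for more than 100 documents the text only says “true” 10%, so rounding that case up as well is an assumption.

```python
def f1_filter_result_count(evaluated_documents: int) -> int:
    """Number of documents returned per option of the F1-score filter."""
    n = evaluated_documents
    if n <= 0:
        return 0
    if n < 10:
        # Fewer than 10 documents: only the top-1 and worst-1 document.
        return 1
    if n <= 100:
        # Round up to the next multiple of ten, then take 10 percent.
        return (n + 9) // 10
    # More than 100 documents: "true" 10 percent.
    # Assumption: a fractional result is rounded up here too.
    return (n + 9) // 10


# The examples from the text: 16/26/36 documents give 2/3/4 results.
counts = [f1_filter_result_count(n) for n in (16, 26, 36)]
```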