Documents

A document is a file uploaded to Konfuzio Server. See the supported file types.

Upload new documents

Drag and drop or browse your local files and wait until the link to the document shows up.

Check the supported file types and languages.

You can also upload new documents via our REST API or by email.

More detailed information about uploading new documents

Delete document

Select the document(s) you want to delete, choose “Delete documents”, and press GO. Confirm your action on the next page.

Bulk Edit

The bulk edit option allows you to edit multiple Documents at once. Select the Documents, then choose “bulk edit”. Select the fields you want to edit and choose the desired value.

Document Details

Click on a document and then click on details.

The details page offers several options to inspect and change the document:

File name

Name of the archivable PDF file.

Dataset Status

See Dataset Status

Assignee

User assigned to work on this document.

Category

See Categories

PDF file

Link to the archivable PDF file.

Original file

Link to the file originally uploaded. See Supported File Types

Original file producer

Software used to create the original file.

Number of pages

Number of pages of the original file.

Uploaded by

User who uploaded the document.

Created at

The time the Document was created.

Callback URL

Webhook URL used when the document is uploaded via the REST API. Your receiving service must accept incoming requests from the IP ranges of our provider. Please refer to the “EIP-pool” IP ranges here.

Callback status code

HTTP status code returned by the receiving service after the webhook was triggered.
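To illustrate the receiving side of the callback, here is a minimal Python sketch of a webhook endpoint that accepts requests only from allowed IP ranges and answers with an HTTP status code, which is the kind of value the Callback status code field reports. The IP ranges are RFC 5737 documentation placeholders, not the provider's real “EIP-pool” ranges; the assumption that the callback body is JSON is ours, and the function and class names are hypothetical.

```python
import ipaddress
import json
from http.server import BaseHTTPRequestHandler

# Placeholder ranges from RFC 5737 (documentation only). Replace them with the
# provider's published "EIP-pool" ranges before using this anywhere.
ALLOWED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip lies inside one of ALLOWED_RANGES."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_RANGES)


def handle_callback(client_ip: str, body: bytes) -> int:
    """Return the HTTP status code to answer a callback request with."""
    if not is_allowed(client_ip):
        return 403  # sender is outside the allowed IP ranges
    try:
        json.loads(body)  # assumption: the callback body is JSON
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return 400
    # A real service would process the payload here.
    return 200


class CallbackHandler(BaseHTTPRequestHandler):
    """Minimal POST handler; serve it with http.server.HTTPServer."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        self.send_response(handle_callback(self.client_address[0], body))
        self.end_headers()
```

Keeping the decision in `handle_callback` makes it testable without a running server. Behind a reverse proxy, `client_address` is the proxy's IP, so the check would have to use the forwarded client address instead.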

Unfilled Labels

Labels (in the context of label sets) that don’t have an accepted annotation.
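A minimal sketch of how such a list can be derived, assuming a simplified, hypothetical data model in which each Annotation records its Label, its Label Set and whether it has been accepted. A Label counts as filled only if at least one accepted Annotation exists for it within the same Label Set.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Annotation:
    label: str
    label_set: str
    accepted: bool


def unfilled_labels(
    label_sets: Dict[str, List[str]], annotations: List[Annotation]
) -> List[Tuple[str, str]]:
    """Return (label set, label) pairs that have no accepted Annotation."""
    filled = {(a.label_set, a.label) for a in annotations if a.accepted}
    return [
        (label_set, label)
        for label_set, labels in label_sets.items()
        for label in labels
        if (label_set, label) not in filled
    ]
```

Pairing each Label with its Label Set mirrors the “in the context of label sets” rule: the same Label can be filled in one Label Set and unfilled in another.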

Is Reviewed

Indicates if the Document has been completely reviewed by a human.

Is Public

Public Documents have their own public URL and are accessible from outside Konfuzio (e.g. via Konfuzio Capture Vue). Public Documents are excluded from the Document List and cannot be accessed in the SmartView.

Fonts

If the document is a PDF file, this shows whether all fonts are embedded in the PDF. If not all fonts are embedded, your Document may not be displayed correctly.

Extraction Logs

The log of the Extraction AI Run. The log contains useful information for debugging purposes. If you have uploaded a custom model, any log messages created in the extract method will be displayed here.

For example, for the custom Paragraph extraction AI, the following log might be displayed:

(Screenshot: extraction log of the custom Paragraph extraction AI)
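For a custom model, such messages are written inside the extract method. The sketch below assumes they are captured through Python's standard `logging` module; the class name and the `extract(text)` signature are hypothetical simplifications, and the real interface for custom models may differ.

```python
import logging
from typing import List

# Module-level logger; assumption: Konfuzio collects records emitted this way.
logger = logging.getLogger(__name__)


class ParagraphExtractionAI:
    """Hypothetical custom model; only the logging pattern matters here."""

    def extract(self, text: str) -> List[str]:
        logger.info("Starting extraction on %d characters", len(text))
        # Treat blank-line-separated blocks as paragraphs.
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        if not paragraphs:
            logger.warning("No paragraphs found")
        logger.info("Found %d paragraphs", len(paragraphs))
        return paragraphs
```

Passing values as logger arguments (`%d`) instead of pre-formatting the string defers formatting until a record is actually emitted.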

Categorization Logs

The log of the Categorization AI Run.

Document Workflow

The workflow graph illustrates the workflow of the Document. The workflow shown is generated from the current settings of the Project, so it may differ from the steps that were actually executed for this Document. A workflow consists of multiple background processes. The graph helps you understand the processing time of a Document and fine-tune self-hosted Konfuzio Server installations.

(Diagram: Document workflow graph)
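Assuming that background processes without a dependency between them can run in parallel, the processing time of a Document is set by the longest chain of dependent steps (the critical path), not by the sum of all steps. The sketch below computes it with `graphlib` (Python 3.9+); the step names, dependencies and durations are hypothetical and do not describe Konfuzio Server's actual tasks.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Tuple

# Hypothetical workflow: step -> (duration in seconds, prerequisite steps).
WORKFLOW: Dict[str, Tuple[float, List[str]]] = {
    "ocr": (12.0, []),
    "categorize": (2.0, ["ocr"]),
    "extract": (5.0, ["categorize"]),
    "thumbnails": (3.0, ["ocr"]),
    "callback": (0.5, ["extract", "thumbnails"]),
}


def finish_times(workflow: Dict[str, Tuple[float, List[str]]]) -> Dict[str, float]:
    """Earliest finish time of each step if independent steps run in parallel."""
    graph = {step: set(deps) for step, (_, deps) in workflow.items()}
    finish: Dict[str, float] = {}
    # static_order() yields every step after all of its prerequisites.
    for step in TopologicalSorter(graph).static_order():
        duration, deps = workflow[step]
        start = max((finish[d] for d in deps), default=0.0)
        finish[step] = start + duration
    return finish


def processing_time(workflow: Dict[str, Tuple[float, List[str]]]) -> float:
    """Length of the critical path, i.e. when the last step finishes."""
    return max(finish_times(workflow).values())
```

Here `processing_time(WORKFLOW)` is 19.5 seconds, driven by ocr, categorize, extract and callback; speeding up `thumbnails` would not shorten it. This is the kind of reasoning the graph supports when fine-tuning a self-hosted installation.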

Dataset Status

Status: Set a status (None)

The default status of a document.

Status: Preparation

Assign this status to any document you want to add to the training or test set before you start labeling it. This step is crucial to check whether the document is already part of the training or test data.

Status: Training documents

The training documents provide the data to train extraction AIs of a category.

Status: Test documents

The test documents provide the data to evaluate extraction AIs of a category.

Status: Excluded

If you find out that a preparation document should not be used for testing or training due to its quality, exclude the document.
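The dataset statuses can be modeled as an enumeration. This sketch, using hypothetical names, groups documents into the two sets described above: Training documents are used to train and Test documents to evaluate. We assume that None, Preparation and Excluded documents are left out of both.

```python
from enum import Enum
from typing import Dict, List


class DatasetStatus(Enum):
    NONE = "None"
    PREPARATION = "Preparation"
    TRAINING = "Training documents"
    TEST = "Test documents"
    EXCLUDED = "Excluded"


def split_dataset(documents: Dict[str, DatasetStatus]) -> Dict[str, List[str]]:
    """Group document names into training and test sets by dataset status."""
    split: Dict[str, List[str]] = {"training": [], "test": []}
    for name, status in documents.items():
        if status is DatasetStatus.TRAINING:
            split["training"].append(name)
        elif status is DatasetStatus.TEST:
            split["test"].append(name)
        # Assumption: None, Preparation and Excluded documents are skipped.
    return split
```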

Document views

SmartView

The SmartView is the default document view.

Have a look at Annotations to find out more.

TextView

The TextView allows you to edit Annotations via raw text.

Dashboard

Preview the REST API result of the active Extraction AI. The Extraction AI version is shown in the first line of the table on the right.

Advanced Filters

Our regular document filters are:

  • By Category

  • By Dataset status

  • Human feedback required

  • By 100% machine-readable

We also provide advanced filtering at the Document level.

By Extraction AI

If Documents in your Project have been evaluated by an Extraction AI, you can select this filter. Its dropdown lists the AIs and versions by which the Documents have been evaluated. The number in brackets is the number of Documents available for that Extraction AI.
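The bracketed numbers amount to a count of evaluated Documents per AI version. A minimal sketch, assuming the evaluations are available as hypothetical (document, AI version) pairs; duplicate pairs are removed so that each Document is counted once per version.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def dropdown_entries(evaluations: Iterable[Tuple[str, str]]) -> List[str]:
    """Turn (document, AI version) pairs into 'version (count)' labels."""
    # set() drops duplicate pairs so each document counts once per version.
    counts = Counter(version for _, version in set(evaluations))
    return [f"{version} ({count})" for version, count in sorted(counts.items())]
```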

Once an AI has been selected in this filter, a secondary filter called “By 10% F1-Score” becomes visible.

This filter offers two options. “Top documents” shows the best-performing documents for the selected AI, and “Documents need improvements” shows the worst-performing documents by F1-score for the selected AI. Filtering is only available for current evaluations: as soon as you alter Annotations on a Document, it no longer appears in the AI filter or in the “By 10% F1-Score” filter. At that point, a new AI evaluation for the Category has to be run.

Rounding

Please note that the number of results in this filter is rounded up. If the AI’s evaluation contains fewer than 10 documents, we return the top-1 and worst-1 document. Between 10 and 100 documents, we take 10% of the document count and round it up to the next whole number: if there are 16/26/36/… documents in the AI’s evaluation, we return 2/3/4/… results for the F1-score filter. “True” 10% results are only returned for AIs that have more than 100 documents in their evaluation.
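The rounding rule can be sketched as an integer ceiling of 10% of the evaluated documents. This reproduces the cases stated above (fewer than 10 documents gives 1 result; 16, 26 and 36 give 2, 3 and 4), but it is our assumption rather than Konfuzio's implementation, and the exact behavior above 100 documents is not specified here. The function names and the `{document: F1-score}` input are hypothetical.

```python
from typing import Dict, List, Tuple


def results_per_option(n_documents: int) -> int:
    """Results returned per option: 10% of n_documents, rounded up (assumption)."""
    if n_documents <= 0:
        return 0
    return (n_documents + 9) // 10  # integer ceiling of n / 10; 1..10 -> 1


def split_by_f1(f1_scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return (top documents, documents needing improvement) by F1-score."""
    k = results_per_option(len(f1_scores))
    ranked = sorted(f1_scores, key=lambda doc: f1_scores[doc], reverse=True)
    return ranked[:k], ranked[::-1][:k]
```

Integer arithmetic avoids floating-point surprises in the ceiling. With 12 evaluated documents, `k` is 2, so each option lists two documents.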