A document is a file uploaded to Konfuzio Server; see Supported File Types.
Upload new documents
Drag and drop your local files or browse to select them, then wait until the link to the document appears.
You can also upload new documents via our REST API or by email.
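For the REST API route, the following Python sketch only builds an upload request with the standard library and does not send it. The base URL, the `/api/v3/documents/` path, the `project` and `data_file` form fields and the `Token` authorization scheme are assumptions for illustration; check them against the Konfuzio REST API reference before use.

```python
import urllib.request
import uuid

# Assumptions: base URL, endpoint path and form-field names are illustrative
# and must be checked against the Konfuzio REST API reference.
BASE_URL = "https://app.konfuzio.com"
UPLOAD_PATH = "/api/v3/documents/"


def build_upload_request(token: str, project_id: int, file_name: str,
                         content: bytes) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data document upload request."""
    boundary = uuid.uuid4().hex
    parts = [
        # Plain form field with the target project ID (assumed name "project").
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="project"\r\n\r\n'
        f"{project_id}\r\n".encode(),
        # File field carrying the document bytes (assumed name "data_file").
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="data_file"; filename="{file_name}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode()
        + content + b"\r\n",
        # Closing boundary terminates the multipart body.
        f"--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        BASE_URL + UPLOAD_PATH,
        data=b"".join(parts),
        method="POST",
        headers={
            # Assumed token-based authentication header.
            "Authorization": f"Token {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To actually upload, you would pass the returned request to `urllib.request.urlopen()`, for example with the bytes of a local PDF read via `pathlib.Path("invoice.pdf").read_bytes()`.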
Select the document(s) you want to delete, choose “Delete documents” and press GO. Confirm your action on the next page.
Click on a document and then click on details.
The details page offers several options to inspect and change the document:
Name of the archivable PDF file.
See Dataset status.
User assigned to work on this document.
Link to the archivable PDF file.
Link to the originally uploaded file. See Supported File Types.
Original file producer
Software used to create the original file.
Number of pages
Number of pages of the original file.
User who uploaded the document.
The time the Document was created.
Webhook used when the REST API is used. Your receiving service must accept incoming requests from the IP ranges of our provider; please refer to the “EIP-pool” IP ranges here.
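One way for a receiving service to accept requests only from the provider's IP ranges is an allowlist check on the source address of each webhook request. The Python sketch below uses the standard `ipaddress` module; the two ranges are documentation-only placeholders (RFC 5737), not the real “EIP-pool” ranges, which you should copy from the linked list.

```python
import ipaddress

# Hypothetical allowlist: replace these documentation-only example ranges
# with the provider's published "EIP-pool" IP ranges.
ALLOWED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_allowed_source(remote_addr: str) -> bool:
    """Return True if a webhook request comes from an allowed IP range."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a valid IPv4/IPv6 address: reject the request.
        return False
    # Membership is False across IP versions, so IPv6 never matches IPv4 ranges.
    return any(addr in network for network in ALLOWED_RANGES)
```

If your service runs behind a reverse proxy, the address to check may have to come from a trusted forwarding header rather than from the TCP connection.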
Callback status code
HTTP status code returned by the receiving service after the webhook was triggered.
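The callback status code is whatever your receiving service answers. A minimal receiver, sketched below with Python's standard `http.server`, returns 200 for a JSON body and 400 otherwise. The JSON payload format and this choice of status codes are assumptions, not a description of what Konfuzio sends or expects.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class CallbackHandler(BaseHTTPRequestHandler):
    """Minimal webhook receiver; its response code is the callback status code."""

    received = []  # payloads collected for inspection (illustrative only)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            payload = json.loads(raw)
        except ValueError:
            # Malformed or non-JSON body: answer 400 so the failure is visible.
            self.send_response(400)
            self.end_headers()
            return
        CallbackHandler.received.append(payload)
        # Acknowledge with 200; heavy processing should happen elsewhere.
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging
```

You could run it with `HTTPServer(("0.0.0.0", 8000), CallbackHandler).serve_forever()` and register its public URL as the webhook. Acknowledging quickly and moving heavy work to a background job keeps the webhook request short.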
If the document is a PDF file, this shows whether all fonts are embedded in the PDF. If not all fonts are embedded, your document may not display correctly.
Status: Set a status
The default status of a document.
Status: Preparation
Assign this status to any document you intend to add to the training or test set before you start labeling it. This step is crucial to check whether the document is already part of the training or test data.
Status: Training documents
The training documents provide the data to train extraction AIs of a category.
Status: Test documents
The test documents provide the data to evaluate extraction AIs of a category.
Status: Excluded
If you find that a preparation document should not be used for training or testing because of its quality, exclude the document.
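To make the role of each status concrete, here is a hypothetical Python model of the dataset statuses described above and of how training and test documents could be separated. The enum, its values and the `Document` class are illustrative assumptions, not the Konfuzio data model.

```python
from dataclasses import dataclass
from enum import Enum


class DatasetStatus(Enum):
    # Hypothetical names mirroring the dataset statuses described above.
    NONE = "Set a status"  # default status of a new document
    PREPARATION = "Preparation"
    TRAINING = "Training"
    TEST = "Test"
    EXCLUDED = "Excluded"


@dataclass
class Document:
    name: str
    status: DatasetStatus = DatasetStatus.NONE


def split_for_ai(documents):
    """Return (training, test) documents; every other status is ignored."""
    training = [d for d in documents if d.status is DatasetStatus.TRAINING]
    test = [d for d in documents if d.status is DatasetStatus.TEST]
    return training, test
```

In this model, preparation, excluded and unset documents simply never reach training or evaluation.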
The Smartview is the default document view.
Have a look at Annotations to find out more.
The Textview allows you to edit Annotations via raw text.
Preview the REST API result of the active extraction AI. The extraction AI version is shown in the first line of the table on the right.
In addition to our regular document filters, you can filter documents:
By Dataset status
By Human feedback required
By 100% machine-readable
We also provide advanced filtering at the document level:
By Extraction AI:
If documents in your project have been evaluated by an extraction AI, you can select this filter. Its dropdown lists the AIs and versions that evaluated the documents. The number in brackets is the number of documents available for that extraction AI.
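The bracketed counts can be thought of as a count of evaluated documents per AI version. This Python sketch assumes a hypothetical list of `(document_id, ai_name, ai_version)` evaluation records and a made-up label format; it only illustrates the counting, not how Konfuzio builds the dropdown.

```python
from collections import Counter


def ai_filter_options(evaluations):
    """Build dropdown labels such as "invoice-ai v3 (12)".

    `evaluations` is a hypothetical list of (document_id, ai_name, ai_version)
    tuples; duplicates of the same document and version are counted once.
    """
    unique = {(doc, ai, ver) for doc, ai, ver in evaluations}
    counts = Counter((ai, ver) for _, ai, ver in unique)
    # Sort by AI name, then version, for a stable dropdown order.
    return [f"{ai} v{ver} ({n})" for (ai, ver), n in sorted(counts.items())]
```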
Once you select an AI in this filter, a secondary filter called “By 10% F1-Score” becomes visible.
This filter offers two options: “Top documents” shows the best-performing documents for the selected AI, and “Documents need improvements” shows the worst-performing documents by F1-score for the selected AI. Filtering is only available for current evaluations: as soon as you change annotations on a document, it no longer appears in the AI filter or in the “By 10% F1-Score” filter. In that case, a new AI evaluation must be run for the category.
Please note that the number of results in this filter is rounded up. If the AI's evaluation contains fewer than 10 documents, the filter returns the top-1 and worst-1 document. Between 10 and 100 documents, 10% is rounded up to the next whole number: if the AI's evaluation contains 16/26/36/… documents, the F1-score filter returns 2/3/4/… results. “True” 10% results are only returned for AIs with more than 100 documents in their evaluation.
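The F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The Python sketch below combines it with the rounding rule described above: the filter size is 10% of the evaluated documents, rounded up, so 5 documents give 1 result and 16, 26 or 36 give 2, 3 or 4. The function names and the per-document score dictionary are hypothetical, and applying the same ceiling above 100 documents is an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def filter_size(n_documents: int) -> int:
    """10% of the documents, rounded up; at least 1 for any non-empty set.

    Integer arithmetic avoids float surprises such as 0.1 * 30 > 3.
    Assumption: the same ceiling is used above 100 documents.
    """
    if n_documents <= 0:
        return 0
    return (n_documents + 9) // 10


def top_and_worst(scores: dict[str, float]) -> tuple[list[str], list[str]]:
    """Return (top documents, documents needing improvement) by F1-score."""
    k = filter_size(len(scores))
    ranked = sorted(scores, key=scores.get, reverse=True)  # best first
    return ranked[:k], ranked[::-1][:k]
```

For example, with three evaluated documents the filter size is 1, so “Top documents” holds the single best document and “Documents need improvements” the single worst one.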