Documents#
A document is a file uploaded to Konfuzio Server; see supported file types.
Upload new documents#
Drag and drop or browse your local files and wait until the link to the document shows up.
Check the supported file types and languages.
You can also upload new documents via our REST API or by email.
More detailed information about uploading new documents
Delete document#
Select the document(s) you want to delete, choose “Delete documents” and press GO. Confirm your action on the next page.
Bulk Edit#
The bulk edit option allows you to edit multiple Documents at once. Select the Documents, then choose “bulk edit”. Select the fields you want to edit and choose the desired value.
Document Details#
Click on a document and then click on details.
You have several options to inspect and change the document:
File name#
Name of the archivable PDF file.
Dataset Status#
See Dataset Status
Assignee#
User assigned to work on this document.
Category#
See Categories
Links#
See Document views
PDF file#
Link to the archivable PDF file.
Original file#
Link to the file originally uploaded. See Supported File Types
Original file producer#
Software used to create the original file.
Number of pages#
Number of pages of the original file.
Uploaded by#
User who uploaded the document.
Created at#
The time the Document was created.
Callback URL#
Webhook URL that is called when the Document was uploaded via the REST API. The receiving service needs to accept incoming requests from the IP ranges of our Provider. Please refer to “EIP-pool” IP ranges here
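The IP check described above could be sketched as follows on the receiving side. This is a minimal sketch and not part of Konfuzio: the function name `is_allowed_sender` is hypothetical, and the two networks are placeholder documentation ranges that need to be replaced with the published “EIP-pool” ranges.

```python
import ipaddress

# Hypothetical placeholder ranges (reserved documentation networks).
# Replace them with the "EIP-pool" ranges published by the provider.
ALLOWED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_allowed_sender(remote_ip: str) -> bool:
    """Return True if remote_ip lies inside one of the allowed ranges."""
    try:
        address = ipaddress.ip_address(remote_ip)
    except ValueError:
        # Anything that is not a valid IP address is rejected.
        return False
    return any(address in network for network in ALLOWED_RANGES)
```

A webhook handler would call `is_allowed_sender` with the remote address of the incoming request and reject the request if it returns `False`.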
Callback status code#
HTTP status code returned by the receiving service after the webhook was triggered.
Unfilled Labels#
Labels (in the context of label sets) that don’t have an accepted annotation.
Is Reviewed#
Indicates if the Document has been completely reviewed by a human.
Is Public#
Public documents have their own public URL and are accessible from outside Konfuzio (e.g. via Konfuzio Capture Vue). Public Documents will be excluded from the Document List and cannot be accessed in the SmartView.
Fonts#
If the document is a PDF file, this shows whether all fonts are embedded in the PDF. If not all fonts are embedded, your Document may not be displayed correctly.
Extraction Logs#
The log of the Extraction AI Run. The log contains useful information for debugging purposes. If you have uploaded a custom model, any log messages created in the extract method will be displayed here.
For example, for the custom Paragraph extraction AI, the following log might be displayed:
Categorization Logs#
The log of the Categorization AI Run.
Document Workflow#
This illustrates the workflow of the Document. The workflow shown is generated from the current settings of the Project, so it might differ from the workflow that actually took place. A workflow consists of multiple background processes. The workflow graph helps you understand the processing time of a Document and fine-tune self-hosted Konfuzio Server installations.
Dataset Status#
Status: Set a status (None)#
The default status of a document.
Status: Preparation#
Assign this status to any document you want to add to the test or training set before you start labeling it. This step is crucial to check whether the document is already part of the training or test data.
Status: Training documents#
The training documents provide the data to train extraction AIs of a category.
Status: Test documents#
The test documents provide the data to evaluate extraction AIs of a category.
Status: Excluded#
If you find out that a preparation document should not be used for testing or training due to its quality, exclude the document.
Document views#
SmartView#
The SmartView is the default document view.
Have a look at Annotations to find out more.
TextView#
The TextView allows you to edit Annotations via raw text.
Dashboard#
Preview the result of the REST-API of the active extraction AI. You will see the extraction AI version in the first line of the right table.
Advanced Filters#
In addition to our regular document filters:
By Category
By Dataset status
Human feedback required
By 100% machine-readable
we also provide advanced filtering on the Document level.
By Extraction AI:#
If Documents in your Project have been evaluated by an Extraction AI, this filter can be selected. The dropdown of the filter lists the AIs and versions the Documents have been evaluated by. The number in brackets corresponds to the number of documents available for this Extraction AI.
Once the initial AI filter has been selected, a secondary filter called “By 10% F1-Score” becomes visible.
This filter offers two options. “Top documents” shows the best performing documents for the selected AI. “Documents need improvements” shows the worst performing documents by F1-score for the selected AI. The filtering is only available for current evaluations: as soon as you alter annotations on a Document, it will no longer be available in the AI filter or in the “By 10% F1-Score” filter. At this point, a new AI evaluation for the category has to be run.
Rounding:#
Please note that the results in this filter are rounded up. If there are fewer than 10 documents in the AI’s evaluation, we return the top-1 and worst-1 document. Between 10 and 100 documents, we round up to the next multiple of ten before taking 10%, meaning that if there are 16/26/36/… documents in the AI’s evaluation, we return 2/3/4/… results for the F1-score filter. “True” 10% results are only returned for AIs that have more than 100 documents available in their evaluation.
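The rounding rule can be illustrated with a short sketch. It is not the server's implementation: the function names `f1_filter_size` and `split_by_f1` are hypothetical, and for more than 100 documents the sketch assumes that “true” 10% means the count is truncated to a whole number.

```python
import math


def f1_filter_size(evaluated_documents: int) -> int:
    """Number of documents returned by each option of the F1-score filter."""
    if evaluated_documents <= 0:
        return 0
    if evaluated_documents <= 100:
        # Rounded up: 1-10 documents -> 1, 16 -> 2, 26 -> 3, 36 -> 4, ...
        return math.ceil(evaluated_documents / 10)
    # Assumption: "true" 10% is truncated to a whole number of documents.
    return evaluated_documents // 10


def split_by_f1(f1_by_document: dict[str, float]) -> tuple[list[str], list[str]]:
    """Return (top documents, documents needing improvement) by F1-score."""
    size = f1_filter_size(len(f1_by_document))
    best_first = sorted(f1_by_document, key=f1_by_document.get, reverse=True)
    worst_first = best_first[::-1]
    return best_first[:size], worst_first[:size]
```

For example, an evaluation with 26 documents yields three documents under “Top documents” and three under “Documents need improvements”.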