Service descriptions#

Preamble to the standard software#

With the Konfuzio software, information from documents is extracted automatically, in a simplified way, and can be retrieved in structured form at any time. Documents from various business processes can be integrated and consolidated. The software serves as a platform and offers different components for the simple and fast processing of individual documents of any kind as well as for the individual structuring of the information contained therein. It is standard software that is designed for use by a large number of customers. The software must therefore be individually adapted and configured by the customer in each case. The overall responsibility for the introduction of the software lies with the customer. The manufacturer of the software mentioned below is Helm & Nagel GmbH.

The Konfuzio software currently consists of three modules.

Konfuzio Server#

The service-oriented architecture of Konfuzio provides an AI web service for processing documents. The results of the document processing are provided via multi-client REST API services in JSON format. The currently documented functions of the API are available at https://app.konfuzio.com/api/. The application differentiates users according to roles and offers the possibility to configure Create, Read, Update and Delete (CRUD) permissions. In simple terms, Konfuzio processes documents in three steps:

Text recognition in scans and images through OCR#

Documents are loaded into Konfuzio via the REST API. Depending on the quality of the incoming documents, technical correction procedures are applied to damaged files, followed by OCR for full-page text recognition. The OCR engine used can be freely selected by the customer. By default, the open source OCR engine Tesseract 4.1.1 is installed. The use of Tesseract OCR does not incur any further costs for the customer. Other OCR engines are purchased separately. The manufacturer provides the customer with connectors so that the OCR engine can be selected separately for each project. A unique ID is generated for each document. Files in supported input formats (see Documentation) are saved as archivable PDF documents (PDF/A) including an embedded text layer. The original uploaded file and the PDF/A generated from it can be retrieved via the REST API.

In the version of the Konfuzio Server hosted on app.konfuzio.com, Helm & Nagel GmbH uses the Azure Read API 3.2. If the customer wants to ensure the same text recognition results in their own installation, the customer is recommended to purchase the Azure OCR as an on-prem container or as a REST API. If the customer so wishes, Helm & Nagel shall provide this OCR engine and charge for it separately. The price for use shall be based on Microsoft’s prices. These can be viewed at Microsoft for the REST API and for the on-prem container. The price without quantity discount applies.

Categorising and later extracting the individual pieces of information#

Each incoming document is assigned to a class by a supervised learning model. The classification does not require any manual rules and goes far beyond keyword-, phrase-, layout- or graphic-based classification. For each incoming document, the classification outputs a confidence value per class. Classes are configured and trained by users. If desired, information can be extracted for each class. The extraction of individual pieces of information from tables, unstructured text or the layout of the document is only possible by defining training documents; users adapt the AI through these training documents. In each training run, a model is automatically trained per category on the basis of the training documents, stored and then used for inference. During inference, a confidence value is output for each piece of information. The recognition of the individual pieces of information is made possible by the use of labels, see below. If annotated in the training documents, the context of individual pieces of information is learned by the AI. Thus, a single piece of information, e.g. a first name, can occur several times in the document. For example, one of two recognised first names can be assigned to the recipient and one to the sender. The recognition of the context of an individual piece of information is made possible by the use of label sets, see below. The manufacturer evaluates the latest AI research on an ongoing basis and includes further AI models in the application after positive test results.
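The per-class confidence values can be used by a consuming system to decide whether a classification is trusted or should be reviewed manually. The following is a minimal sketch, not Konfuzio code: the function name `pick_category`, the dictionary shape and the threshold of 0.8 are assumptions chosen for illustration.

```python
from typing import Dict, Optional


def pick_category(confidences: Dict[str, float], threshold: float = 0.8) -> Optional[str]:
    """Return the class with the highest confidence, or None if it is below the threshold.

    `confidences` maps a class name to its confidence value (hypothetical shape).
    """
    if not confidences:
        return None
    best = max(confidences, key=confidences.get)
    return best if confidences[best] >= threshold else None


# Example values are invented for illustration.
scores = {"invoice": 0.93, "payslip": 0.05, "letter": 0.02}
print(pick_category(scores))
```

A result of `None` would signal that no class is confident enough, so the document could be routed to a user for feedback.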

Use of the data via REST API#

After loading and extraction, the contents of the documents are made available as structured data in JSON format via the REST API. The data can be retrieved with the document ID. In addition, it is possible to configure a webhook for each document, which actively sends the structured data to a previously defined service after processing. Feedback on AI results from classification, extraction and context recognition can be given by authorised users through the web-based SmartView. The SmartView provides direct access to and synchronous display of the recognised information via the document ID. Through feedback, the quality of the AI is continuously improved. In addition, new classes, individual pieces of information or contexts can be taught to the AI in this way.
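A receiving service might process the structured data sent by a webhook roughly as sketched below. The payload shape (`document_id`, `category`, `extractions` with `label`, `label_set`, `value`, `confidence`) is an assumption made for illustration and is not taken from the Konfuzio API documentation; the actual JSON schema should be checked at https://app.konfuzio.com/api/.

```python
import json
from typing import Any, Dict


def parse_webhook_body(body: str, min_confidence: float = 0.0) -> Dict[str, Any]:
    """Parse a JSON webhook body and keep extractions at or above min_confidence.

    All field names are hypothetical placeholders for the real schema.
    """
    payload = json.loads(body)
    kept = [
        item
        for item in payload.get("extractions", [])
        if item.get("confidence", 0.0) >= min_confidence
    ]
    return {
        "document_id": payload["document_id"],
        "category": payload.get("category"),
        "extractions": kept,
    }


# Invented example body: the same label occurs in two contexts (label sets).
example_body = json.dumps({
    "document_id": 42,
    "category": "invoice",
    "extractions": [
        {"label": "First name", "label_set": "Sender", "value": "Anna", "confidence": 0.97},
        {"label": "First name", "label_set": "Recipient", "value": "Ben", "confidence": 0.41},
    ],
})
result = parse_webhook_body(example_body, min_confidence=0.5)
print(result["document_id"], [e["value"] for e in result["extractions"]])
```

Filtering by confidence on the receiving side is one possible design; low-confidence values could instead be forwarded to a user for review.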

The Konfuzio software is designed to classify any type of document and to display information in its subject context on the basis of the document type. This generic applicability of the system is made possible by three essential elements of the software.

Category: Each incoming document is assigned to a document type, a so-called category. A document has a document type. If extraction is desired, label sets can be added to a category.

Label set: A label set is a bundle of labels. A label set can be used to recognise tables or to extract individual pieces of information in the subject context, e.g. that of the sender of a letter.

Label: Labels define the individual pieces of information to be extracted from a document. For each label, automatic typing can be configured, e.g. conversion into machine-readable date formats.
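The relationship between the three elements and the idea of automatic typing can be illustrated with a small data model. This is a sketch under assumptions: the class names mirror the terms above, but the attributes, the type names `"text"` and `"date"`, the accepted date formats and the function `normalise` are hypothetical and do not describe the Konfuzio Server or SDK implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Label:
    name: str
    data_type: str = "text"  # hypothetical type names: "text" or "date"


@dataclass
class LabelSet:
    name: str
    labels: List[Label] = field(default_factory=list)


@dataclass
class Category:
    name: str
    label_sets: List[LabelSet] = field(default_factory=list)


def normalise(label: Label, raw: str) -> Optional[str]:
    """Convert a raw extracted string according to the label's data type.

    Dates are returned in ISO 8601 form; None means the value could not be parsed.
    """
    text = raw.strip()
    if label.data_type == "date":
        # Assumed set of input formats, tried in order.
        for fmt in ("%d.%m.%Y", "%d/%m/%Y", "%Y-%m-%d"):
            try:
                return datetime.strptime(text, fmt).date().isoformat()
            except ValueError:
                continue
        return None
    return text


# A category "Letter" with a label set "Sender" that bundles two labels.
sender = LabelSet("Sender", [Label("First name"), Label("Date", data_type="date")])
letter = Category("Letter", [sender])
print(normalise(sender.labels[1], "01.02.2023"))
```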

With these three elements, users build up a comprehensive data set from sample documents that can be used to apply AI with supervised learning methods in both classification and extraction. Data for both initial and ongoing training is created by users via the web browser-based Konfuzio SmartView by point and click in the document. Once information has been saved, each individual piece of information can be accessed via a unique URL directly in the SmartView.

The application offers extensive logging. From import to export, the technical processing steps are logged for each document. This view is available to users authorised as superusers. The logging at module level, e.g. the classification of a document, is accessible through the tasks in the Redis messaging system.

In addition, these three elements make it possible to build up a well-ordered database and use it for both technical and content-related reporting purposes. Konfuzio offers standard reports and allows for individual reports: prefabricated reports can be downloaded as CSV directly from the application for each trained AI model and project. Individual reports can be created using the Konfuzio Python SDK or MS Excel Power Query.

The application is multi-client capable. An AI model can be used in different projects. Users with separately configurable roles can be invited to a project. One API endpoint is available per project. An AI model can be made available to different authenticated groups of users after training or retraining.

In addition to Konfuzio’s internal reporting options, Konfuzio is operated in a Kubernetes environment. This allows a comprehensive control of the technical operation. The continuous export of reporting-relevant data enables end-to-end reporting.

Operating requirements & system environments for installation on the customer’s servers (on-prem / (private) cloud)#

The following is a system design with 3 Konfuzio instances (DEV, TEST, PROD) in order to implement a development/test system independent of the operation in production.

The system environment comprises three types of VMs. The Konfuzio server software is run on the master VM. The speed of processing the tasks in the Redis task queue can be increased by integrating additional worker VM(s) in addition to the master VM, see also Performance under load. If text recognition (OCR) is required, at least one OCR VM must be operated per Worker VM.

Design of the Master VM#

  • Resources: 8 vCPU (min. 2.6 GHz) and 64 GB RAM

  • We recommend Red Hat Linux as the VM’s operating system.

  • All VMs require the AVX2 CPU instruction set extension.

  • PostgreSQL version 9.5 or newer is used as the database (version 11.11 is recommended).

  • Redis version 4 or newer is used as the task queue (version 4.0.9 is recommended).

  • Each VM should be connected to the network with at least 1 Gbit/s.

  • Network storage for files with at least 1 TB storage space

  • Internet connection is not required.

The technical instructions for installing the Konfuzio Server software can be found here.

Design of the Worker VM#

  • Resources: 8 vCPU (min. 2.6 GHz) and 64 GB RAM

  • We recommend Red Hat Linux as the VM’s operating system.

  • All VMs require the AVX2 CPU instruction set extension.

  • Each VM should be connected within the network with at least 1 Gbit/s

  • Read and write access to the network storage of the master VM

  • Internet connection is not required

Design of the OCR VM (optional)#

  • Resources: 16 vCPU (min. 2.6 GHz) and 64 GB RAM

  • We recommend Red Hat Linux as the VM’s operating system.

  • All VMs require the AVX2 CPU instruction set extension.

  • Each VM should be connected within the network with at least 1 Gbit/s

  • Read and write access to the network storage of the master VM

Performance under load#

A system environment with one Master VM and one Worker VM processes 3,000 pages per hour. A system environment with one Master VM and two Worker VMs processes 6,000 pages per hour. The specifications describe the state when using Tesseract 4.1.1 <https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.1> and assume that no training of the AI is carried out at load time.
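The two figures above suggest roughly 3,000 pages per hour per Worker VM. A rough sizing estimate can be sketched as follows, assuming that throughput keeps scaling linearly beyond two Worker VMs (the text only states the values for one and two) and applying the rule that at least one OCR VM is operated per Worker VM when OCR is required. The function name `plan_vms` and the returned keys are hypothetical.

```python
import math

# Derived from the stated figures: 1 worker -> 3,000 and 2 workers -> 6,000 pages per hour.
PAGES_PER_HOUR_PER_WORKER = 3000


def plan_vms(target_pages_per_hour: int, needs_ocr: bool = True) -> dict:
    """Estimate the VM counts for a target throughput.

    Assumes linear scaling per Worker VM, which is an extrapolation
    for more than two workers.
    """
    if target_pages_per_hour <= 0:
        raise ValueError("target_pages_per_hour must be positive")
    workers = math.ceil(target_pages_per_hour / PAGES_PER_HOUR_PER_WORKER)
    return {
        "master_vms": 1,
        "worker_vms": workers,
        # At least one OCR VM per Worker VM if text recognition is required.
        "min_ocr_vms": workers if needs_ocr else 0,
    }


print(plan_vms(7000))
```

The estimate is only a starting point; actual throughput depends on the OCR engine in use and on whether training runs at the same time.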

Development / test system:#

  • 1 VM for databases, data storage and web server (each for development and test) + all tasks of the development system. Tasks refer to tasks from the “task queue” such as preprocessing, classification, extraction and training.

  • 1 VM for OCR (development and test)

Konfuzio Trainer#

The service description can be found in the technical documentation at https://dev.konfuzio.com/training/training_documentation.html.

Konfuzio Python SDK#

The Konfuzio Python SDK provides an extensible Python API that allows data scientists and developers to access and interact with the functionality of the Konfuzio Server. The Konfuzio Python SDK works independently of the chosen hosting concept of the Konfuzio Server.

A common use case is the complete download of all data available on the Konfuzio Server to which the user has access. This kind of download enables a complete and self-sufficient data backup or the transfer of data to the customer’s server. With a good internet connection, i.e. a download speed of more than 200 MBit/s, the following storage space requirements and download durations can be expected:

  • The text of the document requires a storage space of approx. 0.05 MB per page at a download speed of 26,000 pages per hour

  • An additional 1 MB per page is required if the optical properties, so-called bounding boxes, are to be saved down to the individual letter. The download speed is approx. 16,000 pages per hour.

  • A further 0.125 MB per page is required if the archivable OCR version of the PDF is also to be saved. These files can be downloaded at approx. 48,000 pages per hour.

  • A further 0.15 MB per page is required if each page is to be saved as an image. This download is possible at a speed of 16,000 pages per hour.
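The figures in the list above can be combined into a rough estimate of storage space and download duration for a given number of pages. The sketch below uses only the approximate per-page values stated above; it assumes that the selected data types are downloaded one after the other, so that their durations add up. The names `COMPONENTS` and `estimate_download` are hypothetical and not part of the Konfuzio Python SDK.

```python
from typing import Iterable, Tuple

# (storage in MB per page, download speed in pages per hour), as stated in the list above.
COMPONENTS = {
    "text": (0.05, 26000),
    "bounding_boxes": (1.0, 16000),
    "pdf": (0.125, 48000),
    "page_images": (0.15, 16000),
}


def estimate_download(pages: int, parts: Iterable[str] = ("text",)) -> Tuple[float, float]:
    """Return (storage in MB, duration in hours) for the selected data types.

    Assumes the parts are downloaded sequentially, which is a simplification.
    """
    parts = list(parts)
    storage_mb = sum(COMPONENTS[p][0] for p in parts) * pages
    hours = sum(pages / COMPONENTS[p][1] for p in parts)
    return storage_mb, hours


storage_mb, hours = estimate_download(48000, COMPONENTS.keys())
print(f"{storage_mb / 1000:.1f} GB, {hours:.1f} hours")
```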

Technical instructions on how to use the Python SDK can be found at dev.konfuzio.com. The service description can be found in the technical documentation at https://dev.konfuzio.com/sdk/configuration_reference.html.

Konfuzio Document Validation UI#

The Konfuzio Document Validation UI provides an extensible Vue.js-based user interface. It is specifically designed for business clients to intuitively verify the accuracy of information automatically extracted by our document extraction AI, with minimal time investment.

The Document Validation UI offers users clear guidance for reviewing extracted information. A typical use case involves a staff member validating a document extracted by Konfuzio. The following actions are available to the user:

  • Accept the extracted information to confirm its accuracy.

  • Edit the content of the extracted information.

  • Remove the extracted information if it is unwanted.

  • Add a new annotation for any missing extracted information.

  • Complete the review to track completed validations.

Additionally, the Document Validation UI provides editing tools that enable users to modify documents in the following ways:

  • Rename the file.

  • Manually change the document category.

  • Split the document into multiple files.

  • Rearrange the pages contained in the document.

  • Rotate individual pages or all pages in the document.

In addition to its existing integration within the Konfuzio Server, the Document Validation UI can be integrated into third-party applications. Technical guidance for integrating the Document Validation UI is available at dev.konfuzio.com.

The performance description can be found in the documentation at <https://help.konfuzio.com/document-validation-ui/index.html>. If the Document Validation UI is not included in the maintenance contract, the MIT License applies. This can be reviewed on GitHub <https://github.com/konfuzio-ai/konfuzio-capture-vue/blob/main/LICENSE>.