System Requirements#
Operating requirements & system environments for installation on site#
The system environment consists of three types of VMs. The Konfuzio Server software runs on the Master VM. Processing of the tasks in the Redis task queue can be sped up by adding one or more Worker VMs, see also Performance under load. If text recognition (OCR) is required, at least one OCR VM must be operated per Worker VM.
Design of the Master VM#
Resources: 8 vCPU (min. 2.6 GHz) and 64 GB RAM
We recommend Red Hat Linux as the operating system for the VM.
All VMs require the AVX2 CPU instruction set extension.
PostgreSQL version 10 or newer is used as database (current stable version is recommended).
Redis version 5 or newer is used as the task queue (current stable version is recommended).
Each VM should be connected within the network with at least 1 Gbit/s
Network storage for files with at least 1 TB storage space
Internet connection is not required.
Technical instructions for installing Konfuzio Server software can be found here.
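The hardware and version minimums listed above can be checked before installation. The following is a minimal sketch, not part of the Konfuzio Server tooling: all function names are hypothetical, it assumes a Linux VM that exposes `/proc/cpuinfo` and `/proc/meminfo`, and the 5% RAM tolerance is an assumption added because the kernel reports slightly less memory than is provisioned. The 2.6 GHz clock speed is not checked here.

```python
import os
import re

# Minimums taken from the requirements listed above.
MIN_VCPUS = 8
MIN_RAM_GB = 64
MIN_POSTGRES_MAJOR = 10
MIN_REDIS_MAJOR = 5


def has_avx2(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line of /proc/cpuinfo output lists avx2."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, flags = line.partition(":")
            if "avx2" in flags.split():
                return True
    return False


def ram_gb(meminfo_text: str) -> float:
    """Return MemTotal from /proc/meminfo output in GiB (0.0 if absent)."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) / (1024 ** 2) if match else 0.0


def version_ok(version: str, minimum_major: int) -> bool:
    """Compare the leading major number of e.g. '12.4' with a minimum."""
    match = re.match(r"\s*(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(match.group(1)) >= minimum_major


def check_vm(cpuinfo_text: str, meminfo_text: str, vcpus: int,
             ram_tolerance: float = 0.95) -> dict:
    """Return one pass/fail flag per hardware requirement.

    ram_tolerance is an assumption of this sketch, not a documented value.
    """
    return {
        "vcpus": vcpus >= MIN_VCPUS,
        "ram": ram_gb(meminfo_text) >= MIN_RAM_GB * ram_tolerance,
        "avx2": has_avx2(cpuinfo_text),
    }


def check_local_vm() -> dict:
    """Run the hardware checks against the machine this script runs on."""
    with open("/proc/cpuinfo") as cpu, open("/proc/meminfo") as mem:
        return check_vm(cpu.read(), mem.read(), os.cpu_count() or 0)
```

Calling `check_local_vm()` on each VM returns a dictionary of booleans. `version_ok` can be fed the version strings reported by the PostgreSQL and Redis servers, for example `version_ok("12.4", MIN_POSTGRES_MAJOR)`.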
Design of the Worker VM#
Resources: 8 vCPU (min. 2.6 GHz) and 64 GB RAM.
We recommend Red Hat Linux as the operating system for the VM.
All VMs require the AVX2 CPU instruction set extension.
Each VM should be connected within the network with at least 1 Gbit/s
Read and write access to the network storage of the master VM
Internet connection is not required
Design of the OCR VM (optional)#
Resources: 8 vCPU (min. 2.6 GHz) and 64 GB RAM
We recommend Red Hat Linux as the operating system for the VM.
All VMs require the AVX2 CPU instruction set extension.
Each VM should be connected within the network with at least 1 Gbit/s
Read and write access to the network storage of the master VM
The use of Tesseract 4.1.1 does not require an internet connection
Using the On-Prem Container requires an internet connection approximately every 100 minutes to report the number of pages processed to Microsoft. No other data is transferred during this process. More details can be found in the Documentation.
Performance under load#
A system environment with one Master VM and one Worker VM processes 3,000 pages per hour. A system environment with one Master VM and two Worker VMs processes 6,000 pages per hour. These figures apply when using Tesseract 4.1.1 (https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.1) and assume that no training of the AI is performed under load.
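The two data points above suggest roughly 3,000 pages per hour per Worker VM. The following sketch estimates how many VMs a target throughput would need. It assumes that throughput keeps scaling linearly beyond two Worker VMs, which the figures above do not confirm, and it applies the rule of one OCR VM per Worker VM when OCR is required. The function name `plan_vms` is hypothetical.

```python
import math

# Derived from the figures above (Tesseract 4.1.1, no training under load).
PAGES_PER_HOUR_PER_WORKER = 3000


def plan_vms(target_pages_per_hour: int, ocr: bool = True) -> dict:
    """Estimate the VM count for a target throughput.

    Assumes linear scaling per Worker VM, which is an extrapolation
    from the one-worker and two-worker figures.
    """
    if target_pages_per_hour < 0:
        raise ValueError("target_pages_per_hour must not be negative")
    # At least one Worker VM is planned, even for very small targets.
    workers = max(1, math.ceil(target_pages_per_hour / PAGES_PER_HOUR_PER_WORKER))
    # At least one OCR VM per Worker VM if OCR is required.
    ocr_vms = workers if ocr else 0
    return {
        "master": 1,
        "worker": workers,
        "ocr": ocr_vms,
        "total": 1 + workers + ocr_vms,
    }
```

For example, `plan_vms(7000)` rounds up to three Worker VMs and, with OCR, three OCR VMs. Treat the result as a starting point for sizing and verify it with a load test on the actual document mix.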
Development / test system#
The following design describes development or staging servers that allow a development/test system to run independently of production.
1 VM for databases, data storage and Konfuzio Server (each for development and test) + all tasks of the development system. Tasks refer to tasks from the “task queue” such as preprocessing, classification, extraction and training.
1 VM for OCR (development and test)