Splitting AI

This feature is currently available on request only.

Overview

Splitting AI by Konfuzio is a Project-level feature that segments a multipage stream of pages, for example several Documents scanned into a single file, into independent Documents. Splitting AI automates this segmentation, so it no longer has to be done manually.

Activation of Splitting AI

Konfuzio (app.konfuzio.com): Contact support to enable this feature.
Self-hosted installations: The feature is disabled by default. To enable it, open your Project settings and toggle the Splitting AI option.

Splitting modes

Manual confirmation:
- Users must approve each suggested split before the Document is segmented.
- Learn more about manual confirmation.
Automatic confirmation:
- Documents are segmented automatically based on the predictions of the trained Splitting AI, and the resulting Documents are uploaded directly.
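The practical difference between the two modes is where a proposed split goes before it is applied. The following is a minimal, hypothetical model of that flow, not the Konfuzio API: `SplittingMode`, `SplitSuggestion`, `route_suggestion`, and `approve` are illustrative names, and a suggestion is assumed to be the list of 1-based first pages of the proposed sub-Documents.

```python
from dataclasses import dataclass
from enum import Enum


class SplittingMode(Enum):
    # Hypothetical identifiers for the two confirmation modes.
    MANUAL = "manual"
    AUTOMATIC = "automatic"


@dataclass
class SplitSuggestion:
    # Hypothetical record of one proposed split: the 1-based first page
    # of every sub-Document proposed by the Splitting AI.
    document_name: str
    first_pages: list[int]
    approved: bool = False


def route_suggestion(suggestion, mode, review_queue, applied):
    """Apply a suggestion right away (automatic) or hold it for review (manual)."""
    if mode is SplittingMode.AUTOMATIC:
        suggestion.approved = True
        applied.append(suggestion)
    else:
        review_queue.append(suggestion)


def approve(suggestion, review_queue, applied):
    """Manual mode: a user approves a queued suggestion, which is then applied."""
    review_queue.remove(suggestion)
    suggestion.approved = True
    applied.append(suggestion)
```

In manual mode nothing is applied until `approve` is called; in automatic mode the suggestion is applied immediately.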

Splitting AI types

Textual (default):
- A Splitting AI based on text processing, designed for a broad range of Documents and Projects. It currently offers the best balance between accuracy and speed.
Context Aware:
- Tailored to homogeneous Document collections, with an emphasis on fast processing. Note: this AI might not perform well on small datasets.
Multimodal:
- Takes both the images and the text of Documents into account. It works well with varied Document types and average-sized datasets, but is slower than the Textual Splitting AI.

Training Splitting AI

To start, upload representative individual Documents and divide them into Training and Test datasets.
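If you decide yourself which Documents go into which dataset, a reproducible random split is one simple option. This is a generic sketch, not Konfuzio functionality: the function name `split_train_test`, the 20% test fraction, and the fixed seed are all assumptions chosen for illustration.

```python
import random


def split_train_test(document_ids, test_fraction=0.2, seed=0):
    """Shuffle Document IDs reproducibly and split them into (training, test) lists."""
    ids = list(document_ids)
    if len(ids) < 2:
        raise ValueError("need at least two Documents to form both datasets")
    random.Random(seed).shuffle(ids)  # fixed seed -> same split on every run
    # At least one Test Document, and at least one Training Document.
    n_test = min(len(ids) - 1, max(1, round(len(ids) * test_fraction)))
    return ids[n_test:], ids[:n_test]
```

Keeping the seed fixed means the Test dataset stays the same between runs, which matters later when Splitting AI versions are compared.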

Post-upload processing

Once a Splitting AI is trained, uploaded Documents are segmented automatically. For instance, a 10-page legal Document with signatures on pages 3 and 10 would be segmented into four Documents:

Document 1: Pages 1-2
Document 2: Page 3 (Signature)
Document 3: Pages 4-9
Document 4: Page 10 (Signature)
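The page ranges in this example follow directly from the first page of each new Document. A minimal sketch of that bookkeeping, assuming 1-based page numbers and inclusive ranges; `split_into_ranges` is a hypothetical helper, not part of Konfuzio:

```python
def split_into_ranges(n_pages: int, first_pages: list[int]) -> list[tuple[int, int]]:
    """Turn the first page of each sub-Document into inclusive (start, end) page ranges.

    Pages are 1-based; page 1 always starts the first sub-Document.
    """
    starts = sorted(set(first_pages) | {1})
    if starts[0] < 1 or starts[-1] > n_pages:
        raise ValueError("split point outside the Document")
    # Each range ends one page before the next start; the last range ends at the final page.
    ends = [start - 1 for start in starts[1:]] + [n_pages]
    return list(zip(starts, ends))
```

For the 10-page example, `split_into_ranges(10, [3, 4, 10])` returns `[(1, 2), (3, 3), (4, 9), (10, 10)]`, matching Documents 1 to 4.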

Original Document handling:

After segmentation, the original Document is deleted and the new Documents are added to the Project.

Billing of Splitting AI uploads

Using a Page as a splitting point does not increase the upload count. For example, if the initial upload is a 10-page Document that brings your upload count to 10, and this Document is then split into 4 separate Documents, your upload count remains at 10.
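The example above can be written as a small calculation. This sketch assumes, as the example implies, that the upload count is measured in Pages; `upload_count_after_split` is a hypothetical helper, not a Konfuzio function.

```python
def upload_count_after_split(count_before: int, uploaded_pages: int,
                             split_ranges: list[tuple[int, int]]) -> int:
    """Upload count after uploading one Document and splitting it (assumed Page-based)."""
    # Pages contained in the resulting sub-Documents (inclusive ranges).
    pages_after_split = sum(end - start + 1 for start, end in split_ranges)
    if pages_after_split != uploaded_pages:
        raise ValueError("split ranges must cover the uploaded Pages exactly")
    # Only the upload itself is counted; the number of sub-Documents
    # (len(split_ranges)) has no effect on the count.
    return count_before + uploaded_pages
```

With the 10-page example, `upload_count_after_split(0, 10, [(1, 2), (3, 3), (4, 9), (10, 10)])` returns 10, the same value as for an unsplit upload.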

Project-level configuration

Splitting AIs are trained individually for each Project. To work with Splitting AIs, make sure the “Enable splitting” option is selected in the Project settings. One Splitting AI can be active per Project; it analyzes newly uploaded multipage Documents to detect whether they contain multiple Documents. Users can confirm or refine these suggestions in the Document Validation UI.

Note: SmartView doesn’t support Document splitting, so the Document Validation UI is enabled instead. You can switch back to SmartView after the review.

For an overview of how Splitting AIs work together with Categorization AIs and Extraction AIs, refer to our architecture diagram.

Splitting AI details

Project

The Project in which the Splitting AI is trained.

Status

“Queuing for training…”: Waiting for training to start.
“Data loading in progress…”: Training has started and the data is being loaded.
“AI training in progress…”: The training data is loaded and training is running.
“AI evaluation in progress…”: Training is complete and the AI is being evaluated.
“Training finished.”: Evaluation is complete and the AI is ready for use.
“Contact support”: Training has failed.
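The statuses above form a simple linear pipeline with one failure outcome. A hypothetical model of it, for illustration only: the enum and `next_status` are not Konfuzio code, and the assumption that a failure can occur at any non-terminal stage is ours.

```python
from enum import Enum


class TrainingStatus(Enum):
    # Status texts as shown for a Splitting AI.
    QUEUING = "Queuing for training…"
    DATA_LOADING = "Data loading in progress…"
    TRAINING = "AI training in progress…"
    EVALUATING = "AI evaluation in progress…"
    FINISHED = "Training finished."
    FAILED = "Contact support"


# Successful path, in the order the statuses are listed.
_PIPELINE = [
    TrainingStatus.QUEUING,
    TrainingStatus.DATA_LOADING,
    TrainingStatus.TRAINING,
    TrainingStatus.EVALUATING,
    TrainingStatus.FINISHED,
]


def next_status(current: TrainingStatus, failed: bool = False) -> TrainingStatus:
    """Advance one step along the pipeline, or move to FAILED (assumed possible at any stage)."""
    if current in (TrainingStatus.FINISHED, TrainingStatus.FAILED):
        return current  # terminal states do not change
    if failed:
        return TrainingStatus.FAILED
    return _PIPELINE[_PIPELINE.index(current) + 1]
```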

Description

A short description of the reason for training.

Version

The version number increments after each training.

Created at

The timestamp at which training was started.

Loading time (in seconds)

Shows the average, minimum, and maximum loading time across all runs of the AI.

Runtime (in seconds)

Shows the average, minimum, and maximum runtime across all runs of the AI. Adding the loading time and the runtime gives the total processing time.
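How these figures relate can be shown with a small calculation over per-run timings. This is a sketch under the assumption that one loading time and one runtime are recorded per run; `timing_summary` is a hypothetical helper, not part of Konfuzio.

```python
from statistics import mean


def timing_summary(loading_s, runtime_s):
    """Average, minimum, and maximum of loading time, runtime, and total time (seconds)."""
    if not loading_s or len(loading_s) != len(runtime_s):
        raise ValueError("expected one loading time and one runtime per run")
    # Total processing time of each run = its loading time + its runtime.
    totals = [load + run for load, run in zip(loading_s, runtime_s)]

    def stats(values):
        return {"avg": mean(values), "min": min(values), "max": max(values)}

    return {"loading": stats(loading_s), "runtime": stats(runtime_s), "total": stats(totals)}
```

The totals are summed per run on purpose: the average total equals the average loading time plus the average runtime, but the minimum and maximum totals are not in general the sums of the individual minima and maxima.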

Training process

Training is fully automated and only requires a brief description from the user. This description documents the intent behind any Project changes that affect the quality of the Splitting AI.

To train a Splitting AI, make sure the training set contains Documents from at least two different Categories.
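A pre-flight check for the Category requirement could look like the following. This is a hypothetical sketch: it assumes you have the Category name of each Training Document at hand, and `check_training_categories` is not a Konfuzio function.

```python
from collections import Counter


def check_training_categories(categories, min_categories=2):
    """Count Training Documents per Category and enforce a minimum number of Categories.

    `categories` holds one Category name per Training Document.
    """
    counts = Counter(categories)
    if len(counts) < min_categories:
        raise ValueError(
            f"training set covers {len(counts)} Category(ies); "
            f"at least {min_categories} are required"
        )
    return counts
```

Returning the per-Category counts also makes it easy to spot a Category that is represented by very few Documents.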

Retraining Splitting AI

After uploading new Documents to your Project, you can retrain the Splitting AI:

Add the new Documents to the Training Documents.
Start the training.
With the same Test Documents but a larger set of Training Documents, the quality of the AI should improve.
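The steps above keep the Test Documents fixed so that old and new versions remain comparable. A hypothetical record of that bookkeeping, for illustration only; `TrainingRun` and `retrain` are not Konfuzio classes or functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    """Hypothetical record of one trained Splitting AI and the data it used."""
    version: int
    training_ids: frozenset
    test_ids: frozenset


def retrain(previous: TrainingRun, new_document_ids) -> TrainingRun:
    """Add new Documents to the Training Documents and keep the Test Documents fixed."""
    new_ids = frozenset(new_document_ids)
    if new_ids & previous.test_ids:
        raise ValueError("new Training Documents must not overlap the Test Documents")
    return TrainingRun(
        version=previous.version + 1,  # the version increments with each training
        training_ids=previous.training_ids | new_ids,
        test_ids=previous.test_ids,  # unchanged, so the two versions stay comparable
    )
```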

Splitting AI operations

Evaluating Splitting AIs

When the Test dataset changes, older Splitting AI models can be re-evaluated against the current dataset.
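Comparing versions on the same current Test Documents needs a per-Document score. The source does not say which metric Konfuzio uses; this sketch assumes an F1 score over predicted split points (the first pages of sub-Documents after page 1), averaged per version. All names are hypothetical.

```python
def boundary_f1(predicted: set[int], actual: set[int]) -> float:
    """F1 score of predicted vs. true split points for one Document."""
    if not predicted and not actual:
        return 1.0  # correctly predicted that the Document is not split
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate_versions(predictions_by_version, ground_truth):
    """Mean boundary F1 per model version over the same (current) Test Documents.

    predictions_by_version: {version: {doc_id: set_of_split_pages}}
    ground_truth: {doc_id: set_of_split_pages}
    """
    if not ground_truth:
        raise ValueError("the Test dataset is empty")
    scores = {}
    for version, predictions in predictions_by_version.items():
        per_doc = [boundary_f1(predictions.get(doc_id, set()), truth)
                   for doc_id, truth in ground_truth.items()]
        scores[version] = sum(per_doc) / len(per_doc)
    return scores
```

Because every version is scored against the same `ground_truth`, the resulting numbers can be compared directly.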

Activating Splitting AI

Choose which Splitting AI is active for the Project. This is useful for switching between different trained AIs.
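The rule that only one Splitting AI is active per Project can be modeled as a single "active" slot. This is a hypothetical sketch, not the Konfuzio data model; `ProjectSplittingAIs` and its methods are illustrative names.

```python
from typing import Optional


class ProjectSplittingAIs:
    """Hypothetical model: a Project keeps several trained Splitting AIs, at most one active."""

    def __init__(self) -> None:
        self.trained: set = set()
        self.active: Optional[int] = None

    def add_trained(self, version: int) -> None:
        self.trained.add(version)

    def activate(self, version: int) -> None:
        if version not in self.trained:
            raise ValueError(f"Splitting AI version {version} has not been trained")
        # Activating one AI replaces the previously active one.
        self.active = version
```

Switching back to an older version is just another call to `activate`, since trained versions are kept.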