Supported File Types#

File Types#

Konfuzio supports the following Document types.

If you are using Konfuzio SaaS, see Content limits for information about file size and page limits.

| Name | File Extension(s) | MIME Type |
| --- | --- | --- |
| Portable Document Format (PDF) | .pdf | application/pdf |
| Tag Image File Format (TIFF) | .tiff, .tif | image/tiff |
| Joint Photographic Experts Group (JPEG) | .jpg, .jpeg | image/jpeg |
| Portable Network Graphics (PNG) | .png | image/png |
| Excel | .xls, .xlsx | several, see details below |
| PowerPoint | .ppt, .pptx | several, see details below |
| Word | .doc, .docx | several, see details below |

Note that some of these image formats are “lossy” (for example, JPEG). Reducing file sizes for lossy formats may result in a degradation of image quality and accuracy of results from Konfuzio.
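As a rough illustration, the file types listed above can be turned into a client-side check that runs before an upload. This is a minimal sketch, not part of Konfuzio: the names `SUPPORTED_TYPES`, `is_supported`, and `mime_type_for` are hypothetical, and the office formats map to `None` because their MIME types are only listed as "several".

```python
from pathlib import Path
from typing import Optional

# Extension -> MIME type, copied from the list of supported file types.
# None marks the office formats, whose MIME types are only given as "several".
SUPPORTED_TYPES = {
    ".pdf": "application/pdf",
    ".tiff": "image/tiff",
    ".tif": "image/tiff",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".xls": None,
    ".xlsx": None,
    ".ppt": None,
    ".pptx": None,
    ".doc": None,
    ".docx": None,
}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is one of the listed extensions."""
    return Path(filename).suffix.lower() in SUPPORTED_TYPES


def mime_type_for(filename: str) -> Optional[str]:
    """Return the listed MIME type, or None if it is unknown or not listed."""
    return SUPPORTED_TYPES.get(Path(filename).suffix.lower())
```

A file that fails this check is not necessarily rejected by Konfuzio, since missing or damaged extensions may still be corrected on upload.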

File extension handling & correction#

Files can be uploaded to Konfuzio with a missing, unknown, or corrupted file extension (for example, example.p, example, or even example.example.example instead of example.pdf). When this happens, internal correction logic tries to guess the correct extension before the file is saved or extracted. The correction covers all supported file types, but success cannot be guaranteed. For the correction to work, the file extension must not exceed 99 characters.
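How Konfuzio guesses the extension internally is not documented here. One common way to do this kind of correction is to inspect the leading bytes ("magic numbers") of the file, and the sketch below shows that technique for PDF, PNG, JPEG, and TIFF only. The names `MAGIC_NUMBERS` and `guess_extension` are hypothetical and are not Konfuzio's implementation.

```python
from typing import Optional

# Well-known leading bytes of some of the supported formats.
MAGIC_NUMBERS = [
    (b"%PDF-", ".pdf"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"II*\x00", ".tiff"),  # little-endian TIFF
    (b"MM\x00*", ".tiff"),  # big-endian TIFF
]


def guess_extension(data: bytes) -> Optional[str]:
    """Guess a file extension from the first bytes of a file, or return None."""
    for magic, extension in MAGIC_NUMBERS:
        if data.startswith(magic):
            return extension
    return None
```

To use it, read the first few bytes of the file, for example `guess_extension(open(path, "rb").read(16))`. Office formats are left out of this sketch because their container formats need more than a prefix check to tell Word, Excel, and PowerPoint apart.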

PDFs#

Konfuzio supports PDF/A-1a, PDF/A-1b, PDF/A-2a, PDF/A-2b, PDF/A-3a, PDF/A-3b, PDF/X-1a, PDF 1.7, and PDF 2.0. Konfuzio attempts to repair corrupted PDFs. AcroForms and AEM (Adobe Experience Manager) form content are not supported.

Images#

Konfuzio supports JPEG, TIFF and PNG (including support for alpha channel).

Office documents#

Konfuzio offers limited support for common office documents such as Microsoft® Word (.doc, .docx), Excel (.xls, .xlsx), PowerPoint (.ppt, .pptx), and Publisher, as well as the Open Document Format (ODF). Konfuzio converts uploaded office documents to PDF using LibreOffice. The layout of the converted document may differ from the original. Office files cannot be edited after they have been uploaded.
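Because the layout can change during conversion, it may help to preview the result locally before uploading. The sketch below assumes LibreOffice is installed and its `soffice` binary is on the PATH. The exact options Konfuzio uses are not documented here, so this only approximates the conversion, and the helper names `build_convert_command` and `convert_to_pdf` are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(source: str, out_dir: str, binary: str = "soffice") -> List[str]:
    """Build a LibreOffice command line that converts `source` to PDF in `out_dir`."""
    return [binary, "--headless", "--convert-to", "pdf", "--outdir", out_dir, source]


def convert_to_pdf(source: str, out_dir: str) -> Path:
    """Run the conversion and return the expected path of the resulting PDF."""
    # Assumption: a local LibreOffice installation; raises if the command fails.
    subprocess.run(build_convert_command(source, out_dir), check=True)
    return Path(out_dir) / (Path(source).stem + ".pdf")
```

For example, `convert_to_pdf("report.docx", "out")` would write `out/report.pdf`, which can then be compared with the original layout.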

Content limits#

The following content limits apply to Konfuzio SaaS.

| Content limit | Default Value |
| --- | --- |
| Maximum image resolution (limit does not apply to PDF files) | not limited (megapixels per page) |
| Maximum PDF dimension (limit does not apply to images) | not limited |
| Maximum file size per request | not limited |
| Maximum number of Pages per Document (synchronous requests) | 250 pages |
| Maximum number of Pages (batch/asynchronous requests) | 250 pages |
| Concurrent processor version training requests | one per Category |
| Concurrent files processing per Project (Batch / Parallel) | not limited |
| Requests per minute | not limited |
| Maximum number of objects per API Call | 1000 |
| Synchronous process requests per minute | not limited |
| Asynchronous process requests per minute | not limited |
| Number of pages in active processing | not limited |
| Review document requests per minute | not limited |

If you would like to increase your content limits, submit a request for your Project as a support ticket.
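Under the default page limit, a longer file has to be split before upload. The following is a minimal sketch of how the page ranges for such a split could be computed. The function name `page_ranges` is hypothetical, and whether splitting is appropriate depends on whether the content can be processed as separate Documents.

```python
from typing import List, Tuple

MAX_PAGES_PER_DOCUMENT = 250  # default page limit for Konfuzio SaaS


def page_ranges(total_pages: int, limit: int = MAX_PAGES_PER_DOCUMENT) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (first, last) page ranges of at most `limit` pages."""
    if total_pages < 0 or limit < 1:
        raise ValueError("total_pages must be >= 0 and limit must be >= 1")
    return [
        (start, min(start + limit - 1, total_pages))
        for start in range(1, total_pages + 1, limit)
    ]
```

A 600-page file, for instance, yields three ranges: pages 1 to 250, 251 to 500, and 501 to 600.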

Document scan resolution#

For the most accurate OCR results from Konfuzio, document scans should have a resolution of at least 200 dpi (dots per inch). 300 dpi or higher generally produces the best results.
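Image files do not always carry reliable resolution metadata, so the effective dpi can be estimated from the pixel dimensions and the physical page size. The sketch below shows the arithmetic. The function names are hypothetical, and the A4 page size (210 mm × 297 mm) is used only as an example.

```python
import math

MM_PER_INCH = 25.4


def scan_dpi(pixels: int, length_mm: float) -> float:
    """Effective resolution of a scan along one side of the page."""
    return pixels / (length_mm / MM_PER_INCH)


def required_pixels(length_mm: float, dpi: int) -> int:
    """Smallest pixel count along one side that reaches the given dpi."""
    return math.ceil(length_mm / MM_PER_INCH * dpi)
```

For an A4 page, `required_pixels(210, 200)` gives 1654 and `required_pixels(297, 200)` gives 2339, so a scan of at least 1654 × 2339 pixels meets the 200 dpi minimum. At 300 dpi the corresponding size is 2481 × 3508 pixels.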