Supported File Types#
File Types#
Konfuzio supports the following Document types.
If you are using Konfuzio SaaS, refer to Content limits for information about file size and page limits.
Name | File Extension(s) | |
---|---|---|
Portable Document Format (PDF) | | |
Tagged Image File Format (TIFF) | | |
Joint Photographic Experts Group (JPEG) | | |
Portable Network Graphics (PNG) | | |
Excel | | several, see details below |
PowerPoint | | several, see details below |
Word | | several, see details below |
Note that some of these image formats are “lossy” (for example, JPEG). Reducing file sizes for lossy formats may result in a degradation of image quality and accuracy of results from Konfuzio.
File extension handling & correction#
It is possible to upload files with a missing, unknown, or corrupted file extension to Konfuzio (e.g. instead of example.pdf, a file named example.p, example, or even example.example.example was uploaded). When this happens, correction logic runs internally to try to guess the correct extension before the file/document is saved and/or extracted. This correction attempts to guess all supported file types, but success cannot be guaranteed. For the correction to work, the file extension should not exceed 99 characters.
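Konfuzio's internal correction logic is not documented here. As an illustration only, a common way to guess a file type is to inspect the leading bytes ("magic numbers") of the file. The sketch below assumes that approach; the names `SIGNATURES`, `guess_extension`, and `corrected_name` are hypothetical and not part of any Konfuzio API. It covers only PDF, PNG, JPEG, and TIFF.

```python
import os
from typing import Optional

# Well-known leading byte signatures for some formats listed on this page.
# Illustrative only: this is NOT Konfuzio's actual correction logic.
SIGNATURES = [
    (b"%PDF-", ".pdf"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"II*\x00", ".tiff"),  # little-endian TIFF
    (b"MM\x00*", ".tiff"),  # big-endian TIFF
]


def guess_extension(head: bytes) -> Optional[str]:
    """Return a likely extension for the given leading bytes, or None."""
    for signature, extension in SIGNATURES:
        if head.startswith(signature):
            return extension
    return None


def corrected_name(filename: str, head: bytes) -> str:
    """Replace the last suffix of filename with the guessed extension."""
    extension = guess_extension(head)
    if extension is None:
        return filename  # unknown content: leave the name untouched
    stem, _old_suffix = os.path.splitext(filename)
    return stem + extension
```

For example, `corrected_name("example.p", b"%PDF-1.7")` yields `example.pdf`. Replacing only the last suffix is a design choice of this sketch; a name such as `example.example.example` keeps its earlier dots.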
PDFs#
Konfuzio supports PDF/A-1a, PDF/A-1b, PDF/A-2a, PDF/A-2b, PDF/A-3a, PDF/A-3b, PDF/X-1a, PDF/1.7, PDF/2.0. An attempt will be made to repair corrupted PDFs. Konfuzio does not support AcroForms and AEM (Adobe Experience Manager) form content.
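To check which base PDF version a file declares before uploading it, you can read the `%PDF-x.y` header at the start of the file. The following is a minimal sketch; `pdf_header_version` is a hypothetical helper, not a Konfuzio function. The header only gives the base version (for example 1.7 or 2.0); PDF/A and PDF/X conformance is declared in the file's metadata and is not detected here.

```python
import re
from typing import Optional

# Matches the version in a PDF header such as b"%PDF-1.7".
_HEADER = re.compile(rb"%PDF-(\d\.\d)")


def pdf_header_version(head: bytes) -> Optional[str]:
    """Return the version from the '%PDF-x.y' header, or None if absent."""
    # Only the beginning of the file is searched for the header.
    match = _HEADER.search(head[:1024])
    return match.group(1).decode("ascii") if match else None
```

Typical usage would be to read the first kilobyte of a file in binary mode and pass it to `pdf_header_version`.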
Images#
Konfuzio supports JPEG, TIFF and PNG (including support for alpha channel).
Office documents#
Konfuzio offers limited support for common office documents like Microsoft® Word (.doc, .docx), Excel (.xls, .xlsx), PowerPoint (.ppt, .pptx) and Publisher as well as the Open Document Format (ODF). Uploaded office documents are converted to PDFs by Konfuzio using LibreOffice. The layout of the converted office document may differ from the original. Office files cannot be edited after they have been uploaded.
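Because the converted layout may differ from the original, it can help to preview a conversion locally before uploading. The sketch below assumes a local LibreOffice installation with the `soffice` binary on the PATH and uses its headless `--convert-to pdf` mode. How Konfuzio invokes LibreOffice internally is not stated here, so the local result may not match the server-side conversion exactly. The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(source: Path, outdir: Path, binary: str = "soffice") -> List[str]:
    """Build the LibreOffice command line for a headless PDF conversion."""
    return [
        binary,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(outdir),
        str(source),
    ]


def convert_to_pdf(source: Path, outdir: Path) -> Path:
    """Convert an office file to PDF locally and return the expected output path."""
    outdir.mkdir(parents=True, exist_ok=True)
    # Raises CalledProcessError if LibreOffice exits with a non-zero status.
    subprocess.run(build_convert_command(source, outdir), check=True)
    return outdir / (source.stem + ".pdf")
```

Calling `convert_to_pdf(Path("report.docx"), Path("preview"))` would be expected to write `preview/report.pdf`, which you can then compare against the original layout.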
Content limits#
The following content limits apply to Konfuzio SaaS.
Content limit | Default Value |
---|---|
Maximum image resolution (limit does not apply to PDF files) | megapixels not limited per page |
Maximum PDF dimension (limit does not apply to images) | not limited |
Maximum file size per request | not limited |
Maximum number of Pages per Document (synchronous requests) | 250 pages |
Maximum number of Pages (batch/asynchronous requests) | 250 pages |
Concurrent processor version training requests | one per Category |
Concurrent files processing per Project (Batch / Parallel) | not limited |
Requests per minute | not limited |
Maximum number of objects per API Call | 1000 |
Synchronous requests process requests per minute | not limited |
Asynchronous requests process requests per minute | not limited |
Number of pages in active processing | not limited |
Review document requests per minute | not limited |
If you would like to increase your content limits, submit a request for your project as a Support Ticket.
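Client code can check the default limits above before sending requests. The following is a minimal sketch that hard-codes the two numeric defaults from the table (250 pages per Document, 1000 objects per API call). The helper names are hypothetical, and the constants should be adjusted if your project's limits have been raised.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Default values copied from the content limits table; adjust if yours differ.
MAX_PAGES_PER_DOCUMENT = 250
MAX_OBJECTS_PER_CALL = 1000


def within_page_limit(page_count: int, limit: int = MAX_PAGES_PER_DOCUMENT) -> bool:
    """Return True if a Document with page_count pages fits the page limit."""
    return 0 < page_count <= limit


def batches(items: Sequence[T], size: int = MAX_OBJECTS_PER_CALL) -> Iterator[List[T]]:
    """Yield consecutive chunks of items, each holding at most size objects."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

With 2500 objects and the default size, `batches` yields three chunks of 1000, 1000, and 500 objects, so each chunk can be sent in its own API call.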
Document scan resolution#
For the most accurate OCR results from Konfuzio, document scans should have a resolution of at least 200 dpi (dots per inch). Scans at 300 dpi or higher will generally produce the best results.
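If a scan's dpi is not recorded in the file, it can be estimated from the pixel dimensions and the physical page size, since dpi is pixels divided by inches. The sketch below is an illustration under that assumption; `effective_dpi` and `meets_minimum` are hypothetical helpers, and the 200 dpi threshold is the minimum stated above.

```python
def effective_dpi(width_px: int, height_px: int, width_in: float, height_in: float) -> float:
    """Estimate scan resolution as the lower of the horizontal and vertical dpi."""
    if width_in <= 0 or height_in <= 0:
        raise ValueError("page dimensions must be positive")
    return min(width_px / width_in, height_px / height_in)


def meets_minimum(dpi: float, minimum: float = 200.0) -> bool:
    """Return True if the resolution reaches the recommended minimum."""
    return dpi >= minimum


# A US Letter page (8.5 x 11 inches) scanned to 1700 x 2200 pixels.
letter_dpi = effective_dpi(1700, 2200, 8.5, 11.0)
```

Here `letter_dpi` is 200.0, which just meets the minimum. The same page scanned to 1275 x 1650 pixels works out to 150 dpi and would fall below it.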