Konfuzio REST API¶
Visit API Version 2 here.
Preview Version 3¶
General¶
All list endpoints have pagination and sorting (created_at, asc/desc).
All detail endpoints support returning only a subset of the fields.
The document endpoint currently has additional filtering in the form of created_at_before and created_at_after; this will be applied to all endpoints with a mixin over time.
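As a rough illustration of the conventions above, the following sketch builds a list URL for the documents endpoint. The `created_at_before` and `created_at_after` filters come from the text; the `ordering` parameter name, the date format and the placeholder host are assumptions, not confirmed parts of the API.

```python
from urllib.parse import urlencode


def build_list_url(base_url, resource, *, descending=False,
                   created_at_after=None, created_at_before=None):
    """Build a list URL with sorting and optional creation-date filters.

    `ordering` is an assumed parameter name (a common DRF convention).
    """
    params = {"ordering": "-created_at" if descending else "created_at"}
    if created_at_after:
        params["created_at_after"] = created_at_after
    if created_at_before:
        params["created_at_before"] = created_at_before
    return f"{base_url.rstrip('/')}/{resource}/?{urlencode(params)}"


# Placeholder host; only the path and query string matter here.
url = build_list_url(
    "https://example.invalid/api", "documents",
    descending=True, created_at_after="2022-01-01",
)
print(url)
```

Once the filter mixin is applied everywhere, the same helper should work for any list endpoint by changing `resource`.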
Webhooks¶
Set per project with standard URL fields
Sends a POST request with JSON data to the URL
Current: document_created. Probably worth adding: document_deleted, airun_complete
Send webhooks with Celery so we don’t inflate extraction time with slow POST requests
We will provide a short HOWTO and a Django implementation example showing how to secure the endpoint by only allowing POST requests from Konfuzio IPs. This sample implementation could also be used to unit test the webhooks.
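Until that HOWTO exists, the receiving side could look roughly like the framework-independent sketch below. The allowed network is a placeholder from the documentation-only address range, the `event` key in the payload is hypothetical, and `handle_webhook` is an illustrative helper rather than part of any Konfuzio package.

```python
import ipaddress
import json

# Placeholder network (RFC 5737 documentation range); the real Konfuzio
# sender IPs are not listed in this document.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_allowed_sender(remote_addr):
    """Return True if the source IP lies inside one of the allowed networks."""
    try:
        address = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False
    return any(address in network for network in ALLOWED_NETWORKS)


def handle_webhook(method, remote_addr, body):
    """Return an (HTTP status, parsed payload) pair for an incoming call."""
    if method != "POST":
        return 405, None
    if not is_allowed_sender(remote_addr):
        return 403, None
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, None
    return 200, payload


# The payload shape is hypothetical.
status, payload = handle_webhook(
    "POST", "203.0.113.7", b'{"event": "document_created"}'
)
```

In a Django view the same checks would read the method and address from the request object; behind a proxy the source address usually has to be taken from a trusted forwarding header instead.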
Performance¶
We test that the number of queries doesn’t grow with the number of related objects or with code changes: https://docs.djangoproject.com/en/4.0/topics/testing/tools/#django.test.TransactionTestCase.assertNumQueries
Django package that prevents n+1 queries: https://github.com/jmcarp/nplusone
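The idea behind such a test can be shown without Django. The sketch below uses the standard library's `sqlite3` trace callback to count statements and checks that the count stays the same when the number of related rows grows. The table names and the `count_queries` helper are hypothetical; in the real test suite `assertNumQueries` would play this role.

```python
import sqlite3


def count_queries(connection, func, *args):
    """Run func and return (result, number of SQL statements it executed)."""
    statements = []
    connection.set_trace_callback(statements.append)
    try:
        result = func(*args)
    finally:
        connection.set_trace_callback(None)
    return result, len(statements)


def annotations_by_document(connection):
    """Fetch all documents with their annotations in one JOIN (no n+1)."""
    rows = connection.execute(
        "SELECT d.id, a.value FROM document d "
        "LEFT JOIN annotation a ON a.document_id = d.id ORDER BY d.id, a.id"
    ).fetchall()
    grouped = {}
    for document_id, value in rows:
        grouped.setdefault(document_id, [])
        if value is not None:
            grouped[document_id].append(value)
    return grouped


def make_database(annotations_per_document):
    """Create an in-memory database with two documents and their annotations."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE document (id INTEGER PRIMARY KEY)")
    connection.execute(
        "CREATE TABLE annotation "
        "(id INTEGER PRIMARY KEY, document_id INTEGER, value TEXT)"
    )
    for document_id in (1, 2):
        connection.execute("INSERT INTO document (id) VALUES (?)", (document_id,))
        for index in range(annotations_per_document):
            connection.execute(
                "INSERT INTO annotation (document_id, value) VALUES (?, ?)",
                (document_id, f"value-{index}"),
            )
    connection.commit()
    return connection


small, large = make_database(1), make_database(50)
small_result, small_count = count_queries(small, annotations_by_document, small)
large_result, large_count = count_queries(large, annotations_by_document, large)
# The statement count must not depend on how many related rows exist.
assert small_count == large_count
```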
Annotations¶
GET /api/documents/{document_id}/annotations/ (list annotations)
paginated
GET parameters to filter: is_correct / revised / created_by_machine / top_annotation
add url field to return the annotation’s permalink
GET /api/documents/{document_id}/annotations/{annotation_id}/ (retrieve annotation)
same as list but with a single instance
POST /api/documents/{document_id}/annotations/ (create annotations)
similar to the current smartview annotation creation endpoint
required parameters: the fields in SequenceAnnotationSerializer
original_bboxes should be renamed to bboxes (read/write)
document that we need EITHER start/end offset or bboxes
PUT/PATCH /api/documents/{document_id}/annotations/{annotation_id}/ (update annotation)
parameters are the same as create; they are optional in case of PATCH
should this still create a negative annotation? yes, add to the documentation that changing the label of an annotation might result in a negative copy in certain situations.
DELETE /api/documents/{document_id}/annotations/{annotation_id}/ (delete annotation)
should this still create a negative annotation? no
document that it’s probably better to send a PATCH request with revised=True and correct=False
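Two of the rules above can be sketched on the client side: the create payload needs either start/end offsets or bboxes, and rejecting an annotation is better expressed as a PATCH than a DELETE. The field names `start_offset`, `end_offset`, `bboxes`, `revised` and `correct` follow this page. Reading "EITHER ... or" as "at least one of the two" is an assumption, and both helpers are illustrative only.

```python
def validate_annotation_payload(payload):
    """Return a list of error messages; an empty list means the payload looks usable."""
    errors = []
    has_offsets = (
        payload.get("start_offset") is not None
        and payload.get("end_offset") is not None
    )
    has_bboxes = bool(payload.get("bboxes"))
    # Assumption: supplying both is accepted, supplying neither is not.
    if not (has_offsets or has_bboxes):
        errors.append("provide either start_offset/end_offset or bboxes")
    return errors


def rejection_patch():
    """Body for PATCH .../annotations/{annotation_id}/ that rejects an annotation."""
    return {"revised": True, "correct": False}


print(validate_annotation_payload({"start_offset": 1880, "end_offset": 2175}))
print(validate_annotation_payload({}))
```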
Authentication¶
POST /api/token-auth/ (login)
should create a new token if you POST again
DELETE /api/token-auth/ (remove token)
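A minimal sketch of the two calls, built with the standard library and never sent. The `username`/`password` field names, the `Token` authorization scheme (the usual Django REST Framework convention) and the placeholder host are assumptions; only the paths and HTTP methods come from this page.

```python
import json
from urllib.request import Request

BASE_URL = "https://example.invalid"  # placeholder host, not a real server


def build_login_request(username, password):
    """POST /api/token-auth/ with JSON credentials (field names are assumed)."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/token-auth/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_logout_request(token):
    """DELETE /api/token-auth/, authenticated with the token being removed."""
    return Request(
        f"{BASE_URL}/api/token-auth/",
        headers={"Authorization": f"Token {token}"},  # assumed scheme
        method="DELETE",
    )


login = build_login_request("user@example.com", "secret")
logout = build_logout_request("abc123")
```

Passing either request to `urllib.request.urlopen` would perform the call; since a new POST creates a new token, clients should replace any stored token after logging in again.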
Categories¶
GET /api/categories/ (list categories)
paginated
GET parameters to filter: project_id
without parameters it returns all categories the user can access
otherwise same as current (in testing)
GET /api/categories/{category_id}/ (retrieve category)
same as list but with a single instance
POST /api/categories/ (create category)
required parameters: project_id plus the fields in the serializer
same as current
PUT/PATCH /api/categories/{category_id}/ (update category)
parameters are the same as create except project; they are optional in case of PATCH (reference is the current admin panel)
same as current
DELETE /api/categories/{category_id}/ (delete category)
same as current
Category AIs¶
GET /api/category-ais/ (list category AIs)
paginated
GET parameters to filter: project_id
without parameters it returns all AIs the user can access
otherwise same as current (in testing)
GET /api/category-ais/{category_ai_id}/ (retrieve category AI)
same as list but with a single instance
POST /api/category-ais/ (train category AI)
required parameters: project_id plus the fields in the serializer
currently we have original_project but maybe it’s better to have project_id to be consistent
otherwise same as current
PUT/PATCH /api/category-ais/{category_ai_id}/ (update category AI)
parameters are the same as create except project; they are optional in case of PATCH
same as current
DELETE /api/category-ais/{category_ai_id}/ (delete category AI)
same as current
Documents¶
General: normalize doc/docs -> document/documents
GET /api/documents/ (list documents)
paginated
GET parameters to filter: project_id
without parameters it returns all documents the user can access
current fields (excluding bbox)
GET /api/documents/{document_id}/ (retrieve document)
same as list but with a single instance
for labels and groups (annotation sets), two options: either plain list but we need to duplicate information in multiple objects, or allow some nesting to show the structure of the document
the way to go: probably the nested one, but in a consistent way: merge labels and groups into a single field (label_sets) which looks like this:
"label_sets": [
  {
    "id": 1,
    "name": "Fahrgast",
    "labels": [
      {
        "id": 1,
        "name": "Anrede",
        "annotations": [
          {
            "id": 8937385,
            "value": "Herr S",
            "correct": true,
            "accuracy": 0.999995502681811,
            "bbox": {
              "bottom": 286.2144,
              "page_index": 0,
              "top": 262.0152,
              "x0": 40.14,
              "x1": 546.9264,
              "y0": 549.7848,
              "y1": 573.984,
              "line_index": 1
            },
            "start_offset": 1880,
            "end_offset": 2175
          }
        ]
      }
    ]
  }
]
we lose the label names as keys, but this makes the implementation much simpler, as we can just use serializers all the way: SectionLabelSerializer -> LabelSerializer -> AnnotationSerializer, without custom methods that run expensive queries to build a custom dict. We can just pass the correct queries to the serializers and let them figure it out. This also generates well-formed Swagger documentation with proper types and examples.
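Since label names are no longer keys, a client has to walk the nesting to reach the annotations. A minimal sketch of that traversal, using a shortened copy of the sample response above (the `iter_annotations` helper is illustrative only):

```python
def iter_annotations(document):
    """Yield (label set name, label name, annotation) for each nested annotation."""
    for label_set in document.get("label_sets", []):
        for label in label_set.get("labels", []):
            for annotation in label.get("annotations", []):
                yield label_set["name"], label["name"], annotation


# Shortened version of the sample response, with the bbox omitted.
document = {
    "label_sets": [
        {"id": 1, "name": "Fahrgast", "labels": [
            {"id": 1, "name": "Anrede", "annotations": [
                {"id": 8937385, "value": "Herr S", "correct": True,
                 "start_offset": 1880, "end_offset": 2175},
            ]},
        ]},
    ]
}

for label_set_name, label_name, annotation in iter_annotations(document):
    print(label_set_name, label_name, annotation["value"])
# prints: Fahrgast Anrede Herr S
```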
GET /api/documents/{document_id}/bbox/ (retrieve document’s bbox)
only return a document’s bbox
POST /api/documents/{document_id}/search/ (search a document)
required parameters: query
returns a list of matching bboxes for the query
GET /api/documents/{document_id}/pages/{page_number}/ (get a document’s page)
returns entities and image URLs for a document’s page
~~POST /api/documents/ (create document)~~
~~new Upload model containing project, data_file_name, dataset_status, category_template, callback_url, sync, extraction_url~~
~~required parameters: project_id~~
~~no files, JSON only~~
~~returns metadata and the URL where to PUT the actual file (/api/documents/upload/{upload_id}/)~~
~~if the file is not uploaded, the Upload instance is deleted after x minutes (1 hour?)~~
~~PUT /api/documents/upload/{upload_id}/ (upload document)~~
~~only accepts a single binary (no JSON)~~
~~must be called after POST /api/documents/ with the specified ID that is returned~~
~~upload_id is going to be different than the document_id so we might encode it (base64?) to avoid confusion~~
~~once complete, the Upload instance is deleted and a Document instance is created~~
~~returns the DocumentSerializer of the created instance~~
POST /api/documents/ (create document)
keep as it is now, and document with a warning that this endpoint only accepts multipart/form-data
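To make the multipart/form-data requirement concrete, the sketch below encodes one file plus text fields by hand with the standard library. The form field names `project` and `data_file` and the sample file content are assumptions; check the current endpoint for the real names. Most HTTP client libraries do this encoding for you.

```python
import uuid


def encode_multipart(fields, file_field, file_name, file_bytes, content_type):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns (body bytes, value for the Content-Type request header).
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}".encode())
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        lines.append(b"")
        lines.append(str(value).encode("utf-8"))
    lines.append(f"--{boundary}".encode())
    lines.append(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{file_name}"'.encode()
    )
    lines.append(f"Content-Type: {content_type}".encode())
    lines.append(b"")
    lines.append(file_bytes)
    lines.append(f"--{boundary}--".encode())
    lines.append(b"")
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


body, content_type_header = encode_multipart(
    {"project": 1},   # assumed field name
    "data_file",      # assumed field name
    "invoice.pdf", b"%PDF-1.4 placeholder", "application/pdf",
)
```

Sending a JSON body to this endpoint would not work, which is why the warning in the documentation matters.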
PUT/PATCH /api/documents/{document_id}/ (update document details)
parameters: assignee, data_file_name, dataset_status, category_template, ?
pretty much the same as current
DELETE /api/documents/{document_id}/ (delete document)
same as current
paragraph, segmentation, summarization: do we need these and how should they be changed? (skip for now)
Extraction AIs¶
General: mirrors category AIs
GET /api/extraction-ais/ (list extraction AIs)
paginated
GET parameters to filter: project_id
without parameters it returns all AIs the user can access
otherwise same as current (in testing)
GET /api/extraction-ais/{extraction_ai_id}/ (retrieve extraction AI)
same as list but with a single instance
POST /api/extraction-ais/ (train extraction AI)
required parameters: project_id plus the fields in the serializer
currently we have original_category but maybe it’s better to have category to be consistent
otherwise same as current
PUT/PATCH /api/extraction-ais/{extraction_ai_id}/ (update extraction AI)
parameters are the same as create; they are optional in case of PATCH
same as current
DELETE /api/extraction-ais/{extraction_ai_id}/ (delete extraction AI)
same as current
Labels¶
GET /api/labels/ (list labels)
paginated
GET parameters to filter: project_id
without parameters it returns all labels the user can access
otherwise same as current
GET /api/labels/{label_id}/ (retrieve label)
same as list but with a single instance
POST /api/labels/ (create label)
required parameters: project_id plus the fields in the serializer
same as current
PUT/PATCH /api/labels/{label_id}/ (update label)
parameters are the same as create; they are optional in case of PATCH
same as current
DELETE /api/labels/{label_id}/ (delete label)
same as current
General: the sectionlabel/label relationship is shown here differently than it is in the admin, should be unified (so that you can change both from label and sectionlabel, probably)
Label Sets¶
General: label sets which are categories should be filtered out of the API
GET /api/label-sets/ (list label sets)
paginated
GET parameters to filter: project_id
without parameters it returns all label sets the user can access
otherwise same as current (in testing)
GET /api/label-sets/{label_set_id}/ (retrieve label set)
same as list but with a single instance
POST /api/label-sets/ (create label set)
required parameters: project_id plus the fields in the serializer
same as current
PUT/PATCH /api/label-sets/{label_set_id}/ (update label set)
parameters are the same as create except project; they are optional in case of PATCH (reference is the current admin panel)
same as current
DELETE /api/label-sets/{label_set_id}/ (delete label set)
same as current
To discuss further with Flo about moving Category to a separate model
Projects¶
GET /api/projects/ (list projects)
paginated
returns all projects the user can access
same fields as current
GET /api/projects/{project_id}/ (retrieve project)
same as list but with a single instance
POST /api/projects/ (create project)
same as current
PUT/PATCH /api/projects/{project_id}/ (update project)
parameters are the same as create; they are optional in case of PATCH
same as current
DELETE /api/projects/{project_id}/ (delete project)
same as current
GET /api/projects/{project_id}/members/ (list project’s members)
paginated
returns id and email
rationale not to have this as a separate endpoint: the members list doesn’t make sense outside the project context, unlike other models where having all instances regardless of project might be useful
POST /api/projects/{project_id}/members/ (add a project member)
required parameter: email
creates the User if it doesn’t exist
DELETE /api/projects/{project_id}/members/{member_id}/ (remove a project member)
rationale for not having a PUT/PATCH endpoint for members: it doesn’t make sense to edit a member’s email here, as that would change the email of the underlying User; better to DELETE the member and POST a new one