What is Autodistill?

Autodistill is a free and open-source AI project that uses large, slow foundation models to train smaller, faster supervised models, letting you go from unlabeled images to inference with no human labeling.

It can be used on personal hardware or via Roboflow's hosted version for cloud-based image labeling. Currently, it supports vision tasks such as object detection and instance segmentation, with potential for future expansion to language models.

Features

  • Pluggable interface to connect models together
  • Automatic dataset labeling
  • Training of fast supervised models
  • Ownership of your model
  • Deployment of distilled models to the cloud or the edge
  • Built-in object detection
  • Dozens of models
  • Instance segmentation models
  • Classification models

Basic Concepts

To use autodistill, you input unlabeled data into a Base Model, which uses an Ontology to label a Dataset, which is then used to train a Target Model, which outputs a Distilled Model fine-tuned to perform a specific Task.

Autodistill defines several basic primitives:

  • Task - A Task defines what a Target Model will predict. The Task for each component (Base Model, Ontology, and Target Model) of an autodistill pipeline must match for them to be compatible with each other. Object detection and instance segmentation are currently supported through the detection Task; classification support will be added soon.
  • Base Model - A Base Model is a large foundation model that knows a lot about a lot. Base models are often multimodal and can perform many tasks. They're large, slow, and expensive. Examples of Base Models are GroundedSAM and GPT-4's upcoming multimodal variant. We use a Base Model (along with unlabeled input data and an Ontology) to create a Dataset.
  • Ontology - an Ontology defines how your Base Model is prompted, what your Dataset will describe, and what your Target Model will predict. A simple Ontology is the CaptionOntology which prompts a Base Model with text captions and maps them to class names. Other Ontologies may, for instance, use a CLIP vector or example images instead of a text caption.
  • Dataset - a Dataset is a set of auto-labeled data that can be used to train a Target Model. It is the output generated by a Base Model.
  • Target Model - a Target Model is a supervised model that consumes a Dataset and outputs a distilled model that is ready for deployment. Target Models are usually small, fast, and fine-tuned to perform a specific task very well (but they don't generalize well beyond the information described in their Dataset). Examples of Target Models are YOLOv8 and DETR.
  • Distilled Model - a Distilled Model is the final output of the autodistill process; it's a set of weights fine-tuned for your task that can be deployed to get predictions.
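
The snippet below is a minimal sketch of that pipeline in Python, based on the project's plugin naming convention: autodistill_grounded_sam is assumed to provide a GroundedSAM Base Model, autodistill_yolov8 a YOLOv8 Target Model, and CaptionOntology maps text prompts to class names. Exact argument names (input_folder, output_folder, epochs, confidence) may differ between plugin versions, so treat it as illustrative rather than canonical.

    from autodistill.detection import CaptionOntology
    from autodistill_grounded_sam import GroundedSAM   # Base Model plugin (assumed package name)
    from autodistill_yolov8 import YOLOv8              # Target Model plugin (assumed package name)

    # Ontology: prompt the Base Model with a text caption and map it to a class name.
    ontology = CaptionOntology({"shipping container": "container"})

    # The Base Model, guided by the Ontology, auto-labels the unlabeled images into a Dataset.
    base_model = GroundedSAM(ontology=ontology)
    base_model.label(input_folder="./images", output_folder="./dataset")

    # The Target Model consumes the Dataset and produces the Distilled Model:
    # a set of fine-tuned weights ready for deployment.
    target_model = YOLOv8("yolov8n.pt")
    target_model.train("./dataset/data.yaml", epochs=200)

    # Run inference with the distilled weights.
    predictions = target_model.predict("./images/example.jpg", confidence=0.5)

Each plugin is installed as its own package alongside the core library (for example, pip install autodistill autodistill-grounded-sam autodistill-yolov8), which is how the pluggable interface lets you swap Base and Target Models without changing the rest of the pipeline.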

License

The autodistill package is licensed under the Apache 2.0 license. Each Base Model or Target Model plugin may use its own license, corresponding to the license of its underlying model.

Resources & Downloads

GitHub - autodistill/autodistill: Images to inference with no labeling (use foundation models to train supervised models).