Image labeling

With ML Kit's image labeling APIs you can detect and extract information about entities in an image across a broad group of categories. The default image labeling model can identify general objects, places, activities, animal species, products, and more.

You can also use a custom image classification model to tailor detection to a specific use case. See Using a custom TensorFlow Lite model for more information.

Key capabilities

  • A powerful general-purpose base classifier: Recognizes more than 400 categories that describe the most commonly found objects in photos.
  • Tailor to your use case with custom models: Use other pre-trained models from TensorFlow Hub or your own custom model trained with TensorFlow, AutoML Vision Edge, or TensorFlow Lite Model Maker.
  • Easy-to-use high-level APIs: No need to deal with low-level model input/output, image pre- and post-processing, or building a processing pipeline. ML Kit extracts the labels from the TensorFlow Lite model and provides them as a text description.

Note that this API is intended for image classification models that describe the full image. For classifying one or more objects in an image, such as shoes or pieces of furniture, the Object Detection & Tracking API may be a better fit.

Supported image classification models

The Image Labeling APIs support different image classification models:

  • Base model: By default, the API uses a powerful general-purpose image labeling model that recognizes more than 400 entities covering the most commonly found concepts in photos.
  • Custom TensorFlow Lite models: To target application-specific concepts, the API accepts custom image classification models from a wide range of sources. These can be pre-trained models downloaded from TensorFlow Hub, or your own models trained with AutoML Vision Edge, TensorFlow Lite Model Maker, or TensorFlow itself. Models can be bundled with your app or hosted with Firebase Machine Learning and downloaded at runtime.

Using the base model

ML Kit’s base model returns a list of entities that identify people, things, places, activities, and so on. Each entity comes with a score that indicates the confidence the ML model has in its relevance. With this information, you can perform tasks such as automatic metadata generation and content moderation. The default model provided with ML Kit recognizes more than 400 different entities.

See the iOS and Android guides for platform-specific instructions.
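On Android, running the base classifier can be sketched as follows. This is a minimal sketch using ML Kit's image-labeling API; how you obtain the Bitmap (camera, gallery, etc.) is up to your app.

```kotlin
import android.graphics.Bitmap
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.label.ImageLabeling
import com.google.mlkit.vision.label.defaults.ImageLabelerOptions

// Sketch: label a bitmap with the default (base) model.
fun labelImage(bitmap: Bitmap) {
    // Wrap the bitmap; rotation is 0 if the image is already upright.
    val image = InputImage.fromBitmap(bitmap, 0)

    // Labeler backed by the general-purpose base model.
    val labeler = ImageLabeling.getClient(ImageLabelerOptions.DEFAULT_OPTIONS)

    labeler.process(image)
        .addOnSuccessListener { labels ->
            for (label in labels) {
                // Each result carries the label text and a confidence score.
                println("${label.text}: ${label.confidence}")
            }
        }
        .addOnFailureListener { e -> e.printStackTrace() }
}
```

The call is asynchronous: `process` returns a Task, so results arrive in the success listener rather than as a return value.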

Example labels

The base model in the image labeling API supports 400+ labels, such as the following examples:

  • People: Crowd, Selfie, Smile
  • Activities: Dancing, Eating, Surfing
  • Things: Car, Piano, Receipt
  • Animals: Bird, Cat, Dog
  • Plants: Flower, Fruit, Vegetable
  • Places: Beach, Lake, Mountain

Example results

Here is an example of the entities that were recognized in the accompanying photo.

Photo: Clément Bucco-Lechat / Wikimedia Commons / CC BY-SA 3.0
Label 0: Stadium (confidence 0.9205354)
Label 1: Sports (confidence 0.7531109)
Label 2: Event (confidence 0.66905296)
Label 3: Leisure (confidence 0.59904146)
Label 4: Soccer (confidence 0.56384534)
Label 5: Net (confidence 0.54679185)
Label 6: Plant (confidence 0.524364)
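A common post-processing step is to keep only the labels the model is reasonably confident about. The sketch below uses a hypothetical `Label` data class mirroring the `text` and `confidence` properties that ML Kit's label results expose, applied to the example results above:

```kotlin
// Hypothetical minimal stand-in for an ML Kit label result,
// which exposes similar `text` and `confidence` properties.
data class Label(val text: String, val confidence: Float)

// Keep only labels at or above a confidence threshold, highest first.
fun confidentLabels(labels: List<Label>, threshold: Float): List<String> =
    labels.filter { it.confidence >= threshold }
          .sortedByDescending { it.confidence }
          .map { it.text }

fun main() {
    // The example results above for the stadium photo.
    val labels = listOf(
        Label("Stadium", 0.9205354f),
        Label("Sports", 0.7531109f),
        Label("Event", 0.66905296f),
        Label("Leisure", 0.59904146f),
        Label("Soccer", 0.56384534f),
        Label("Net", 0.54679185f),
        Label("Plant", 0.524364f),
    )
    // With a 0.6 cutoff, only Stadium, Sports, and Event remain.
    println(confidentLabels(labels, 0.6f))  // [Stadium, Sports, Event]
}
```

The right threshold depends on your use case; lower values trade precision for recall.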

Using a custom TensorFlow Lite model

ML Kit's base image labeling model is built for general-purpose use. It's trained to recognize more than 400 categories that describe the most commonly found objects in photos. Your app might need a specialized image classification model that recognizes a narrower set of categories in more detail, such as a model that distinguishes between species of flowers or types of food.

This API lets you tailor detection to a particular use case by supporting custom image classification models from a wide range of sources. See Custom models with ML Kit to learn more. Custom models can be bundled with your app or dynamically downloaded from the cloud using Firebase Machine Learning's model deployment service.

See the iOS and Android guides for platform-specific instructions.
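On Android, using a custom model bundled with the app can be sketched as follows. The asset path `model.tflite` and the threshold/result-count values are assumptions; substitute your own model and tuning.

```kotlin
import android.graphics.Bitmap
import com.google.mlkit.common.model.LocalModel
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.label.ImageLabeling
import com.google.mlkit.vision.label.custom.CustomImageLabelerOptions

// Sketch: label a bitmap with a custom TensorFlow Lite model
// bundled in the app's assets ("model.tflite" is an assumed path).
fun labelWithCustomModel(bitmap: Bitmap) {
    val localModel = LocalModel.Builder()
        .setAssetFilePath("model.tflite")
        .build()

    val options = CustomImageLabelerOptions.Builder(localModel)
        .setConfidenceThreshold(0.5f)  // drop low-confidence labels
        .setMaxResultCount(5)          // cap the number of results
        .build()

    val labeler = ImageLabeling.getClient(options)
    labeler.process(InputImage.fromBitmap(bitmap, 0))
        .addOnSuccessListener { labels ->
            labels.forEach { println("${it.text}: ${it.confidence}") }
        }
        .addOnFailureListener { e -> e.printStackTrace() }
}
```

For a model hosted with Firebase instead of bundled, the options object is built from a remote model reference, but the labeling call itself is the same.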

Input image preprocessing

If needed, Image Labeling scales and stretches the input image using bilinear interpolation so that its size and aspect ratio fit the requirements of the underlying model.