With ML Kit's image labeling APIs you can detect and extract information about entities in an image across a broad group of categories. The default image labeling model can identify general objects, places, activities, animal species, products, and more.
You can also use a custom image classification model to tailor detection to a specific use case. See Using a custom TensorFlow Lite model for more information.
Key capabilities
- A powerful general-purpose base classifier: Recognizes more than 400 categories that describe the most commonly found objects in photos.
- Tailor to your use case with custom models: Use other pre-trained models from TensorFlow Hub, or your own custom model trained with TensorFlow, AutoML Vision Edge, or TensorFlow Lite Model Maker.
- Easy-to-use high-level APIs: No need to deal with low-level model input/output, image pre- and post-processing, or building a processing pipeline. ML Kit extracts the labels from the TensorFlow Lite model and provides them as a text description.
Note that this API is intended for image classification models that describe the full image. For classifying one or more objects in an image, such as shoes or pieces of furniture, the Object Detection & Tracking API may be a better fit.
Supported image classification models
The Image Labeling APIs support different image classification models:
| Model | Description |
|---|---|
| Base model | By default, the API uses a powerful general-purpose image labeling model that recognizes more than 400 entities covering the most commonly found concepts in photos. |
| Custom TensorFlow Lite models | To target application-specific concepts, the API accepts custom image classification models from a wide range of sources. These can be pre-trained models downloaded from TensorFlow Hub or your own models trained with AutoML Vision Edge, TensorFlow Lite Model Maker, or TensorFlow itself. Models can be bundled with your app or hosted with Firebase Machine Learning and downloaded at runtime. |
Using the base model
ML Kit’s base model returns a list of entities that identify people, things, places, activities, and so on. Each entity comes with a score that indicates the confidence the ML model has in its relevance. With this information, you can perform tasks such as automatic metadata generation and content moderation. The default model provided with ML Kit recognizes more than 400 different entities.
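As a minimal sketch of the "automatic metadata generation" idea: on device, ML Kit's mobile SDKs return label objects with a text and a confidence; here those pairs are modeled in plain Python to show how an app might turn them into metadata tags. The function name and the 0.7 threshold are illustrative, not part of ML Kit.

```python
# Hypothetical sketch: labels are modeled as (text, confidence) pairs,
# mirroring the fields ML Kit's label objects expose. All names and the
# threshold value below are illustrative assumptions.

def tags_from_labels(labels, min_confidence=0.7):
    """Keep label texts whose confidence meets the threshold, best first."""
    kept = [(text, conf) for text, conf in labels if conf >= min_confidence]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in kept]

labels = [("Stadium", 0.92), ("Sports", 0.75), ("Event", 0.67)]
print(tags_from_labels(labels))  # ['Stadium', 'Sports']
```

An app could store the returned tags alongside the photo, or compare them against a blocklist for simple content moderation.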
Example labels
The base model in the image labeling API supports 400+ labels, such as the following examples:
| Category | Example labels |
|---|---|
| People | Crowd, Selfie, Smile |
| Activities | Dancing, Eating, Surfing |
| Things | Car, Piano, Receipt |
| Animals | Bird, Cat, Dog |
| Plants | Flower, Fruit, Vegetable |
| Places | Beach, Lake, Mountain |
Example results
Here is an example of the entities recognized in a sample photo.
| Label | Text | Confidence |
|---|---|---|
| 0 | Stadium | 0.9205354 |
| 1 | Sports | 0.7531109 |
| 2 | Event | 0.66905296 |
| 3 | Leisure | 0.59904146 |
| 4 | Soccer | 0.56384534 |
| 5 | Net | 0.54679185 |
| 6 | Plant | 0.524364 |
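Results like these are typically consumed highest-confidence first. The sketch below models the example results above as plain Python pairs and picks the top-k labels; the helper function is illustrative, not an ML Kit API.

```python
# The example results above, modeled as (text, confidence) pairs.
results = [
    ("Stadium", 0.9205354), ("Sports", 0.7531109), ("Event", 0.66905296),
    ("Leisure", 0.59904146), ("Soccer", 0.56384534), ("Net", 0.54679185),
    ("Plant", 0.524364),
]

def top_k(labels, k):
    """Return the k highest-confidence label texts."""
    return [text for text, _ in sorted(labels, key=lambda p: p[1], reverse=True)[:k]]

print(top_k(results, 3))  # ['Stadium', 'Sports', 'Event']
```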
Using a custom TensorFlow Lite model
ML Kit's base image labeling model is built for general-purpose use. It's trained to recognize more than 400 categories that describe the most commonly found objects in photos. Your app might need a specialized image classification model that recognizes a narrower set of categories in more detail, such as a model that distinguishes between species of flowers or types of food.
This API lets you tailor detection to a particular use case by supporting custom image classification models from a wide range of sources. See Custom models with ML Kit to learn more. Custom models can be bundled with your app or dynamically downloaded from the cloud using Firebase Machine Learning's model deployment service.
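Conceptually, a custom image classification model outputs one raw score per class, and a label map pairs each class index with a human-readable name; ML Kit's custom-model support performs this decoding for you. The sketch below shows the idea in plain Python. The three-class flower label map and all score values are made-up examples, not real model output.

```python
import math

# Hypothetical label map for a custom flower classifier (illustrative only).
label_map = ["rose", "tulip", "sunflower"]

def softmax(logits):
    """Convert raw model scores into confidences that sum to 1."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits, labels):
    """Pair each label with its confidence, highest first."""
    confidences = softmax(logits)
    return sorted(zip(labels, confidences), key=lambda p: p[1], reverse=True)

ranked = decode([2.0, 0.5, 1.0], label_map)
print(ranked[0][0])  # 'rose'
```

Whether the model ships bundled in the app or is downloaded from Firebase at runtime, the decoding step is the same; only the model's source differs.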
Input image preprocessing
If needed, the Image Labeling API uses bilinear scaling and stretching to adjust the input image's size and aspect ratio to fit the requirements of the underlying model.
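To make the preprocessing concrete, here is a minimal pure-Python sketch of bilinear resizing on a 2D grayscale image. This is an illustration of the interpolation technique, not ML Kit's actual implementation; note that because width and height scale independently, a mismatched output shape stretches the image rather than letterboxing it.

```python
def bilinear_resize(img, out_h, out_w):
    """Resize a 2D grayscale image (list of rows) with bilinear interpolation.

    Stretching is implicit: each axis scales independently, so the aspect
    ratio changes whenever out_h/out_w differs from the input's ratio.
    """
    in_h, in_w = len(img), len(img[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            # Map the output pixel back into input coordinates.
            src_y = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
            src_x = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            y0, x0 = int(src_y), int(src_x)
            y1, x1 = min(y0 + 1, in_h - 1), min(x0 + 1, in_w - 1)
            dy, dx = src_y - y0, src_x - x0
            # Blend the four neighboring pixels by their distances.
            top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
            bot = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
            out[y][x] = top * (1 - dy) + bot * dy
    return out

# A 2x2 gradient stretched to 3x3: the new center pixel is the blend
# of all four input pixels.
print(bilinear_resize([[0.0, 1.0], [1.0, 2.0]], 3, 3)[1][1])  # 1.0
```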