Page Summary
-
ML Kit can detect and track up to 5 objects in images and videos, using either bundled or custom models.
-
Custom models can be bundled within the app or hosted on Firebase, each with its own advantages and limitations.
-
Object detection involves configuring the detector, preparing the input image, processing the image, and handling the detected object results.
-
Developers can optimize performance by carefully configuring detection mode, object count, and classification settings.
-
ML Kit is compatible with 64-bit iOS devices and offers specific considerations for AutoML Vision Edge model migration.
When you pass an image to ML Kit, it detects up to five objects in the image along with the position of each object in the image. When detecting objects in video streams, each object has a unique ID that you can use to track the object from frame to frame.
You can use a custom image classification model to classify the objects that are detected. Refer to Custom models with ML Kit for guidance on model compatibility requirements, where to find pre-trained models, and how to train your own models.
There are two ways to integrate a custom model. You can bundle the model by putting it inside your app’s asset folder, or you can dynamically download it from Cloud Storage. The following table compares the two options.
| Bundled Model | Hosted Model |
|---|---|
The model is part of your app's .ipa file, which
increases its size. |
The model is not part of your app's .ipa file. It is
hosted by uploading to Cloud Storage. We recommend using
Cloud Storage for
Firebase. |
| The model is available immediately, even when the Android device is offline | Your app must include code to download the model on demand |
| No need for a Firebase project | Requires a Firebase project (if using Cloud Storage for Firebase). |
| You must republish your app to update the model | Push model updates without republishing your app |
| No built-in A/B testing | A/B testing with Firebase Remote Config |
Try it out
- See the vision quickstart app for an example usage of the bundled model and the automl quickstart app for an example usage of the hosted model.
- See the Material Design showcase app for an end-to-end implementation of this API.
Before you begin
Include the ML Kit libraries in your Podfile:
pod 'GoogleMLKit/ObjectDetectionCustom', '8.0.0'After you install or update your project's Pods, open your Xcode project using its
.xcworkspace. ML Kit is supported in Xcode version 13.2.1 or higher.If you want to download a model using Cloud Storage for Firebase, make sure you add Firebase to your iOS project, if you have not already done so. This is not required when you bundle the model.
1. Load the model
Configure a local model source
To bundle the model with your app:
Copy the model file (usually ending in
.tfliteor.lite) to your Xcode project, taking care to selectCopy bundle resourceswhen you do so. The model file will be included in the app bundle and available to ML Kit.Create
LocalModelobject, specifying the path to the model file:Swift
let localModel = LocalModel(path: localModelFilePath)
Objective-C
MLKLocalModel *localModel = [[MLKLocalModel alloc] initWithPath:localModelFilePath];
Configure a remotely-hosted model source
To use the remotely-hosted model, you must download the model file to the device's local storage using your own app logic, and then load it as a local model. We recommend using Cloud Storage for Firebase to host a model. For implementation details, see the Firebase ML to Cloud Storage migration guide.
2. Configure the object detector
After you configure your model sources, configure the object detector for your
use case with a CustomObjectDetectorOptions object. You can change the
following settings:
| Object Detector Settings | |
|---|---|
| Detection mode |
STREAM_MODE (default) | SINGLE_IMAGE_MODE
In In |
| Detect and track multiple objects |
false (default) | true
Whether to detect and track up to five objects or only the most prominent object (default). |
| Classify objects |
false (default) | true
Whether or not to classify detected objects by using the provided
custom classifier model. To use your custom classification
model, you need to set this to |
| Classification confidence threshold |
Minimum confidence score of detected labels. If not set, any classifier threshold specified by the model’s metadata will be used. If the model does not contain any metadata or the metadata does not specify a classifier threshold, a default threshold of 0.0 will be used. |
| Maximum labels per object |
Maximum number of labels per object that the detector will return. If not set, the default value of 10 will be used. |
If you only have a locally-bundled model, just create an object detector from
your LocalModel object:
Swift
let options = CustomObjectDetectorOptions(localModel: localModel) options.detectorMode = .singleImage options.shouldEnableClassification = true options.shouldEnableMultipleObjects = true options.classificationConfidenceThreshold = NSNumber(value: 0.5) options.maxPerObjectLabelCount = 3
Objective-C
MLKCustomObjectDetectorOptions *options = [[MLKCustomObjectDetectorOptions alloc] initWithLocalModel:localModel]; options.detectorMode = MLKObjectDetectorModeSingleImage; options.shouldEnableClassification = YES; options.shouldEnableMultipleObjects = YES; options.classificationConfidenceThreshold = @(0.5); options.maxPerObjectLabelCount = 3;
If you have a remotely-hosted model, you will have to check that it has been downloaded before you run it.
Although you only have to confirm this before running the object detector, if
you have both a remotely-hosted model and a locally-bundled model, it might make
sense to perform this check when instantiating the ObjectDetector: create a
detector from the remote model if it's been downloaded, and from the local model
otherwise.
Swift
// Path where your download logic saves the model let documentDirectory = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first! let localModelURL = documentDirectory.appendingPathComponent("my_remote_model.tflite") let model: LocalModel if FileManager.default.fileExists(atPath: localModelURL.path) { // Use the downloaded model model = LocalModel(path: localModelURL.path) } else { // Fall back to bundled model guard let bundledModelPath = Bundle.main.path(forResource: "model", ofType: "tflite") else { return } model = LocalModel(path: bundledModelPath) } let options = CustomObjectDetectorOptions(localModel: model) options.detectorMode = .singleImage options.shouldEnableClassification = true options.shouldEnableMultipleObjects = true options.classificationConfidenceThreshold = NSNumber(value: 0.5) options.maxPerObjectLabelCount = 3 let objectDetector = ObjectDetector.objectDetector(options: options)
Objective-C
NSString *documentsDirectory = [NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES) firstObject]; NSString *localModelPath = [documentsDirectory stringByAppendingPathComponent:@"my_remote_model.tflite"]; MLKLocalModel *model; if ([NSFileManager.defaultManager fileExistsAtPath:localModelPath]) { // Use the downloaded model model = [[MLKLocalModel alloc] initWithPath:localModelPath]; } else { // Fall back to bundled model NSString *bundledModelPath = [NSBundle.mainBundle pathForResource:@"model" ofType:@"tflite"]; model = [[MLKLocalModel alloc] initWithPath:bundledModelPath]; } MLKCustomObjectDetectorOptions *options = [[MLKCustomObjectDetectorOptions alloc] initWithLocalModel:model]; options.detectorMode = MLKObjectDetectorModeSingleImage; options.shouldEnableClassification = YES; options.shouldEnableMultipleObjects = YES; options.classificationConfidenceThreshold = @(0.5); options.maxPerObjectLabelCount = 3; MLKObjectDetector *objectDetector = [MLKObjectDetector objectDetectorWithOptions:options];
If you only have a remotely-hosted model, you should disable model-related functionality—for example, gray-out or hide part of your UI—until you confirm the model has been downloaded.
Swift
let documentDirectory = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first! let localModelURL = documentDirectory.appendingPathComponent("my_remote_model.tflite") if FileManager.default.fileExists(atPath: localModelURL.path) { // Model is already cached, initialize immediately self.initializeDetector(with: localModelURL) } else { // Model is not yet available, show loading UI and start download self.showLoadingUI() let storage = Storage.storage() let modelRef = storage.reference(forURL: "gs://YOUR_BUCKET/path/to/model.tflite") modelRef.write(toFile: localModelURL) { url, error in self.hideLoadingUI() if let error = error { // Handle download error self.showErrorUI() } else if let modelURL = url { // Download success, initialize detector self.initializeDetector(with: modelURL) } } } func initializeDetector(with modelURL: URL) { let localModel = LocalModel(path: modelURL.path) let options = CustomObjectDetectorOptions(localModel: localModel) options.detectorMode = .singleImage options.shouldEnableClassification = true options.shouldEnableMultipleObjects = true self.objectDetector = ObjectDetector.objectDetector(options: options) // Enable ML features in UI self.enableMLFeatures() }
Objective-C
NSString *documentsDirectory = [NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES) firstObject]; NSString *localModelPath = [documentsDirectory stringByAppendingPathComponent:@"my_remote_model.tflite"]; NSURL *localModelURL = [NSURL fileURLWithPath:localModelPath]; if ([NSFileManager.defaultManager fileExistsAtPath:localModelPath]) { // Model is already cached, initialize immediately [self initializeDetectorWithURL:localModelURL]; } else { // Model is not yet available, show loading UI and start download [self showLoadingUI]; FIRStorage *storage = [FIRStorage storage]; FIRStorageReference *modelRef = [storage referenceForURL:@"gs://YOUR_BUCKET/path/to/model.tflite"]; [modelRef writeToFile:localModelURL completion:^(NSURL * _Nullable URL, NSError * _Nullable error) { [self hideLoadingUI]; if (error != nil) { // Handle download error [self showErrorUI]; } else { // Download success, initialize detector [self initializeDetectorWithURL:URL]; } }]; } - (void)initializeDetectorWithURL:(NSURL *)modelURL { MLKLocalModel *localModel = [[MLKLocalModel alloc] initWithPath:modelURL.path]; MLKCustomObjectDetectorOptions *options = [[MLKCustomObjectDetectorOptions alloc] initWithLocalModel:localModel]; options.detectorMode = MLKObjectDetectorModeSingleImage; options.shouldEnableClassification = YES; options.shouldEnableMultipleObjects = YES; self.objectDetector = [MLKObjectDetector objectDetectorWithOptions:options]; // Enable ML features in UI [self enableMLFeatures]; }
The object detection and tracking API is optimized for these two core use cases:
- Live detection and tracking of the most prominent object in the camera viewfinder.
- The detection of multiple objects from a static image.
To configure the API for these use cases:
Swift
// Live detection and tracking let options = CustomObjectDetectorOptions(localModel: localModel) options.shouldEnableClassification = true options.maxPerObjectLabelCount = 3 // Multiple object detection in static images let options = CustomObjectDetectorOptions(localModel: localModel) options.detectorMode = .singleImage options.shouldEnableMultipleObjects = true options.shouldEnableClassification = true options.maxPerObjectLabelCount = 3
Objective-C
// Live detection and tracking MLKCustomObjectDetectorOptions *options = [[MLKCustomObjectDetectorOptions alloc] initWithLocalModel:localModel]; options.shouldEnableClassification = YES; options.maxPerObjectLabelCount = 3; // Multiple object detection in static images MLKCustomObjectDetectorOptions *options = [[MLKCustomObjectDetectorOptions alloc] initWithLocalModel:localModel]; options.detectorMode = MLKObjectDetectorModeSingleImage; options.shouldEnableMultipleObjects = YES; options.shouldEnableClassification = YES; options.maxPerObjectLabelCount = 3;
3. Prepare the input image
Create a VisionImage object using a UIImage or a
CMSampleBuffer.
If you use a UIImage, follow these steps:
- Create a
VisionImageobject with theUIImage. Make sure to specify the correct.orientation.Swift
let image = VisionImage(image: UIImage) visionImage.orientation = image.imageOrientation
Objective-C
MLKVisionImage *visionImage = [[MLKVisionImage alloc] initWithImage:image]; visionImage.orientation = image.imageOrientation;
If you use a
CMSampleBuffer, follow these steps:-
Specify the orientation of the image data contained in the
CMSampleBuffer.To get the image orientation:
Swift
func imageOrientation( deviceOrientation: UIDeviceOrientation, cameraPosition: AVCaptureDevice.Position ) -> UIImage.Orientation { switch deviceOrientation { case .portrait: return cameraPosition == .front ? .leftMirrored : .right case .landscapeLeft: return cameraPosition == .front ? .downMirrored : .up case .portraitUpsideDown: return cameraPosition == .front ? .rightMirrored : .left case .landscapeRight: return cameraPosition == .front ? .upMirrored : .down case .faceDown, .faceUp, .unknown: return .up } }
Objective-C
- (UIImageOrientation) imageOrientationFromDeviceOrientation:(UIDeviceOrientation)deviceOrientation cameraPosition:(AVCaptureDevicePosition)cameraPosition { switch (deviceOrientation) { case UIDeviceOrientationPortrait: return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationLeftMirrored : UIImageOrientationRight; case UIDeviceOrientationLandscapeLeft: return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationDownMirrored : UIImageOrientationUp; case UIDeviceOrientationPortraitUpsideDown: return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationRightMirrored : UIImageOrientationLeft; case UIDeviceOrientationLandscapeRight: return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationUpMirrored : UIImageOrientationDown; case UIDeviceOrientationUnknown: case UIDeviceOrientationFaceUp: case UIDeviceOrientationFaceDown: return UIImageOrientationUp; } }
- Create a
VisionImageobject using theCMSampleBufferobject and orientation:Swift
let image = VisionImage(buffer: sampleBuffer) image.orientation = imageOrientation( deviceOrientation: UIDevice.current.orientation, cameraPosition: cameraPosition)
Objective-C
MLKVisionImage *image = [[MLKVisionImage alloc] initWithBuffer:sampleBuffer]; image.orientation = [self imageOrientationFromDeviceOrientation:UIDevice.currentDevice.orientation cameraPosition:cameraPosition];
4. Create and run the object detector
Create a new object detector:
Swift
let objectDetector = ObjectDetector.objectDetector(options: options)
Objective-C
MLKObjectDetector *objectDetector = [MLKObjectDetector objectDetectorWithOptions:options];
Then, use the detector:
Asynchronously:
Swift
objectDetector.process(image) { objects, error in guard error == nil, let objects = objects, !objects.isEmpty else { // Handle the error. return } // Show results. }
Objective-C
[objectDetector processImage:image completion:^(NSArray
*_Nullable objects, NSError *_Nullable error) { if (objects.count == 0) { // Handle the error. return; } // Show results. }]; Synchronously:
Swift
var objects: [Object] do { objects = try objectDetector.results(in: image) } catch let error { // Handle the error. return } // Show results.
Objective-C
NSError *error; NSArray
*objects = [objectDetector resultsInImage:image error:&error]; // Show results or handle the error.
5. Get information about labeled objects
If the call to the image processor succeeds, it either passes a list of
Objects to the completion handler or returns the list, depending on whether you called the asynchronous or synchronous method.Each
Objectcontains the following properties:frameA CGRectindicating the position of the object in the image.trackingIDAn integer that identifies the object across images, or `nil` in SINGLE_IMAGE_MODE. labelslabel.textThe label's text description. Only returned if the LiteRT model's metadata contains label descriptions. label.indexThe label's index among all the labels supported by the classifier. label.confidenceThe confidence value of the object classification. Swift
// objects contains one item if multiple object detection wasn't enabled. for object in objects { let frame = object.frame let trackingID = object.trackingID let description = object.labels.enumerated().map { (index, label) in "Label \(index): \(label.text), \(label.confidence), \(label.index)" }.joined(separator: "\n") }
Objective-C
// The list of detected objects contains one item if multiple object detection // wasn't enabled. for (MLKObject *object in objects) { CGRect frame = object.frame; NSNumber *trackingID = object.trackingID; for (MLKObjectLabel *label in object.labels) { NSString *labelString = [NSString stringWithFormat:@"%@, %f, %lu", label.text, label.confidence, (unsigned long)label.index]; } }
Ensuring a great user experience
For the best user experience, follow these guidelines in your app:
- Successful object detection depends on the object's visual complexity. In order to be detected, objects with a small number of visual features might need to take up a larger part of the image. You should provide users with guidance on capturing input that works well with the kind of objects you want to detect.
- When you use classification, if you want to detect objects that don't fall cleanly into the supported categories, implement special handling for unknown objects.
Also, check out the ML Kit Material Design showcase app and the Material Design Patterns for machine learning-powered features collection.
Improving performance
If you want to use object detection in a real-time application, follow these guidelines to achieve the best framerates:When you use streaming mode in a real-time application, don't use multiple object detection, as most devices won't be able to produce adequate framerates.
- For processing video frames, use the
results(in:)synchronous API of the detector. Call this method from theAVCaptureVideoDataOutputSampleBufferDelegate'scaptureOutput(_, didOutput:from:)function to synchronously get results from the given video frame. KeepAVCaptureVideoDataOutput'salwaysDiscardsLateVideoFramesastrueto throttle calls to the detector. If a new video frame becomes available while the detector is running, it will be dropped. - If you use the output of the detector to overlay graphics on the input image, first get the result from ML Kit, then render the image and overlay in a single step. By doing so, you render to the display surface only once for each processed input frame. See the updatePreviewOverlayViewWithLastFrame in the ML Kit quickstart sample for an example.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-06-16 UTC.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2026-06-16 UTC."],[],["ML Kit enables detecting and tracking up to five objects in images or video, identified by unique IDs. Models can be bundled locally, increasing app size, or hosted remotely via Firebase, downloaded on demand, and updated without app republishing. Configure object detection via `CustomObjectDetectorOptions`, setting detection mode, multiple object detection, and classification settings. Create a local or remote detector, and process images using synchronous or asynchronous methods. Image orientation is handled with a switch method that is dependent of the camera position. Performance optimizations are given for video frame processing.\n"]]
-