You can use the camera feed that ARCore captures in a machine learning pipeline to create an intelligent augmented reality experience. The ARCore ML Kit sample demonstrates how to use ML Kit and the Google Cloud Vision API to identify real-world objects. The sample uses a machine learning model to classify objects in the camera's view and attaches a label to the object in the virtual scene.
The ARCore ML Kit sample is written in Kotlin. It is also available as the ml_kotlin sample app in the ARCore SDK GitHub repository.
Use ARCore's CPU image
ARCore captures at least two sets of image streams by default:
- A CPU image stream used for feature recognition and image processing. By default, the CPU image has a resolution of VGA (640x480). ARCore can be configured to use an additional higher resolution image stream, if required.
- A GPU texture stream, which contains a high-resolution texture, usually at a resolution of 1080p. This is typically used as a user-facing camera preview and is stored in the OpenGL texture specified by Session.setCameraTextureName() (see the sketch after this list).
- Any additional streams specified by SharedCamera.setAppSurfaces().
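For reference, here is a minimal sketch (not part of the ML sample) of registering that GPU texture using standard OpenGL ES calls. Run it on the GL thread after the GL context is created and before the first Session.update().
Kotlin
import android.opengl.GLES11Ext
import android.opengl.GLES20

// Create an external OES texture and hand its name to ARCore so the camera
// preview stream is rendered into it.
val textures = IntArray(1)
GLES20.glGenTextures(1, textures, 0)
GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, textures[0])
GLES20.glTexParameteri(
  GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_LINEAR)
GLES20.glTexParameteri(
  GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_LINEAR)
session.setCameraTextureName(textures[0])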
CPU image size considerations
No additional cost is incurred if the default VGA-sized CPU stream is used, because ARCore already uses this stream for world understanding. Requesting a stream with a different resolution may be expensive, as an additional stream will need to be captured. Keep in mind that a higher resolution may quickly become expensive for your model: doubling the width and height of the image quadruples the number of pixels in the image.
It may be advantageous to downscale the image if your model can still perform well on a lower-resolution image.
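For example, here is a minimal downscaling sketch. It assumes the CPU image has already been converted to an RGB Bitmap (the YUV-to-RGB conversion is not shown); halving each dimension leaves a quarter of the pixels for the model to process.
Kotlin
import android.graphics.Bitmap

// `source` is assumed to be an ARGB Bitmap already converted from the CPU image.
// With factor = 2, the result has one quarter of the original pixels.
fun downscaleForInference(source: Bitmap, factor: Int = 2): Bitmap =
  Bitmap.createScaledBitmap(
    source, source.width / factor, source.height / factor, /* filter = */ true)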
Configure an additional high resolution CPU image stream
The performance of your ML model may depend on the resolution of the image used as input. The resolution of these streams can be adjusted by changing the current CameraConfig using Session.setCameraConfig(), selecting a valid configuration from Session.getSupportedCameraConfigs().
Java
CameraConfigFilter cameraConfigFilter =
    new CameraConfigFilter(session)
        // World-facing cameras only.
        .setFacingDirection(CameraConfig.FacingDirection.BACK);
List<CameraConfig> supportedCameraConfigs =
    session.getSupportedCameraConfigs(cameraConfigFilter);
// Select an acceptable configuration from supportedCameraConfigs.
CameraConfig cameraConfig = selectCameraConfig(supportedCameraConfigs);
session.setCameraConfig(cameraConfig);
Kotlin
val cameraConfigFilter =
    CameraConfigFilter(session)
        // World-facing cameras only.
        .setFacingDirection(CameraConfig.FacingDirection.BACK)
val supportedCameraConfigs = session.getSupportedCameraConfigs(cameraConfigFilter)
// Select an acceptable configuration from supportedCameraConfigs.
val cameraConfig = selectCameraConfig(supportedCameraConfigs)
session.setCameraConfig(cameraConfig)
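The selectCameraConfig() helper in the snippets above is not an ARCore API; it stands in for whatever selection logic your app needs. One possible sketch is to pick the supported configuration with the largest CPU image:
Kotlin
// Sketch only: choose the config whose CPU image has the largest pixel count.
// Other criteria (GPU texture size, target FPS, depth sensor usage) may matter
// more for your app.
fun selectCameraConfig(configs: List<CameraConfig>): CameraConfig =
  configs.maxByOrNull { it.imageSize.width * it.imageSize.height }
    ?: throw IllegalStateException("No supported camera configs")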
Retrieve the CPU image
Retrieve the CPU image using Frame.acquireCameraImage(). These images should be disposed of as soon as they are no longer needed.
Java
Image cameraImage = null;
try {
  cameraImage = frame.acquireCameraImage();
  // Process `cameraImage` using your ML inference model.
} catch (NotYetAvailableException e) {
  // NotYetAvailableException is an exception that can be expected when the camera is not ready
  // yet. The image may become available on a next frame.
} catch (RuntimeException e) {
  // A different exception occurred, e.g. DeadlineExceededException, ResourceExhaustedException.
  // Handle this error appropriately.
  handleAcquireCameraImageFailure(e);
} finally {
  if (cameraImage != null) {
    cameraImage.close();
  }
}
Kotlin
// NotYetAvailableException is an exception that can be expected when the camera is not ready yet.
// Map it to `null` instead, but continue to propagate other errors.
fun Frame.tryAcquireCameraImage() =
  try {
    acquireCameraImage()
  } catch (e: NotYetAvailableException) {
    null
  } catch (e: RuntimeException) {
    // A different exception occurred, e.g. DeadlineExceededException, ResourceExhaustedException.
    // Handle this error appropriately.
    handleAcquireCameraImageFailure(e)
  }

// The `use` block ensures the camera image is disposed of after use.
frame.tryAcquireCameraImage()?.use { image ->
  // Process `image` using your ML inference model.
}
Process the CPU image
To process the CPU image, various machine learning libraries can be used:
- ML Kit: ML Kit provides an on-device Object Detection and Tracking API. It comes with a coarse classifier built into the API, and can also use custom classification models to cover a narrower domain of objects. Use InputImage.fromMediaImage to convert your CPU image into an InputImage (see the sketch after this list).
- Firebase Machine Learning: Firebase provides Machine Learning APIs that work either in the cloud or on the device. See the Firebase documentation on Label Images Securely with Cloud Vision using Firebase Auth and Functions on Android.
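As an illustration of the ML Kit path, here is a minimal sketch (not the sample's exact code) that wraps the acquired CPU image in an InputImage and runs the built-in object detector. The imageRotationDegrees value is an assumption: it must be derived from the camera sensor and display orientation so that ML Kit sees an upright image.
Kotlin
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.objects.ObjectDetection
import com.google.mlkit.vision.objects.defaults.ObjectDetectorOptions

// Configure the built-in coarse object detector for single images with classification.
val detector = ObjectDetection.getClient(
  ObjectDetectorOptions.Builder()
    .setDetectorMode(ObjectDetectorOptions.SINGLE_IMAGE_MODE)
    .enableClassification()
    .build()
)

// `image` is the android.media.Image obtained from frame.acquireCameraImage() (YUV_420_888).
// `imageRotationDegrees` is assumed to be computed elsewhere from the device orientation.
val inputImage = InputImage.fromMediaImage(image, imageRotationDegrees)
detector.process(inputImage)
  .addOnSuccessListener { detectedObjects ->
    // Each DetectedObject carries a bounding box and optional classification labels.
  }
  .addOnFailureListener { e ->
    // Handle inference failure appropriately.
  }
Because process() is asynchronous, keep the camera image open until the detector has finished with it (for example, close it in the listeners or await the Task) rather than closing it immediately after the call.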
Display results in your AR scene
Image recognition models often output detected objects by indicating a center point or a bounding polygon representing the detected object.
Using the center point, or the center of the bounding box, output by the model, it's possible to attach an anchor to the detected object. Use Frame.hitTest() to estimate the pose of the object in the virtual scene.
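For instance, with ML Kit's object detector the center of the returned bounding box can serve as that point. This is a sketch; if a nonzero rotation was applied when creating the InputImage, the box must first be mapped back to the CPU image's unrotated pixel space.
Kotlin
// `detectedObject` is an ML Kit DetectedObject; its bounding box is an android.graphics.Rect
// in InputImage pixels. Its center plays the role of `mlResult` in the snippets below.
val box = detectedObject.boundingBox
val centerX = box.exactCenterX()
val centerY = box.exactCenterY()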
Convert IMAGE_PIXELS coordinates to VIEW coordinates:
Java
// Suppose `mlResult` contains an (x, y) of a given point on the CPU image.
float[] cpuCoordinates = new float[] {mlResult.getX(), mlResult.getY()};
float[] viewCoordinates = new float[2];
frame.transformCoordinates2d(
    Coordinates2d.IMAGE_PIXELS, cpuCoordinates, Coordinates2d.VIEW, viewCoordinates);
// `viewCoordinates` now contains coordinates suitable for hit testing.
Kotlin
// Suppose `mlResult` contains an (x, y) of a given point on the CPU image.
val cpuCoordinates = floatArrayOf(mlResult.x, mlResult.y)
val viewCoordinates = FloatArray(2)
frame.transformCoordinates2d(
  Coordinates2d.IMAGE_PIXELS,
  cpuCoordinates,
  Coordinates2d.VIEW,
  viewCoordinates
)
// `viewCoordinates` now contains coordinates suitable for hit testing.
Use these VIEW coordinates to conduct a hit test and create an anchor from the result:
Java
List<HitResult> hits = frame.hitTest(viewCoordinates[0], viewCoordinates[1]);
HitResult depthPointResult = null;
for (HitResult hit : hits) {
  if (hit.getTrackable() instanceof DepthPoint) {
    depthPointResult = hit;
    break;
  }
}
if (depthPointResult != null) {
  Anchor anchor = depthPointResult.getTrackable().createAnchor(depthPointResult.getHitPose());
  // This anchor will be attached to the scene with stable tracking.
  // It can be used as a position for a virtual object, with a rotation perpendicular to the
  // estimated surface normal.
}
Kotlin
val hits = frame.hitTest(viewCoordinates[0], viewCoordinates[1])
val depthPointResult = hits.firstOrNull { it.trackable is DepthPoint }
if (depthPointResult != null) {
  val anchor = depthPointResult.trackable.createAnchor(depthPointResult.hitPose)
  // This anchor will be attached to the scene with stable tracking.
  // It can be used as a position for a virtual object, with a rotation perpendicular to the
  // estimated surface normal.
}
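To actually show a label at that position, the anchor can be paired with the model's predicted label and handed to your renderer. The following is a minimal bookkeeping sketch; the rendering itself depends on your engine and is not shown, and detectedLabel is assumed to come from the classifier.
Kotlin
// Pair each anchor with the label predicted by the model so the renderer can draw
// the label at the anchor's pose on every frame.
data class LabeledAnchor(val anchor: Anchor, val label: String)

val labeledAnchors = mutableListOf<LabeledAnchor>()
labeledAnchors.add(LabeledAnchor(anchor, detectedLabel))

// One possible per-frame policy: drop anchors whose tracking has stopped, render the rest.
labeledAnchors.removeAll { it.anchor.trackingState == TrackingState.STOPPED }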
Performance considerations
Follow these recommendations to save processing power and consume less energy:
- Do not run your ML model on every incoming frame. Consider running object detection at a low frame rate instead (see the sketch after this list).
- Consider an online (cloud-hosted) ML inference model to reduce on-device computational complexity.
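As an illustration of the first recommendation, a simple time-based throttle (a sketch with an arbitrary example interval, not part of the sample) can gate how often the detector runs:
Kotlin
// Run inference only if enough time has passed since the previous run
// (roughly 2 runs per second with this interval).
var lastInferenceTimestampNs = 0L
val inferenceIntervalNs = 500_000_000L

fun shouldRunInference(frame: Frame): Boolean {
  val now = frame.timestamp  // Frame timestamp in nanoseconds.
  if (now - lastInferenceTimestampNs < inferenceIntervalNs) return false
  lastInferenceTimestampNs = now
  return true
}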
Next steps
- Learn about Best Practices for ML Engineering.
- Learn about Responsible AI practices.
- Follow the basics of machine learning with TensorFlow course.