Use ARCore as input for Machine Learning models

You can use the camera feed that ARCore captures in a machine learning pipeline to create an intelligent augmented reality experience. The ARCore ML Kit sample demonstrates how to use ML Kit and the Google Cloud Vision API to identify real-world objects. The sample uses a machine learning model to classify objects in the camera's view and attaches a label to the object in the virtual scene.

The ARCore ML Kit sample is written in Kotlin. It is also available as the ml_kotlin sample app in the ARCore SDK GitHub repository.

Use ARCore's CPU image

ARCore captures at least two sets of image streams by default:

  • A CPU image stream used for feature recognition and image processing. By default, the CPU image has a resolution of VGA (640x480). ARCore can be configured to use an additional higher resolution image stream, if required.
  • A GPU texture stream, which contains a high-resolution texture, usually at a resolution of 1080p. This stream is typically used as a user-facing camera preview and is stored in the OpenGL texture specified by Session.setCameraTextureName().
  • Any additional streams specified by SharedCamera.setAppSurfaces().

CPU image size considerations

No additional cost is incurred if the default VGA-sized CPU stream is used because ARCore uses this stream for world comprehension. Requesting a stream with a different resolution may be expensive, as an additional stream will need to be captured. Keep in mind that a higher resolution can quickly become expensive for your model: doubling the width and height of the image quadruples the number of pixels in the image.

If your model can still perform well on a lower-resolution image, it may be advantageous to downscale the image before running inference.
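
As an illustration, here is a minimal downscaling sketch. It assumes the CPU image uses the YUV_420_888 format with tightly packed chroma planes; production code should account for arbitrary row and pixel strides.

Kotlin

import android.graphics.Bitmap
import android.graphics.BitmapFactory
import android.graphics.ImageFormat
import android.graphics.Rect
import android.graphics.YuvImage
import android.media.Image
import java.io.ByteArrayOutputStream

// Converts a YUV_420_888 CPU image to a Bitmap and downscales it by `factor`.
fun downscaleCameraImage(image: Image, factor: Int): Bitmap {
  // Pack the Y plane and the interleaved VU plane into an NV21 buffer.
  val ySize = image.planes[0].buffer.remaining()
  val vuSize = image.planes[2].buffer.remaining()
  val nv21 = ByteArray(ySize + vuSize)
  image.planes[0].buffer.get(nv21, 0, ySize)
  image.planes[2].buffer.get(nv21, ySize, vuSize)

  // Compress to JPEG, decode into a Bitmap, then scale it down.
  val yuvImage = YuvImage(nv21, ImageFormat.NV21, image.width, image.height, null)
  val jpegBytes = ByteArrayOutputStream().use { out ->
    yuvImage.compressToJpeg(Rect(0, 0, image.width, image.height), /* quality= */ 90, out)
    out.toByteArray()
  }
  val fullSize = BitmapFactory.decodeByteArray(jpegBytes, 0, jpegBytes.size)
  return Bitmap.createScaledBitmap(
    fullSize, image.width / factor, image.height / factor, /* filter= */ true)
}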

Configure an additional high resolution CPU image stream

The performance of your ML model may depend on the resolution of the image used as input. Adjust the resolution of these streams by changing the current CameraConfig with Session.setCameraConfig(), selecting a valid configuration from Session.getSupportedCameraConfigs().

Java

CameraConfigFilter cameraConfigFilter =
    new CameraConfigFilter(session)
        // World-facing cameras only.
        .setFacingDirection(CameraConfig.FacingDirection.BACK);
List<CameraConfig> supportedCameraConfigs =
    session.getSupportedCameraConfigs(cameraConfigFilter);

// Select an acceptable configuration from supportedCameraConfigs.
CameraConfig cameraConfig = selectCameraConfig(supportedCameraConfigs);
session.setCameraConfig(cameraConfig);

Kotlin

val cameraConfigFilter =
  CameraConfigFilter(session)
    // World-facing cameras only.
    .setFacingDirection(CameraConfig.FacingDirection.BACK)
val supportedCameraConfigs = session.getSupportedCameraConfigs(cameraConfigFilter)

// Select an acceptable configuration from supportedCameraConfigs.
val cameraConfig = selectCameraConfig(supportedCameraConfigs)
session.setCameraConfig(cameraConfig)
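
The selectCameraConfig helper above is application-specific. One possible sketch simply picks the supported configuration with the largest CPU image; the selection criteria here are an assumption, so choose whatever fits your model.

Kotlin

import com.google.ar.core.CameraConfig

fun selectCameraConfig(supportedCameraConfigs: List<CameraConfig>): CameraConfig =
  // Pick the configuration with the largest CPU image. Other criteria, such as
  // target FPS range or depth sensor usage, may also matter for your use case.
  supportedCameraConfigs.maxByOrNull { it.imageSize.width * it.imageSize.height }
    ?: error("No supported camera configs for this filter")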

Retrieve the CPU image

Retrieve the CPU image using Frame.acquireCameraImage(). These images should be disposed of as soon as they're no longer needed.

Java

Image cameraImage = null;
try {
  cameraImage = frame.acquireCameraImage();
  // Process `cameraImage` using your ML inference model.
} catch (NotYetAvailableException e) {
  // NotYetAvailableException is an exception that can be expected when the camera is not ready
  // yet. The image may become available on a next frame.
} catch (RuntimeException e) {
  // A different exception occurred, e.g. DeadlineExceededException, ResourceExhaustedException.
  // Handle this error appropriately.
  handleAcquireCameraImageFailure(e);
} finally {
  if (cameraImage != null) {
    cameraImage.close();
  }
}

Kotlin

// NotYetAvailableException is an exception that can be expected when the camera is not ready yet.
// Map it to `null` instead, but continue to propagate other errors.
fun Frame.tryAcquireCameraImage() =
  try {
    acquireCameraImage()
  } catch (e: NotYetAvailableException) {
    null
  } catch (e: RuntimeException) {
    // A different exception occurred, e.g. DeadlineExceededException, ResourceExhaustedException.
    // Handle this error appropriately.
    handleAcquireCameraImageFailure(e)
  }

// The `use` block ensures the camera image is disposed of after use.
frame.tryAcquireCameraImage()?.use { image ->
  // Process `image` using your ML inference model.
}

Process the CPU image

You can use various machine learning libraries, such as ML Kit or the Google Cloud Vision API, to process the CPU image.
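
For example, here is a minimal sketch using ML Kit's on-device object detector. The rotationDegrees parameter and the downstream handling are assumptions; see the ARCore ML Kit sample for a complete integration.

Kotlin

import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.objects.ObjectDetection
import com.google.mlkit.vision.objects.defaults.ObjectDetectorOptions

val objectDetector = ObjectDetection.getClient(
  ObjectDetectorOptions.Builder()
    .setDetectorMode(ObjectDetectorOptions.SINGLE_IMAGE_MODE)
    .enableClassification()
    .build()
)

fun detectObjects(cameraImage: android.media.Image, rotationDegrees: Int) {
  // The camera image must stay open until the detector has finished processing it.
  val inputImage = InputImage.fromMediaImage(cameraImage, rotationDegrees)
  objectDetector.process(inputImage)
    .addOnSuccessListener { detectedObjects ->
      // Each DetectedObject exposes a bounding box (and optional labels) in
      // CPU image pixel coordinates, which can be hit-tested as shown in the
      // next section.
      for (obj in detectedObjects) {
        val centerX = obj.boundingBox.exactCenterX()
        val centerY = obj.boundingBox.exactCenterY()
        // Pass (centerX, centerY) to the coordinate conversion below.
      }
    }
    .addOnFailureListener { /* Handle inference errors appropriately. */ }
}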

Display results in your AR scene

Image recognition models often output detected objects by indicating a center point or a bounding polygon representing the detected object.

Using the center point or center of the bounding box that is output from the model, it's possible to attach an anchor to the detected object. Use Frame.hitTest() to estimate the pose of an object in the virtual scene.

Convert IMAGE_PIXELS coordinates to VIEW coordinates:

Java

// Suppose `mlResult` contains an (x, y) of a given point on the CPU image.
float[] cpuCoordinates = new float[] {mlResult.getX(), mlResult.getY()};
float[] viewCoordinates = new float[2];
frame.transformCoordinates2d(
    Coordinates2d.IMAGE_PIXELS, cpuCoordinates, Coordinates2d.VIEW, viewCoordinates);
// `viewCoordinates` now contains coordinates suitable for hit testing.

Kotlin

// Suppose `mlResult` contains an (x, y) of a given point on the CPU image.
val cpuCoordinates = floatArrayOf(mlResult.x, mlResult.y)
val viewCoordinates = FloatArray(2)
frame.transformCoordinates2d(
  Coordinates2d.IMAGE_PIXELS,
  cpuCoordinates,
  Coordinates2d.VIEW,
  viewCoordinates
)
// `viewCoordinates` now contains coordinates suitable for hit testing.

Use these VIEW coordinates to conduct a hit test and create an anchor from the result:

Java

List<HitResult> hits = frame.hitTest(viewCoordinates[0], viewCoordinates[1]);
HitResult depthPointResult = null;
for (HitResult hit : hits) {
  if (hit.getTrackable() instanceof DepthPoint) {
    depthPointResult = hit;
    break;
  }
}
if (depthPointResult != null) {
  Anchor anchor = depthPointResult.getTrackable().createAnchor(depthPointResult.getHitPose());
  // This anchor will be attached to the scene with stable tracking.
  // It can be used as a position for a virtual object, with a rotation perpendicular to the
  // estimated surface normal.
}

Kotlin

val hits = frame.hitTest(viewCoordinates[0], viewCoordinates[1])
val depthPointResult = hits.firstOrNull { it.trackable is DepthPoint }
if (depthPointResult != null) {
  val anchor = depthPointResult.trackable.createAnchor(depthPointResult.hitPose)
  // This anchor will be attached to the scene with stable tracking.
  // It can be used as a position for a virtual object, with a rotation perpendicular to the
  // estimated surface normal.
}

Performance considerations

Follow these recommendations to save processing power and consume less energy:

  • Do not run your ML model on every incoming frame. Consider running object detection at a low framerate instead; see the sketch after this list.
  • Consider an online (cloud-based) ML inference model, such as the Google Cloud Vision API, to reduce on-device computational complexity.
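
Here is a minimal throttling sketch, assuming a per-frame callback that receives the current Frame; the 500 ms interval is an illustrative value.

Kotlin

import android.os.SystemClock
import com.google.ar.core.Frame

// Run inference at most every 500 ms rather than on every rendered frame.
private const val MIN_INFERENCE_INTERVAL_MS = 500L
private var lastInferenceTimeMs = 0L

fun maybeRunInference(frame: Frame) {
  val nowMs = SystemClock.uptimeMillis()
  if (nowMs - lastInferenceTimeMs < MIN_INFERENCE_INTERVAL_MS) return
  lastInferenceTimeMs = nowMs
  frame.tryAcquireCameraImage()?.use { image ->
    // Process `image` using your ML inference model.
  }
}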

Next steps