On-Device Personalization federated compute deterministic build

Deterministic builds are required for workload attestation in the On-Device Personalization (ODP) Trusted Execution Environment (TEE), which is publicly available on Google Cloud as Confidential Space (CS).

The workload images must generate a deterministic image hash that CS can use for workload attestation (which follows the IETF's RFC 9334 Remote ATtestation procedureS (RATS) architecture).

This document describes the implementation of and support for deterministic builds in the odp-federatedcompute repository. The ODP Aggregator and Model Updater services run within Confidential Space. The repository supports deterministic builds for all of its services, which are required for production use cases.

Deterministic builds

A deterministic build consists of two major components:

  1. The compilation of the required binaries, including jars, shared libraries, and metadata.
  2. The base image and runtime dependencies: the runtime environment's base image used to execute the compiled binaries.
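Both components must produce byte-identical outputs for the final image hash to be stable. As a minimal illustration of the idea (not the repository's actual tooling), fixing file ordering, owners, and timestamps makes an archive, and therefore its hash, reproducible; GNU tar is assumed:

```shell
# Illustrative only: normalize metadata so the archive bytes are reproducible.
mkdir -p demo && echo "payload" > demo/app.txt

# Fixed name ordering, zeroed owners, and a clamped mtime remove all
# run-to-run variation from the archive.
tar --sort=name --owner=0 --group=0 --numeric-owner \
    --mtime='UTC 1970-01-01' -cf layer1.tar demo
tar --sort=name --owner=0 --group=0 --numeric-owner \
    --mtime='UTC 1970-01-01' -cf layer2.tar demo

# Both archives hash identically.
sha256sum layer1.tar layer2.tar
```

Without the metadata normalization, the two archives would embed current timestamps and differ on every run, breaking the image hash.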

As of now, the ODP Federated Compute repository supports the following types of workloads:

  • Java + Spring workloads
    • TaskAssignment, TaskManagement, Collector
  • Java + Spring with JNI TensorFlow workloads
    • ModelUpdater, Aggregator
  • Python workloads
    • TaskBuilder

Dependencies

ODP relies on the following dependencies to maintain determinism and availability:

  • Bazel
  • GitHub
  • Maven
  • PyPI
  • Debian snapshots
  • DockerHub Registry
  • Google Container Registry (GCR)

Deterministic workloads

All workloads are compiled using Bazel with language-specific toolchains and container images built using rules_oci. The WORKSPACE file defines all the dependencies with corresponding versions and hashes.
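The effect of pinning hashes in the WORKSPACE file can be sketched as follows: a dependency whose downloaded bytes do not match the recorded hash is rejected at fetch time. The file and hash below are stand-ins, not real repository dependencies:

```shell
# Stand-in for a downloaded dependency; in WORKSPACE the sha256 is hard-coded.
printf 'artifact-bytes' > dep.tar
pinned=$(sha256sum dep.tar | cut -d' ' -f1)

# At fetch time the downloaded bytes are re-hashed and compared to the pin.
actual=$(sha256sum dep.tar | cut -d' ' -f1)
[ "$actual" = "$pinned" ] && echo "hash OK" || echo "hash mismatch: rejected"
```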

Debian snapshots

All workload images should be built within the Docker container created from the provided Dockerfile, which is based on a Debian snapshot. Debian snapshots provide a stable repository state, so every build fetches the same package versions deterministically.
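A sketch of the pinning mechanism: apt is pointed at a fixed snapshot timestamp instead of the moving archive, so every build resolves identical package versions. The timestamp and suite below are illustrative, not the ones ODP uses:

```shell
# Illustrative snapshot pin; a real Dockerfile would write this to /etc/apt/sources.list.
SNAPSHOT="20240101T000000Z"   # assumed timestamp, for demonstration only
echo "deb https://snapshot.debian.org/archive/debian/${SNAPSHOT} bookworm main" > sources.list
cat sources.list
```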

Java Spring workloads

Bazel's remotejdk_17 is used to provide a hermetic JDK for compilation. Other Java dependencies are managed and defined in the WORKSPACE file.

The Java Spring workloads compile to a jar file named <service>_application.jar. The jar contains:

  • Java class files
  • META-INF/
    • Bazel manifest data
  • build-data.properties
    • Bazel build-data
  • BOOT-INF/

Image layers

The Java Spring workload image consists of two layers:

  • Base image layer
  • Workload layer
    • <service>_application.jar

Image configuration

  • Entrypoint
    • java -jar <service>_application.jar

JNI TensorFlow workloads

JNI TensorFlow workloads are built on top of the Java Spring workloads. A hermetic Clang+LLVM Bazel toolchain, using prebuilt Clang+LLVM 16 with a sysroot provided by the Debian snapshot image, is used to compile native code.

The JNI workloads compile to the shared libraries libtensorflow-jni.so and libaggregation-jni.so along with the <service>_application.jar.

Image layers

The JNI TensorFlow workload image consists of several layers:

  • Base image layer
  • Debian package dependency layers. The layers are generated using deb archives downloaded from debian-snapshot and repackaged as image layers
    • libc++1-16_amd64.tar
    • libc++abi1-16_amd64.tar
    • libc6_amd64.tar
    • libunwind-16_amd64.tar
    • libgcc-s1_amd64.tar
    • gcc-13-base_amd64.tar
  • Workload layer
    • binary_tar.tar
      • <service>_application.jar
      • libtensorflow-jni.so
      • libaggregation-jni.so

Image configuration

  • Labels (Only for images built to run within TEE)
    • "tee.launch_policy.allow_env_override": "FCP_OPTS"
      • Allows the FCP_OPTS environment variable to be set in confidential space. The workload will consume FCP_OPTS at startup to configure required parameters.
      • The FCP_OPTS environment variable is set when the image is run (instead of built) to maintain build determinism.
    • "tee.launch_policy.log_redirect": "always"
    • "tee.launch_policy.monitoring_memory_allow": "always"
  • Entrypoint
    • java -Djava.library.path=. -jar <service>_application.jar
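The FCP_OPTS handling above can be sketched as follows: the variable is injected when the container starts and split into startup options, so the image itself never embeds them and its hash stays deterministic. The flag names below are illustrative, not the real ODP options:

```shell
# FCP_OPTS is set at run time by Confidential Space, not baked into the image.
FCP_OPTS="--spanner_instance=demo --environment=test"   # illustrative values

# Word-split the variable into positional parameters, as a launcher script might.
set -- $FCP_OPTS
echo "parsed $# options: $1 $2"
# prints: parsed 2 options: --spanner_instance=demo --environment=test
```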

Python workloads

Bazel's rules_python is used to provide a hermetic Python 3.10 toolchain. A locked pip requirements file is used for deterministic fetching of pip dependencies. The Debian snapshot image ensures deterministic distributions are fetched based on platform compatibility and provides a C++ toolchain for compiling source distributions.

The Python workloads are packaged as a set of downloaded pip packages, a Python 3.10 distribution, the ODP Python source code, and a Python startup script:

  • <service>.runfiles/
    • Python distribution is stored under python_x86_64-unknown-linux-gnu/
    • Source code is stored under com_google_ondevicepersonalization_federatedcompute/
    • Pip packages are stored under pypi_<dependency_name>/
  • <service>.runfiles_manifest
    • Manifest file for the <service>.runfiles/ directory
  • <service>
    • Python script to run the Python workload using the runfiles
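The startup pattern can be sketched as follows, with sh standing in for the bundled Python interpreter and the file names as illustrative stand-ins for the real runfiles layout:

```shell
# Build a toy runfiles tree; names mirror the layout above but are stand-ins.
mkdir -p svc.runfiles/com_google_ondevicepersonalization_federatedcompute
printf 'echo "workload started"\n' \
  > svc.runfiles/com_google_ondevicepersonalization_federatedcompute/main.sh

# The startup script resolves everything relative to its runfiles directory,
# so the workload runs the same way regardless of install location.
RUNFILES="$PWD/svc.runfiles"
sh "$RUNFILES/com_google_ondevicepersonalization_federatedcompute/main.sh"
# prints: workload started
```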

Image layers

The Python workload image consists of four layers:

  • Base image layer
  • Interpreter layer
    • interpreter_layer.jar
      • <service>/<service>.runfiles/python_x86_64-unknown-linux-gnu/**
  • Packages layer
    • packages_layer.jar
      • <service>/<service>.runfiles/**/site-packages/**
  • Workload layer
    • app_tar_manifest.tar
      • Contains source code, startup script, and manifest.
        • <service>/<service>.runfiles_manifest
        • <service>/<service>
        • <service>/<service>.runfiles/com_google_ondevicepersonalization_federatedcompute/**

Image configuration

  • Entrypoint
    • /<service>/<service>

Build images

Once you have chosen your workloads, you are ready to build and publish your images.

Prerequisites

  • Bazel 6.4.0
    • Requires Java and C++ installations
  • Docker

Procedure

Images should be built within the Docker container created from the provided Dockerfile. Two scripts help with building the final deterministic images:

  • docker_run.sh
    • docker_run.sh builds the Docker image from the Dockerfile, mounts the work directory, mounts the host Docker daemon, and runs Docker with the provided bash command. Any variables passed before the bash command are treated as docker run flags.
  • build_images.sh
    • build_images.sh runs bazel build for all images and outputs the generated image hash for each built image.

Build all images

./scripts/docker/docker_run.sh "./scripts/build_images.sh"

The expected image hashes for each release can be found under odp-federatedcompute GitHub releases.
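Because the build is deterministic, a locally built hash can be checked directly against the published one. The values below are dummies standing in for the real build output and release notes:

```shell
# Dummy values: 'expected' would come from the GitHub release notes,
# 'built' from the build_images.sh output.
expected="sha256:0000000000000000000000000000000000000000000000000000000000000000"
built="sha256:0000000000000000000000000000000000000000000000000000000000000000"

if [ "$built" = "$expected" ]; then
  echo "hash match: image corresponds to the attested release build"
else
  echo "hash mismatch: do not deploy"
fi
```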

Publish images

Publishing is configured using the oci_push Bazel rule. For each service, the target repository should be configured. The services are:

  • aggregator
  • collector
  • model_updater
  • task_assignment
  • task_management
  • task_scheduler
  • task_builder

Publish a single image

./scripts/docker/docker_run.sh "bazel run //shuffler/services/<servicename_no_underscore>:<servicename_with_underscore>_image_publish"

Built images

All built images must be stored and hosted by the creator, for example in a Google Cloud Artifact Registry.