Best practices for bulk importing client-side encrypted files

This guide covers best practices for building a custom tool to bulk import client-side encrypted (CSE) files using the Drive API.

Consider Drive for desktop for self-service migrations

A user can import files from their local machine using the Drive for desktop client. It fully supports client-side encryption and allows users to encrypt and upload files themselves. Building a custom tool as described in this guide is only necessary for large-scale, unattended, or multi-user bulk imports.

Before you begin

You must understand how to manage a single client-side encrypted file. Review Manage individual CSE files to learn the fundamental steps for encryption, upload/download, and decryption, including token generation and interacting with your Key Access Control List Service (KACLS).

Authenticate using a service account

Use a service account with domain-wide delegation when interacting with the Drive API. This allows your application to impersonate users, so you can programmatically loop through them and upload files directly on their behalf.
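The per-user loop can be sketched as follows. This is a minimal sketch, not a definitive implementation: the helper name drive_clients_for_users and the injected build_service factory are illustrative and not part of any Google library. It assumes only that the base credentials object exposes with_subject, as google.oauth2.service_account.Credentials does.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


def drive_clients_for_users(
    base_credentials: Any,
    user_emails: Iterable[str],
    build_service: Callable[[Any], Any],
) -> Iterator[Tuple[str, Any]]:
    """Yield (user_email, drive_client) pairs, one impersonated client per user.

    base_credentials is expected to expose with_subject(email).
    build_service is a hypothetical factory that turns credentials
    into a Drive API client.
    """
    for email in user_emails:
        # with_subject returns a new credentials object; the base is unchanged.
        delegated = base_credentials.with_subject(email)
        yield email, build_service(delegated)


# Example wiring (requires google-auth and google-api-python-client;
# the key file path and the scope are assumptions for illustration):
#   from google.oauth2 import service_account
#   from googleapiclient.discovery import build
#   creds = service_account.Credentials.from_service_account_file(
#       "service-account.json",
#       scopes=["https://www.googleapis.com/auth/drive"])
#   for email, drive in drive_clients_for_users(
#           creds, ["user@example.com"],
#           lambda c: build("drive", "v3", credentials=c)):
#       ...  # import this user's files with the drive client
```

Building one client per user keeps each upload attributed to the user who should own the file.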

Recreate the directory structure

Design your import tool to recursively traverse the source files and folders to mirror the existing directory structure onto Drive. The high-level process is as follows:

  1. For each source directory, create a corresponding Drive folder.
  2. Encrypt and upload the directory's files into the created Drive folder.
  3. Repeat the process for subdirectories.
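The steps above can be sketched with a top-down directory walk. In this sketch, create_folder and encrypt_and_upload are placeholder callables for your own Drive and CSE logic; neither name comes from the Drive API.

```python
import os
from typing import Callable, Dict


def mirror_tree(
    source_root: str,
    root_folder_id: str,
    create_folder: Callable[[str, str], str],
    encrypt_and_upload: Callable[[str, str], None],
) -> Dict[str, str]:
    """Mirror source_root under the Drive folder root_folder_id.

    create_folder(name, parent_id) must return the new Drive folder ID.
    encrypt_and_upload(local_path, parent_id) handles a single file.
    Returns a mapping of local directory path to Drive folder ID.
    """
    folder_ids = {source_root: root_folder_id}
    # os.walk is top-down by default, so a parent directory is always
    # visited (and its Drive folder created) before its children.
    for dirpath, dirnames, filenames in os.walk(source_root):
        dirnames.sort()  # deterministic order makes reruns easier to compare
        parent_id = folder_ids[dirpath]
        for name in sorted(filenames):
            encrypt_and_upload(os.path.join(dirpath, name), parent_id)
        for name in dirnames:
            folder_ids[os.path.join(dirpath, name)] = create_folder(name, parent_id)
    return folder_ids
```

In a real importer, create_folder would typically call files.create with the folder MIME type application/vnd.google-apps.folder and return the resulting ID. Keeping the returned mapping makes it possible to resume without recreating folders.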

Upload files to the right place

Always upload files to the correct user's My Drive or to a shared drive that the user can access. You can get a shared drive or folder ID statically from its Drive web URL, or dynamically by using the drives.list and files.list methods.
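As an illustration of the dynamic lookup, the following sketch pages through drives.list and matches a shared drive by name. It assumes service is a Drive v3 client built with google-api-python-client for the impersonated user; the function names are illustrative.

```python
from typing import Any, Dict, Iterator, Optional


def iter_shared_drives(service: Any, page_size: int = 100) -> Iterator[Dict[str, str]]:
    """Yield every shared drive the user can see, following nextPageToken."""
    page_token: Optional[str] = None
    while True:
        response = service.drives().list(
            pageSize=page_size,
            pageToken=page_token,
            fields="nextPageToken, drives(id, name)",
        ).execute()
        yield from response.get("drives", [])
        page_token = response.get("nextPageToken")
        if not page_token:
            break


def find_shared_drive_id(service: Any, name: str) -> Optional[str]:
    """Return the ID of the first shared drive with exactly this name, or None."""
    for drive in iter_shared_drives(service):
        if drive.get("name") == name:
            return drive["id"]
    return None
```

Shared drive names are not guaranteed to be unique, so a production tool may prefer to take the ID from configuration and use a lookup like this only as a convenience.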

Prevent duplicates

Drive allows multiple files in the same folder to share the exact same filename, so a retried or restarted upload can silently create a duplicate instead of failing. To avoid this, generate IDs for your files before uploading by using the Drive API files.generateIds method.

By storing these pre-generated IDs, your importer can attempt to upload each file to its specific ID. If a file with that ID already exists, your tool can safely skip it. The stored IDs also let your tool resume after a crash without re-uploading files that were already imported.
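One way to store the pre-generated IDs is a small JSON manifest on disk, sketched below. The IdManifest class and the generate_ids callable are hypothetical; generate_ids(n) stands in for a wrapper around files.generateIds that returns a list of n IDs.

```python
import json
import os
from typing import Callable, Dict, List


class IdManifest:
    """Persist a local-path -> pre-generated Drive file ID mapping as JSON."""

    def __init__(self, path: str, generate_ids: Callable[[int], List[str]]):
        self._path = path
        self._generate_ids = generate_ids
        self._ids: Dict[str, str] = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self._ids = json.load(f)

    def id_for(self, local_path: str) -> str:
        """Return the stored ID for local_path, reserving a new one if needed."""
        if local_path not in self._ids:
            self._ids[local_path] = self._generate_ids(1)[0]
            self._save()
        return self._ids[local_path]

    def _save(self) -> None:
        # Write to a temp file and rename, so a crash cannot leave a
        # half-written manifest behind.
        tmp = self._path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self._ids, f)
        os.replace(tmp, self._path)
```

Because the mapping is saved before the upload starts, a restarted run asks for the same ID again and can detect that the file already exists. A larger import would probably request IDs in batches rather than one at a time.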

Request a CSE token before each upload

Call generateCseToken for each file immediately prior to key wrapping and file upload. This approach ensures the token accurately reflects the current state of the associated metadata, which can change.
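The ordering can be made explicit in a per-file helper, sketched here. All three callables are hypothetical placeholders: generate_cse_token would wrap the Drive API generateCseToken call, wrap_key would call your KACLS, and upload would send the encrypted content. Check the API reference for the exact parameters and response fields.

```python
from typing import Any, Callable, Dict


def import_one_file(
    file_id: str,
    parent_id: str,
    generate_cse_token: Callable[[str, str], Dict[str, Any]],
    wrap_key: Callable[[Dict[str, Any]], bytes],
    upload: Callable[[str, str, bytes], None],
) -> None:
    """Fetch a fresh CSE token, wrap the key with it, then upload, in that order."""
    # Requested per file, right before use, and never cached across files.
    token = generate_cse_token(file_id, parent_id)
    wrapped_key = wrap_key(token)
    upload(file_id, parent_id, wrapped_key)
```

Keeping the token request inside the per-file function, rather than in setup code, makes it hard to reuse a stale token by accident.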

Use resumable uploads for large files

Use Drive API resumable uploads for migrating large files. Resumable uploads allow your importer to retry failed chunks during network interruptions, rather than restarting the entire file upload.
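A chunk-level retry loop might look like the following sketch. It assumes a request object whose next_chunk method returns a (status, response) pair with response set to None until the upload completes, which is the interface google-api-python-client provides for resumable media uploads. The function name, retry budget, and backoff values are illustrative.

```python
import time
from typing import Any, Callable, Tuple, Type


def run_resumable_upload(
    request: Any,
    retryable: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_retries_per_chunk: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Drive a resumable upload to completion, retrying only the failed chunk."""
    response = None
    failures = 0
    while response is None:
        try:
            _status, response = request.next_chunk()
            failures = 0  # progress was made, so reset the retry budget
        except retryable:
            failures += 1
            if failures > max_retries_per_chunk:
                raise
            sleep(min(2 ** failures, 60))  # simple capped backoff
    return response
```

The default retryable tuple covers only built-in network errors. If your client raises its own error type for transient HTTP failures, pass that type in retryable as well.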

Restore sharing permissions

If your bulk import tool needs to preserve the sharing permissions, first encrypt and upload the file, and then call the permissions.create method. Sharing permissions are not applied during the file upload itself.
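As a sketch of this second step, the code below converts source ACL entries into Drive permission resources and calls permissions.create for each one. The shape of the source ACL entries and the ROLE_MAP values on the left-hand side are assumptions about your source system; service is assumed to be a Drive v3 client built with google-api-python-client.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical mapping from a source system's access levels to Drive roles.
ROLE_MAP = {"read": "reader", "comment": "commenter", "write": "writer"}


def to_permission_body(entry: Dict[str, str]) -> Dict[str, str]:
    """Convert a source ACL entry such as {"email": ..., "access": "read"}
    into a Drive permission resource for a single user."""
    return {
        "type": "user",
        "role": ROLE_MAP[entry["access"]],
        "emailAddress": entry["email"],
    }


def restore_permissions(service: Any, file_id: str, acl: Iterable[Dict[str, str]]) -> List[Any]:
    """Call permissions.create once per ACL entry, after the upload has finished."""
    results = []
    for entry in acl:
        results.append(
            service.permissions().create(
                fileId=file_id,
                body=to_permission_body(entry),
                sendNotificationEmail=False,  # avoid mailing users during a bulk import
                supportsAllDrives=True,
            ).execute()
        )
    return results
```

An unknown access level raises KeyError here, which is deliberate: failing loudly is safer than silently granting a default role.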

Handle token expiration

For long-running operations, your script may encounter authentication errors due to token expiration. Implement logic to automatically refresh access tokens and retry uploads. For more details, see the open source example that demonstrates how to encrypt and upload a single file.
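The refresh-and-retry logic can be sketched as a small wrapper. AuthExpired is a hypothetical exception that your own upload code would raise when it sees an expired-token response, and refresh_token is a placeholder for whatever renews your credentials.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class AuthExpired(Exception):
    """Hypothetical error your upload code raises on an expired-token response."""


def call_with_refresh(
    operation: Callable[[], T],
    refresh_token: Callable[[], None],
    max_attempts: int = 3,
) -> T:
    """Run operation; on AuthExpired, refresh credentials and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except AuthExpired:
            if attempt == max_attempts:
                raise  # out of attempts, surface the error to the caller
            refresh_token()
    raise ValueError("max_attempts must be at least 1")
```

Bounding the number of attempts matters: if a refresh keeps succeeding but the operation keeps failing, the cause is probably a permissions problem rather than expiry, and retrying forever would hide it.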

Validate imported files thoroughly

Perform thorough validation after a bulk import. Google cannot decrypt and validate your files server-side. The Validate imported files section details several methods for spot checking individual files.

To verify at scale after a bulk import, you can also use the official decrypter tool. First, download the encrypted content from Drive using Google Takeout. Then, attempt to decrypt it with the decrypter tool. Any file that fails to decrypt points to a potential issue in your import tool's encryption or key wrapping logic.
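Successful decryption shows that a file is readable, but not that its content matches the original. If the source files are still available, a checksum comparison can close that gap. The sketch below assumes the decrypted files have been arranged in a directory that mirrors the source tree's relative paths; that layout is an assumption about your workflow, not something Takeout or the decrypter tool guarantees.

```python
import hashlib
import os
from typing import List


def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(block)
    return digest.hexdigest()


def find_mismatches(source_root: str, decrypted_root: str) -> List[str]:
    """Return relative paths that are missing from decrypted_root or differ in content."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(decrypted_root, rel)
            if not os.path.isfile(dst) or sha256_of(src) != sha256_of(dst):
                problems.append(rel)
    return sorted(problems)
```

An empty result means every source file has a byte-identical decrypted counterpart.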

Understand limits and quotas

Client-side encrypted files are subject to standard Drive limits and quotas. Be aware of shared drive limits, general file and folder limits, and how to manage your quota. Additionally, your import tool must handle rate limits from your Key Access Control List Service (KACLS) and your Identity Provider (IdP).
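A common way to handle rate limits from any of these services is capped exponential backoff with jitter, sketched below. RateLimited is a hypothetical exception that your own code would raise when Drive, the KACLS, or the IdP returns a throttling response; the delay values are illustrative defaults, not documented limits.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised when Drive, the KACLS, or the IdP throttles a call."""


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 64.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry operation on RateLimited with capped exponential backoff and full jitter."""
    attempt = 0
    while True:
        try:
            return operation()
        except RateLimited:
            attempt += 1
            if attempt >= max_attempts:
                raise
            # The ceiling doubles per failure (1s, 2s, 4s, ...) up to max_delay;
            # the random draw spreads out workers that were throttled together.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

Jitter is especially useful in a multi-user import, where many workers tend to hit the same KACLS or IdP limit at the same moment.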