Configure and use entity resolution in BigQuery

This document shows how to set up entity resolution in BigQuery for two audiences: end users of entity resolution (hereafter referred to as end users) and identity providers.

End users can use this document to connect with an identity provider and use the provider's service to match records. Identity providers can use this document to set up and configure services to share with end users on the Google Cloud Marketplace.

Workflow for end users

The following sections show end users how to configure entity resolution in BigQuery. For a visual representation of the complete setup, see the architecture for entity resolution.

Before you begin

Contact and establish a relationship with an identity provider. BigQuery supports entity resolution with LiveRamp.
Acquire the following items from the identity provider:
- Service account credentials
- Remote function signature
Create two datasets in your project:
- Input dataset
- Output dataset
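
As a minimal sketch of the dataset step above, the following Python snippet builds the DDL statements for the two datasets. The project ID, dataset names, and location are placeholders chosen for illustration, not values required by BigQuery or by any identity provider. The printed statements can be run in the BigQuery query editor.

```python
def create_dataset_ddl(project: str, dataset: str, location: str) -> str:
    """Build a CREATE SCHEMA statement for one BigQuery dataset."""
    return (
        f"CREATE SCHEMA IF NOT EXISTS `{project}.{dataset}` "
        f"OPTIONS (location = '{location}');"
    )


# Placeholder names (assumptions): substitute your own project and datasets.
dataset_statements = [
    create_dataset_ddl("my-project", "er_input", "US"),
    create_dataset_ddl("my-project", "er_output", "US"),
]

for statement in dataset_statements:
    print(statement)
```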

Required roles

To get the permissions that are needed to run entity resolution jobs, ask your administrator to grant the following IAM roles:

For the identity provider's service account to read the input dataset and write to the output dataset:
- BigQuery Data Viewer (roles/bigquery.dataViewer) on the input dataset
- BigQuery Data Editor (roles/bigquery.dataEditor) on the output dataset
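
One way to express the two grants above is with BigQuery DCL statements. The following Python sketch only builds those statements. The project, dataset, and service account names are placeholders (assumptions), so replace them with your own values and with the service account that your identity provider supplied.

```python
def dataset_grant(role: str, dataset: str, service_account: str) -> str:
    """Build a GRANT statement that gives a service account a role on one dataset."""
    return (
        f"GRANT `{role}` ON SCHEMA `{dataset}` "
        f'TO "serviceAccount:{service_account}";'
    )


# Placeholder service account (assumption): use the one from your identity provider.
PROVIDER_SA = "provider-sa@provider-project.iam.gserviceaccount.com"

grant_statements = [
    # Read-only access to the input dataset.
    dataset_grant("roles/bigquery.dataViewer", "my-project.er_input", PROVIDER_SA),
    # Write access to the output dataset.
    dataset_grant("roles/bigquery.dataEditor", "my-project.er_output", PROVIDER_SA),
]

for statement in grant_statements:
    print(statement)
```

Granting at the dataset level, rather than on the whole project, keeps the provider's access limited to the data that the job needs.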

For more information about granting roles, see Manage access.

You might also be able to get the required permissions through custom roles or other predefined roles.

Translate or resolve entities

For specific identity provider instructions, refer to the following sections.

LiveRamp

Prerequisites

Configure LiveRamp Embedded Identity in BigQuery. For more information, see Enabling LiveRamp Embedded Identity in BigQuery.
Coordinate with LiveRamp to enable API credentials for use with Embedded Identity. For more information, see Authentication.

Setup

The following steps are required when you use LiveRamp Embedded Identity for the first time. After setup is complete, only the input table and metadata table need to be modified between runs.

Create an input table

Create a table in the input dataset. Populate the table with RampIDs, target domains, and target types. For details and examples, see Input Table Columns and Descriptions.

Create a metadata table

The metadata table is used to control the execution of LiveRamp Embedded Identity on BigQuery. Create a metadata table in the input dataset. Populate the metadata table with client IDs, execution modes, target domains, and target types. For details and examples, see Metadata Table Columns and Descriptions.

Share tables with LiveRamp

Grant the LiveRamp Google Cloud service account access to view and process data in your input dataset. For details and examples, see Share Tables and Datasets with LiveRamp.

Run an embedded identity job

To run an embedded identity job with LiveRamp in BigQuery, do the following:

Confirm that all RampIDs that were encoded in your domain are in your input table.
Confirm that your metadata table is still accurate before you run the job.
Contact LiveRampIdentitySupport@liveramp.com with a job process request. Include the project ID, dataset ID, and table ID (if applicable) for your input table, metadata table, and output dataset. For more information, see Notify LiveRamp to Initiate Transcoding.

Results are generally delivered to your output dataset within three business days.

LiveRamp support

For support issues, contact LiveRamp Identity Support.

LiveRamp billing

LiveRamp handles billing for entity resolution.

Workflow for identity providers

The following sections show identity providers how to configure entity resolution in BigQuery. For a visual representation of the complete setup, see the architecture for entity resolution.

Before you begin

Create a Cloud Run job or a Cloud Function to integrate with the remote function. Both options are suitable for this purpose.
Note the name of the service account that's associated with the Cloud Run job or Cloud Function:
1. In the Google Cloud console, go to the Cloud Functions page.
2. Click the function's name, and then click the Details tab.
3. In the General Information pane, find and note the service account name for the remote function.
Create a remote function.
Collect end-user principals from the end user.
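
To illustrate what the Cloud Run job or Cloud Function behind the remote function has to do, here is a minimal Python sketch of the request and response handling. BigQuery sends a JSON body with a "calls" array (one argument list per row) and expects a JSON body with a "replies" array of the same length. The `match_record` function is a hypothetical stand-in for a provider's real matching logic, and `handle_remote_call` is an illustrative name, not part of any Google library.

```python
import json


def match_record(identifier: str) -> str:
    """Hypothetical matching step; a real provider would query its identity graph here."""
    return identifier.strip().lower()


def handle_remote_call(request_body: str) -> tuple[str, int]:
    """Handle one batched BigQuery remote function request.

    Returns the JSON response body and the HTTP status code to send back.
    """
    try:
        payload = json.loads(request_body)
        # One reply per entry in "calls", in the same order.
        replies = [match_record(args[0]) for args in payload["calls"]]
        return json.dumps({"replies": replies}), 200
    except Exception as error:
        # On failure, return a non-200 status with an "errorMessage" field.
        return json.dumps({"errorMessage": str(error)}), 400
```

In a deployed function, the HTTP entry point would pass the raw request body to `handle_remote_call` and return the resulting body and status code.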

Required roles

To get the permissions that are needed to run entity resolution jobs, ask your administrator to grant the following IAM roles:

For the service account that's associated with your function to read from and write to the associated datasets and to launch jobs:
- BigQuery Data Editor (roles/bigquery.dataEditor) on the project
- BigQuery Job User (roles/bigquery.jobUser) on the project
For the end-user principal to see and connect to the remote function:
- BigQuery Connection User (roles/bigquery.connectionUser) on the connection
- BigQuery Data Viewer (roles/bigquery.dataViewer) on the control plane dataset with the remote function

For more information about granting roles, see Manage access.

You might also be able to get the required permissions through custom roles or other predefined roles.

Share entity resolution remote function

Modify and share the following remote interface code with the end user. The end user needs this code to start the entity resolution job.

`PARTNER_PROJECT_ID.DATASET_ID.match`(LIST_OF_PARAMETERS)

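Replace PARTNER_PROJECT_ID and DATASET_ID with the identity provider's project ID and the ID of the dataset that contains the remote function. Replace LIST_OF_PARAMETERS with the list of parameters that are passed to the remote function.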
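
As a rough illustration of how an end user might call the shared function, the following Python sketch builds a query string. It assumes that the provider's function takes column values as its parameters, which might not hold for every provider. The function path, table name, and column names are placeholders.

```python
def build_match_query(function_path: str, input_table: str, columns: list[str]) -> str:
    """Build a query that calls the shared remote function once per input row."""
    arguments = ", ".join(columns)
    return (
        f"SELECT {arguments}, `{function_path}`({arguments}) AS match_result\n"
        f"FROM `{input_table}`"
    )


# Placeholder values (assumptions, not names defined by any provider).
query = build_match_query(
    function_path="partner-project.control_plane.match",
    input_table="my-project.er_input.records",
    columns=["record_id", "hashed_email"],
)
print(query)
```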

Optional: Provide job metadata

You can provide job metadata by using a separate remote function or by writing a new status table in the end user's output dataset. Examples of metadata include job statuses and metrics.

Billing for identity providers

To streamline customer billing and onboarding, we recommend that you integrate your entity resolution service with the Google Cloud Marketplace. This lets you set up a pricing model based on the entity resolution job usage, with Google handling the billing for you. For more information, see Offering software as a service (SaaS) products.