Restrict data access using analysis rules

This document provides general information about analysis rules in GoogleSQL for BigQuery.

What is an analysis rule?

An analysis rule enforces a condition for sharing data. With BigQuery, you can enforce an analysis rule on a view, using a data clean room or by applying the analysis rule directly to the view. When you enforce an analysis rule, you enforce that everyone querying that view must abide by that analysis rule on the view. If the analysis rule is satisfied, the query produces output that satisfies the analysis rule. If the query doesn't satisfy the analysis rule, an error is produced.

Supported analysis rules

The following analysis rules are supported:

  • Aggregation threshold analysis rule: Enforces the minimum number of distinct entities that must be present in a dataset. You can enforce this rule on a view, using statements or data clean rooms.
  • Differential privacy analysis rule: Enforces a privacy budget, which limits the data that is revealed to a subscriber when the data is protected with differential privacy. You can enforce this rule on a view, using statements or data clean rooms.
  • Join restriction analysis rule: Limits the types of joins that can be used with specific columns. Joins don't need to be present in queries and certain columns can be blocked. Can be included with an aggregation threshold analysis rule or differential privacy analysis rule. You can enforce this rule on a view, using statements or data clean rooms.
  • List overlap analysis rule: Similar to the join restriction analysis rule, but it can't be used with other analysis rules. You can enforce this rule on a view, using data clean rooms.

Aggregation threshold analysis rule

An aggregation threshold analysis rule enforces the minimum number of distinct entities that must be present in a dataset, so that statistics on that dataset are included in the results of a query.

When enforced, the aggregation threshold analysis rule groups data across dimensions, while ensuring the aggregation threshold is met. It counts the number of distinct privacy units (represented by the privacy unit column) for each group, and only outputs the groups where the distinct privacy unit count satisfies the aggregation threshold.

A view that includes this analysis rule can also include the join restriction analysis rule.

Define an aggregation threshold analysis rule for a view

You can define an aggregation threshold analysis rule for a view in a data clean room or with the following statement:

CREATE OR REPLACE VIEW VIEW_NAME
  OPTIONS (
    privacy_policy= '{
      "aggregation_threshold_policy": {
        "threshold" : THRESHOLD,
        "privacy_unit_column": "PRIVACY_UNIT_COLUMN"
      }
    }'
  )
  AS QUERY;

Replace the following values:

  • VIEW_NAME: The path and name of the view.
  • THRESHOLD: The minimum number of distinct privacy units that need to contribute to each row in the query results. If a potential row doesn't satisfy this threshold, that row is omitted from the query results.
  • PRIVACY_UNIT_COLUMN: Represents the privacy unit column. A privacy unit column is a unique identifier for a privacy unit. A privacy unit is a value from the privacy unit column that represents the entity in a set of data that is being protected.

    You can use only one privacy unit column, and the data type for the privacy unit column must be groupable.

    The values in the privacy unit column cannot be directly projected through a query, and you can use only analysis rule-supported aggregate functions to aggregate the data in this column.

  • QUERY: The query for the view.

In the following example, an aggregation threshold analysis rule is created:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"aggregation_threshold_policy": {"threshold": 3, "privacy_unit_column": "last_name"}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

To review the privacy_policy syntax for CREATE VIEW, see the OPTIONS list in CREATE VIEW.

Update an aggregation threshold analysis rule for a view

You can change the aggregation threshold analysis rule for a view in a data clean room or with the following statement:

ALTER VIEW VIEW_NAME
SET OPTIONS (
  privacy_policy= '{
    "aggregation_threshold_policy": {
      "threshold" : THRESHOLD,
      "privacy_unit_column": "PRIVACY_UNIT_COLUMN"
    }
  }'
)

Replace the following values:

  • VIEW_NAME: The path and name of the view.
  • THRESHOLD: The minimum number of distinct privacy units that need to contribute to each row in the query results. If a potential row doesn't satisfy this threshold, that row is omitted from the query results.
  • PRIVACY_UNIT_COLUMN: Represents the privacy unit column. A privacy unit column is a unique identifier for a privacy unit. A privacy unit is a value from the privacy unit column that represents the entity in a set of data that is being protected.

    You can use only one privacy unit column, and the data type for the privacy unit column must be groupable.

    The values in the privacy unit column cannot be directly projected through a query, and you can use only analysis rule-supported aggregate functions to aggregate the data in this column.

In the following example, an aggregation threshold analysis rule is updated:

ALTER VIEW mydataset.ExamView
SET OPTIONS (
  privacy_policy= '{"aggregation_threshold_policy": {"threshold": 50, "privacy_unit_column": "last_name"}}'
);

To review the privacy_policy syntax for ALTER VIEW, see the OPTIONS list in ALTER VIEW SET OPTIONS.

Query an aggregation threshold analysis rule–enforced view

You can query a view that has an aggregation threshold analysis rule with the AGGREGATION_THRESHOLD clause. The query must include aggregation functions, and you can use only analysis rule-supported aggregate functions in this query.

In the following example, a view with an aggregation threshold analysis rule is queried:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"aggregation_threshold_policy": {"threshold": 3, "privacy_unit_column": "last_name"}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

-- Query an analysis–rule enforced view called ExamView.
SELECT WITH AGGREGATION_THRESHOLD
  test_id, COUNT(DISTINCT last_name) AS student_count
FROM mydataset.ExamView
GROUP BY test_id;

/*---------+---------------*
 | test_id | student_count |
 +---------+---------------+
 | P91     | 3             |
 | U25     | 4             |
 *---------+---------------*/

To review additional examples for the AGGREGATION_THRESHOLD clause, see AGGREGATION_THRESHOLD clause.

Differential privacy analysis rule

The differential privacy analysis rule enforces a privacy budget, which limits the data that is revealed to a subscriber when the data is protected with differential privacy. A privacy budget prevents any subscriber from querying shared data when the sum of all queries' epsilon or delta reaches the total epsilon or total delta value. You can use this analysis rule in a view. A view that includes this analysis rule can also include the join restriction analysis rule.

Define a differential privacy analysis rule for a view

You can define a differential privacy analysis rule for a view in a data clean room or with the following statement:

CREATE OR REPLACE VIEW VIEW_NAME
  OPTIONS (
    privacy_policy= '{
      "differential_privacy_policy": {
        "privacy_unit_column": "PRIVACY_UNIT_COLUMN",
        "max_epsilon_per_query": MAX_EPSILON_PER_QUERY,
        "epsilon_budget": EPSILON_BUDGET,
        "delta_per_query": DELTA_PER_QUERY,
        "delta_budget": DELTA_BUDGET,
        "max_groups_contributed": MAX_GROUPS_CONTRIBUTED
      }
    }'
  )
  AS QUERY;

Replace the following values:

  • PRIVACY_UNIT_COLUMN: The column that identifies the entity in a dataset that is protected using a privacy analysis rule. This value is a JSON string.
  • MAX_EPSILON_PER_QUERY: Determines the strength of the privacy guarantee per query and prevents the total epsilon from being reached by a single query. This value is a JSON number from 0.001 to 1e+15.
  • EPSILON_BUDGET: The epsilon budget that represents the strength of an overall privacy guarantee. This is used in totality for all differentially private queries on the view. This value must be larger than MAX_EPSILON_PER_QUERY, and is a JSON number from 0.001 to 1e+15.
  • DELTA_PER_QUERY: The maximum probability of privacy loss per query beyond the guarantee determined by total epsilon. Prevents the total delta from being reached by a single query. This value is a JSON number from 1e-15 to 1.
  • DELTA_BUDGET: The delta budget, which represents the maximum probability of overall privacy loss beyond the guarantee determined by total epsilon. This is used in totality for all differentially private queries on the view. This value must be larger than DELTA_PER_QUERY, and is a JSON number from 1e-15 to 1000.
  • MAX_GROUPS_CONTRIBUTED: Optional. Limits the number of groups to which an entity in a privacy unit column can contribute. This value must be a non-negative JSON integer.
  • QUERY: The query for the view.

In the following example, a differential privacy analysis rule is created:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"differential_privacy_policy": {"privacy_unit_column": "last_name", "max_epsilon_per_query": 10.01, "epsilon_budget": 1000.0, "delta_per_query": 0.01, "delta_budget": 1000.0, "max_groups_contributed": 2}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

-- Epsilon parameters are set very high due to the small dataset.
-- In practice, epsilon should be much smaller.

To review the privacy_policy syntax for CREATE VIEW, see the OPTIONS list in CREATE VIEW.

Update a differential privacy analysis rule for a view

You can change the differential privacy analysis rule for a view in a data clean room or with the following statement:

ALTER VIEW VIEW_NAME
SET OPTIONS (
  privacy_policy= '{
    "differential_privacy_policy": {
      "privacy_unit_column": "PRIVACY_UNIT_COLUMN",
      "max_epsilon_per_query": MAX_EPSILON_PER_QUERY,
      "epsilon_budget": EPSILON_BUDGET,
      "delta_per_query": DELTA_PER_QUERY,
      "delta_budget": DELTA_BUDGET,
      "max_groups_contributed": MAX_GROUPS_CONTRIBUTED
    }
  }'
)

Replace the following values:

  • PRIVACY_UNIT_COLUMN: The column that identifies the entity in a dataset that is protected using a privacy analysis rule. This value is a JSON string.
  • MAX_EPSILON_PER_QUERY: Determines the strength of the privacy guarantee per query and prevents the total epsilon from being reached by a single query. This value is a JSON number from 0.001 to 1e+15.
  • EPSILON_BUDGET: The epsilon budget that represents the strength of an overall privacy guarantee. This is used in totality for all differentially private queries on the view. This value must be larger than MAX_EPSILON_PER_QUERY, and is a JSON number from 0.001 to 1e+15.
  • DELTA_PER_QUERY: The maximum probability of privacy loss per query beyond the guarantee determined by total epsilon. Prevents the total delta from being reached by a single query. This value is a JSON number from 1e-15 to 1.
  • DELTA_BUDGET: The delta budget, which represents the maximum probability of overall privacy loss beyond the guarantee determined by total epsilon. This is used in totality for all differentially private queries on the view. This value must be larger than DELTA_PER_QUERY, and is a JSON number from 1e-15 to 1000.
  • MAX_GROUPS_CONTRIBUTED: Optional. Limits the number of groups to which an entity in a privacy unit column can contribute. This value must be a non-negative JSON integer.

In the following example, a differential privacy analysis rule is updated:

ALTER VIEW mydataset.ExamView
SET OPTIONS(
  privacy_policy= '{"differential_privacy_policy": {"privacy_unit_column": "last_name", "max_epsilon_per_query": 0.01, "epsilon_budget": 1000.0, "delta_per_query": 0.005, "delta_budget": 1000.0, "max_groups_contributed": 2}}'
);

-- Epsilon parameters are set very high due to the small dataset.
-- In practice, epsilon should be much smaller.

To review the privacy_policy syntax for ALTER VIEW, see the OPTIONS list in ALTER VIEW SET OPTIONS.

Query a differential privacy analysis rule–enforced view

You can query a view that has a differential privacy analysis rule with the DIFFERENTIAL_PRIVACY clause. To review the syntax and additional examples for the DIFFERENTIAL_PRIVACY clause, see DIFFERENTIAL_PRIVACY clause.

Ensure that the differentially private query runs

Create a differentially private query for the analysis rule–enforced view, and make sure that the query runs.

For example, in the following query, differentially private data is successfully returned from ExamView because epsilon, delta, and max_groups_contributed all satisfy the conditions of the differential analysis rule in ExamView:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"differential_privacy_policy": {"privacy_unit_column": "last_name", "max_epsilon_per_query": 1000.01, "epsilon_budget": 1000.0, "delta_per_query": 0.01, "delta_budget": 1000.0, "max_groups_contributed": 2}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

After you've created your view, briefly wait, and then when you are certain the view was created, run the following query:

-- Query an analysis–rule enforced view called ExamView.
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=2, privacy_unit_column=last_name)
    test_id,
    AVG(test_score, contribution_bounds_per_group => (0,100)) AS average_test_score
FROM mydataset.ExamView
GROUP BY test_id;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*---------+--------------------*
 | test_id | average_test_score |
 +---------+--------------------+
 | P91     | ???                |
 | U25     | ???                |
 *---------+--------------------*/

-- Epsilon parameters are set very high due to the small dataset.
-- In practice, epsilon should be much smaller.

Block a query with an out-of-bounds epsilon

Epsilon can be used to add or remove noise. More epsilon means less noise will be added. If you want to ensure that a differentially private query has a minimal amount of noise, pay close attention to the value for max_epsilon_per_query in your differential privacy analysis rule.

For example, in the following query, the query is blocked with an error because epsilon in the DIFFERENTIAL_PRIVACY clause is higher than max_epsilon_per_query in ExamView:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"differential_privacy_policy": {"privacy_unit_column": "last_name", "max_epsilon_per_query": 10.01, "epsilon_budget": 1000.0, "delta_per_query": 0.01, "delta_budget": 1000.0, "max_groups_contributed": 2}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

After you've created your view, briefly wait, and then run the following query:

-- Error: epsilon is out of bounds.
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=2, privacy_unit_column=last_name)
    test_id,
    AVG(test_score, contribution_bounds_per_group => (0,100)) AS average_test_score
FROM mydataset.ExamView
GROUP BY test_id;

-- Epsilon parameters are set very high due to the small dataset.
-- In practice, epsilon should be much smaller.

Block a query that doesn't have a specific amount of delta

Delta represents the threshold that determines if data might accidentally be leaked. Less delta increase the threshold, more delta decreases the threshold. To ensure that a differentially private query has a specific threshold, update the delta_per_query setting in your differential privacy analysis rule.

For example, in the following query, the query is blocked with an error because delta in the DIFFERENTIAL_PRIVACY clause does not match delta_per_query in ExamView:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"differential_privacy_policy": {"privacy_unit_column": "last_name", "max_epsilon_per_query": 10.01, "epsilon_budget": 1000.0, "delta_per_query": 0.01, "delta_budget": 1000.0, "max_groups_contributed": 2}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

After you've created your view, briefly wait, and then run the following query:

-- Error: delta in query does not match delta_per_query.
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.02, max_groups_contributed=2, privacy_unit_column=last_name)
    test_id,
    AVG(test_score, contribution_bounds_per_group => (0,100)) AS average_test_score
FROM mydataset.ExamView
GROUP BY test_id;

-- Epsilon parameters are set very high due to the small dataset.
-- In practice, epsilon should be much smaller.

Block queries that have exceeded an epsilon budget

Epsilon can be used to add or remove noise. Less epsilon increases noise, more epsilon reduces noise. Even when noise is high, multiple queries over the same data can eventually reveal the un-noised version of the data. To stop this from happening, you can create an epsilon budget. If you want to add an epsilon budget, review the value for epsilon_budget in the differential privacy analysis rule for your view.

Run the following query three times. On the third time, the query is blocked because the total epsilon used is 30, but epsilon_budget in ExamView only allows 25.6:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"differential_privacy_policy": {"privacy_unit_column": "last_name", "max_epsilon_per_query": 10.01, "epsilon_budget": 25.6, "delta_per_query": 0.01, "delta_budget": 1000.0, "max_groups_contributed": 2}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

After you've created your view, briefly wait, and then run the following query three times:

-- Error after three query runs: epsilon budget exceeded
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=2, privacy_unit_column=last_name)
    test_id,
    AVG(test_score, contribution_bounds_per_group => (0,100)) AS average_test_score
FROM mydataset.ExamView
GROUP BY test_id;

-- Epsilon parameters are set very high due to the small dataset.
-- In practice, epsilon should be much smaller.

Block queries that have exceeded a delta budget

Delta represents the threshold that determines if data might accidentally be leaked. Less delta increase the threshold, more delta decreases the threshold. Even when the threshold is high, multiple queries over the same data can eventually reveal the differentially private data. To stop this from happening, you can create a delta budget. If you want to add a delta budget, review the value for delta_budget in the differential privacy analysis rule for your view.

Run the following query seven times. On the seventh time, the query is blocked because the total delta used is 0.7, but delta_budget in ExamView only allows 0.6:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"differential_privacy_policy": {"privacy_unit_column": "last_name", "max_epsilon_per_query": 10.01, "epsilon_budget": 0.2, "delta_per_query": 0.01, "delta_budget": 0.6, "max_groups_contributed": 2}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

After you've created your view, briefly wait, and then run the following query seven times:

-- Error after seven query runs: epsilon budget exceeded
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=2, privacy_unit_column=last_name)
    test_id,
    AVG(test_score, contribution_bounds_per_group => (0,100)) AS average_test_score
FROM mydataset.ExamView
GROUP BY test_id;

-- Epsilon parameters are set very high due to the small dataset.
-- In practice, epsilon should be much smaller.

Block a query that allows too many group contributions

You can limit the number of groups that each entity can contribute to in a differentially private query. If you want to ensure that a differentially private query has a limited number of groups that each entity can contribute to, pay close attention to the value for max_groups_contributed in your differential privacy analysis rule.

For example, in the following query, the query is blocked with an error because max_groups_contributed in the DIFFERENTIAL_PRIVACY clause is higher than max_groups_contributed in ExamView:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"differential_privacy_policy": {"privacy_unit_column": "last_name", "max_epsilon_per_query": 10.01, "epsilon_budget": 1000.0, "delta_per_query": 0.01, "delta_budget": 1000.0, "max_groups_contributed": 2}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

After you've created your view, briefly wait, and then run the following query:

-- Error: max_groups_contributed is out of bounds.
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.02, max_groups_contributed=3, privacy_unit_column=last_name)
    test_id,
    AVG(test_score, contribution_bounds_per_group => (0,100)) AS average_test_score
FROM mydataset.ExamView
GROUP BY test_id;

-- Epsilon parameters are set very high due to the small dataset.
-- In practice, epsilon should be much smaller.

Join restriction analysis rule

A join restriction analysis rule limits the types of joins that can be used with specific columns in a view. You can use this analysis rule in a view. A view that includes this analysis rule can also include the aggregation threshold analysis rule or differential privacy analysis rule.

Define a join restriction analysis rule for a view

You can define a join restriction analysis rule for a view in a data clean room or with the following statement.

CREATE OR REPLACE VIEW VIEW_NAME
  OPTIONS (
    privacy_policy= '{
      "join_restriction_policy": {
        "join_condition": "JOIN_CONDITION",
        "join_allowed_columns": JOIN_ALLOWED_COLUMNS
      }
    }'
  )
  AS QUERY;

Replace the following values:

  • JOIN_CONDITION: The type of join restriction to enforce on a view. This can be one of the following values:

    • JOIN_NOT_REQUIRED: A join is not required to query this view. If a join is used, only the columns in join_allowed_columns can be used.

    • JOIN_BLOCKED: This view can't be joined along any column. Don't set join_allowed_columns in this case.

    • JOIN_ANY: At least one column in join_allowed_columns must be joined upon for this view to be queried.

    • JOIN_ALL: All columns in join_allowed_columns must be inner joined upon for this view to be queried.

  • JOIN_ALLOWED_COLUMNS: The columns that can be part of a join operation.

  • QUERY: The query for the view.

In the following example, a join restriction analysis rule is created:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"join_restriction_policy": {"join_condition": "JOIN_ANY", "join_allowed_columns": ["test_id", "test_score"]}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

Use a join restriction analysis rule with another analysis rule

You can use a join restriction analysis rule with either an aggregation threshold analysis rule or a differential privacy analysis rule. However, once you've used a join restriction with another analysis rule on a view, you can't change that other analysis rule.

In the following example, a join restriction analysis rule and aggregation threshold analysis rule are used together:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"join_restriction_policy": {"join_condition": "JOIN_ANY", "join_allowed_columns": ["test_id", "test_score"]}, "aggregation_threshold_policy": {"threshold": 3, "privacy_unit_column": "last_name"}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

Update a join restriction analysis rule for a view

You can change the join restriction analysis rule for a view with a data clean room or with the following statement:

ALTER VIEW VIEW_NAME
SET OPTIONS (
  privacy_policy= '{
    "join_restriction_policy": {
      "join_condition": "JOIN_CONDITION",
      "join_allowed_columns": JOIN_ALLOWED_COLUMNS
    }
  }'
)

Replace the following values:

  • JOIN_CONDITION: The type of join restriction to enforce on a view. This can be one of the following values:

    • JOIN_NOT_REQUIRED: A join is not required to query this view. If a join is used, only the columns in join_allowed_columns can be used.

    • JOIN_BLOCKED: This view can't be joined along any column. Don't set join_allowed_columns in this case.

    • JOIN_ANY: At least one column in join_allowed_columns must be joined upon for this view to be queried.

    • JOIN_ALL: All columns in join_allowed_columns must be inner joined upon for this view to be queried.

  • JOIN_ALLOWED_COLUMNS: The columns that can be part of a join operation.

  • QUERY: The query for the view.

In the following example, a join restriction analysis rule is updated:

ALTER VIEW mydataset.ExamView
SET OPTIONS(
  privacy_policy= '{"join_restriction_policy": {"join_condition": "JOIN_ALL", "join_allowed_columns": ["test_id", "test_score"]}}'
);

To review the privacy_policy syntax for ALTER VIEW, see the OPTIONS list in ALTER VIEW SET OPTIONS.

Query a join restriction analysis rule–enforced view

You can perform a join operation on a view that has a join restriction analysis rule. To review the syntax for the JOIN operation, see Join operation.

Ensure that the join restrictive query runs

You should test to make sure that your join restrictive query runs.

For example, in the following query, joined data is successfully returned from ExamView and StudentTable:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a table called StudentTable.
CREATE TABLE mydataset.StudentTable AS (
  SELECT "Hansen" AS last_name, 510 AS test_score UNION ALL
  SELECT "Wang", 500 UNION ALL
  SELECT "Devi", 580 UNION ALL
  SELECT "Ivanov", 490 UNION ALL
  SELECT "Silva", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"join_restriction_policy": {"join_condition": "JOIN_ANY", "join_allowed_columns": ["test_score"]}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

-- Query the ExamView view.
SELECT *
FROM mydataset.ExamView INNER JOIN mydataset.StudentTable USING (test_score)
GROUP BY test_id;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*---------+--------------------*
 | test_id | average_test_score |
 +---------+--------------------+
 | P91     | ???                |
 | U25     | ???                |
 *---------+--------------------*/

Block a join operation with no required column

You can block a join operation if it doesn't include at least one required column. To do this include the following parts in your join restriction analysis rule:

"join_restriction_policy": {
  "join_condition": "JOIN_ANY",
  "join_allowed_columns": ["column_name", ...]
}

For example, in the following query, the query is blocked with an error because the query does not contain any join operations on the test_score or test_id column in ExamView and StudentTable:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a table called StudentTable.
CREATE TABLE mydataset.StudentTable AS (
  SELECT "Hansen" AS last_name, 510 AS test_score UNION ALL
  SELECT "Wang", 500 UNION ALL
  SELECT "Devi", 580 UNION ALL
  SELECT "Ivanov", 490 UNION ALL
  SELECT "Silva", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"join_restriction_policy": {"join_condition": "JOIN_ANY", "join_allowed_columns": ["test_score", "test_id"]}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

-- Query the ExamView view.
SELECT *
FROM mydataset.ExamView INNER JOIN mydataset.StudentTable USING (last_name)
GROUP BY test_id;

To get the preceding query to run, in the USING clause, replace last_name with test_score.

Block a query with no join operation

If the query must have a join operation, you can block the query if no join operation is present by using one of the following join restriction analysis rules:

"join_restriction_policy": {
  "join_condition": "JOIN_NOT_REQUIRED"
}
"join_restriction_policy": {
  "join_condition": "JOIN_NOT_REQUIRED",
  "join_allowed_columns": []
}

For example, in the following query, the query is blocked because there is no join operation in the query:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a table called StudentTable.
CREATE TABLE mydataset.StudentTable AS (
  SELECT "Hansen" AS last_name, 510 AS test_score UNION ALL
  SELECT "Wang", 500 UNION ALL
  SELECT "Devi", 580 UNION ALL
  SELECT "Ivanov", 490 UNION ALL
  SELECT "Silva", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"join_restriction_policy": {"join_condition": "JOIN_NOT_REQUIRED"}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

-- Query the ExamView view.
SELECT *
FROM mydataset.ExamView
GROUP BY test_id;

Block a query with no join operation and no required column

If the query must have a join operation and the join operation must have at least one required column, include the following parts in your join restriction analysis rule:

"join_restriction_policy": {
  "join_condition": "JOIN_NOT_REQUIRED",
  "join_allowed_columns": ["column_name", ...]
}

For example, in the following query, the query is blocked because the join operation does not include a column in the join_allowed_columns array:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a table called StudentTable.
CREATE TABLE mydataset.StudentTable AS (
  SELECT "Hansen" AS last_name, 510 AS test_score UNION ALL
  SELECT "Wang", 500 UNION ALL
  SELECT "Devi", 580 UNION ALL
  SELECT "Ivanov", 490 UNION ALL
  SELECT "Silva", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"join_restriction_policy": {"join_condition": "JOIN_NOT_REQUIRED", "join_allowed_columns": ["test_score"]}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

-- Query the ExamView view.
SELECT *
FROM mydataset.ExamView INNER JOIN mydataset.StudentTable USING (last_name)
GROUP BY test_id;

To get the preceding query to run, in the USING clause, replace last_name with test_score.

Block all join operations

You can block all join operations. To do this, only include the following parts in your join restriction analysis rule:

"join_restriction_policy": {
  "join_condition": "JOIN_BLOCKED",
}

For example, in the following query, the query is blocked because there is a join operation:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a table called StudentTable.
CREATE TABLE mydataset.StudentTable AS (
  SELECT "Hansen" AS last_name, 510 AS test_score UNION ALL
  SELECT "Wang", 500 UNION ALL
  SELECT "Devi", 580 UNION ALL
  SELECT "Ivanov", 490 UNION ALL
  SELECT "Silva", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"join_restriction_policy": {"join_condition": "JOIN_BLOCKED"}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

-- Query the ExamView view.
SELECT *
FROM mydataset.ExamView INNER JOIN mydataset.StudentTable USING (last_name)
GROUP BY test_id;

To get the preceding query to run, remove the INNER JOIN operation.

Block an inner join operation without all required columns

You can block an inner join operation if it doesn't include all required column. To do this include the following parts in your join restriction analysis rule:

"join_restriction_policy": {
  "join_condition": "JOIN_ALL",
  "join_allowed_columns": ["column_name", ...]
}

For example, in the following query, the query is blocked with an error because the query does not includetest_score in the join operation:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a table called StudentTable.
CREATE TABLE mydataset.StudentTable AS (
  SELECT "Hansen" AS last_name, 510 AS test_score UNION ALL
  SELECT "Wang", 500 UNION ALL
  SELECT "Devi", 580 UNION ALL
  SELECT "Ivanov", 490 UNION ALL
  SELECT "Silva", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"join_restriction_policy": {"join_condition": "JOIN_ALL", "join_allowed_columns": ["test_score", "last_name"]}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

-- Query the ExamView view.
SELECT *
FROM mydataset.ExamView INNER JOIN mydataset.StudentTable USING (last_name)
GROUP BY test_id;

To get the preceding query to run, replace USING (last_name) with USING (last_name, test_score).

List overlap analysis rule

Limits the types of joins that can be used with specific columns. Joins need to be present in queries and certain columns can't be blocked. You can define and update a list overlap analysis rule for a view in a data clean room. For more information, see Share sensitive data with data clean rooms.

Limitations

Analysis rules have the following limitations:

  • If you've already added an analysis rule to a view, you can't switch between aggregation threshold analysis rules and differential analysis rules.

An aggregation threshold analysis rule has the following limitations:

  • You can only use supported aggregate functions in a query on an aggregation threshold analysis rule–enforced view.
  • You can't add an aggregation threshold analysis rule to a materialized view.
  • If you use an aggregation threshold analysis rule–enforced view in an aggregation threshold query, they must both have the same value for the privacy unit column.
  • If you use an aggregation threshold analysis rule–enforced view in an aggregation threshold query, the threshold in the query must be greater than or equal to the threshold in the view.
  • Time travel is disabled on any view that has an aggregation threshold analysis rule.

A differential privacy analysis rule has the following limitations:

  • Once a privacy budget is exhausted for a view, that view can't be used and you must create a new view.

A join restriction analysis rule has the following limitations:

  • If you don't place the privacy_unit_column as a join_allowed_column in a join restriction analysis rule, you might not be able to join any columns in certain situations.

Pricing