Apache Hive SQL translation guide

This document details the similarities and differences in SQL syntax between Apache Hive and BigQuery to help you plan your migration. To migrate your SQL scripts in bulk, use batch SQL translation. To translate ad hoc queries, use interactive SQL translation.

In some cases, there's no direct mapping between a SQL element in Hive and BigQuery. However, in most cases, BigQuery offers an alternative element to Hive to help you achieve the same functionality, as shown in the examples in this document.

The intended audience for this document is enterprise architects, database administrators, application developers, and IT security specialists. It assumes that you're familiar with Hive.

Data types

Hive and BigQuery have different data type systems. In most cases, you can map data types in Hive to BigQuery data types with a few exceptions, such as MAP and UNION. Hive supports more implicit type casting than BigQuery. As a result, the batch SQL translator inserts many explicit casts.

Hive BigQuery
TINYINT INT64
SMALLINT INT64
INT INT64
BIGINT INT64
DECIMAL NUMERIC
FLOAT FLOAT64
DOUBLE FLOAT64
BOOLEAN BOOL
STRING STRING
VARCHAR STRING
CHAR STRING
BINARY BYTES
DATE DATE
- DATETIME
- TIME
TIMESTAMP DATETIME/TIMESTAMP
INTERVAL -
ARRAY ARRAY
STRUCT STRUCT
MAP ARRAY of STRUCT with key and value fields (REPEATED field)
UNION STRUCT with different types
- GEOGRAPHY
- JSON

Query syntax

This section addresses differences in query syntax between Hive and BigQuery.

SELECT statement

Most Hive SELECT statements are compatible with BigQuery. The following table contains a list of minor differences:

Case Hive BigQuery
Subquery

SELECT * FROM (
SELECT 10 as col1, "test" as col2, "test" as col3
) tmp_table;

SELECT * FROM (
SELECT 10 as col1, "test" as col2, "test" as col3
);

Column filtering

SET hive.support.quoted.identifiers=none;
SELECT `(col2|col3)?+.+` FROM (
SELECT 10 as col1, "test" as col2, "test" as col3
) tmp_table;

SELECT * EXCEPT(col2,col3) FROM (
SELECT 10 as col1, "test" as col2, "test" as col3
);

Exploding an array

SELECT tmp_table.pageid, adid FROM (
SELECT 'test_value' pageid, Array(1,2,3) ad_id) tmp_table
LATERAL VIEW
explode(tmp_table.ad_id) adTable AS adid;

SELECT tmp_table.pageid, ad_id FROM (
SELECT 'test_value' pageid, [1,2,3] ad_id) tmp_table,
UNNEST(tmp_table.ad_id) ad_id;

FROM clause

The FROM clause in a query lists the table references from which data is selected. In Hive, possible table references include tables, views, and subqueries. BigQuery also supports all these table references.

You can reference BigQuery tables in the FROM clause by using the following:

  • [project_id].[dataset_id].[table_name]
  • [dataset_id].[table_name]
  • [table_name]

BigQuery also supports additional table references, such as point-in-time snapshots with FOR SYSTEM_TIME AS OF and array flattening with the UNNEST operator.

Comparison operators

The following table provides details about converting operators from Hive to BigQuery:

Function or operator Hive BigQuery
Arithmetic operators: - (unary minus), * (multiplication), / (division), + (addition), - (subtraction) All number types All number types.

To prevent errors during the divide operation, consider using SAFE_DIVIDE or IEEE_DIVIDE.

Bitwise operators: ~ (bitwise NOT), | (bitwise OR), & (bitwise AND), ^ (bitwise XOR) Boolean data type Boolean data type.
Left shift shiftleft(TINYINT|SMALLINT|INT a, INT b), shiftleft(BIGINT a, INT b) << operator on integer or bytes: A << B, where B must be the same type as A.

Right shift shiftright(TINYINT|SMALLINT|INT a, INT b), shiftright(BIGINT a, INT b) >> operator on integer or bytes: A >> B, where B must be the same type as A.

Modulus (remainder) X % Y

All number types

MOD(X, Y)
Integer division A DIV B (use A / B for decimal precision) All number types.

Note: To prevent errors during the divide operation, consider using SAFE_DIVIDE or IEEE_DIVIDE.

Unary negation !, NOT NOT
Types supporting equality comparisons All primitive types All comparable types and STRUCT.
a <=> b (NULL-safe equal) Not supported. Translate to the following:

(a = b) OR (a IS NULL AND b IS NULL)

a <> b (NULL-safe not equal) Not supported. Translate to the following:

NOT ((a = b) OR (a IS NULL AND b IS NULL))

Relational operators (=, ==, !=, <>, <, <=, >, >=) All primitive types All comparable types.
String comparison RLIKE, REGEXP REGEXP_CONTAINS built-in function. Uses BigQuery's regular expression syntax (re2) for the patterns.
[NOT] LIKE, [NOT] BETWEEN, IS [NOT] NULL A [NOT] BETWEEN B AND C, A IS [NOT] (TRUE|FALSE), A [NOT] LIKE B Same as Hive. In addition, BigQuery also supports the IN operator.
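As the note on division above suggests, dividing by zero raises an error in BigQuery unless you use the safe variants. A minimal sketch (table and column names are hypothetical):

```sql
SELECT
  SAFE_DIVIDE(clicks, impressions) AS ctr,       -- returns NULL when impressions = 0
  IEEE_DIVIDE(clicks, impressions) AS ctr_ieee   -- returns +/-Infinity or NaN instead of raising an error
FROM `myproject.mydataset.stats`;
```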

JOIN conditions

Both Hive and BigQuery support the following types of joins:

  • [INNER] JOIN

  • LEFT [OUTER] JOIN

  • RIGHT [OUTER] JOIN

  • FULL [OUTER] JOIN

  • CROSS JOIN and the equivalent implicit comma cross join

For more information, see Join operation and Hive joins.

Type conversion and casting

The following table provides details about converting functions from Hive to BigQuery:

Function or operator Hive BigQuery
Type casting When a cast fails, Hive returns NULL.

Same syntax as Hive. For more information about BigQuery type conversion rules, see Conversion rules.

If a cast fails, BigQuery raises an error. To get the same behavior as Hive, use SAFE_CAST, which returns NULL instead.

SAFE function calls If you prefix a function call with SAFE., the function returns NULL instead of raising an error. For example, SAFE.SUBSTR('foo', 0, -2) AS safe_output; returns NULL.

Note: To cast without raising errors, use SAFE_CAST.
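The difference can be sketched in two one-line queries:

```sql
SELECT CAST("apple" AS INT64) AS n;       -- raises an error in BigQuery (Hive would return NULL)
SELECT SAFE_CAST("apple" AS INT64) AS n;  -- returns NULL, matching Hive's behavior
```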

Implicit conversion types

When migrating to BigQuery, you need to convert most of your Hive implicit conversions into explicit BigQuery conversions, except for the following data types, which BigQuery converts implicitly.

From BigQuery type To BigQuery type
INT64 FLOAT64, NUMERIC, BIGNUMERIC
BIGNUMERIC FLOAT64
NUMERIC BIGNUMERIC, FLOAT64

BigQuery also performs implicit conversions for the following literals:

From BigQuery type To BigQuery type
STRING literal (for example, "2008-12-25") DATE
STRING literal (for example, "2008-12-25 15:30:00") TIMESTAMP
STRING literal (for example, "2008-12-25T07:30:00") DATETIME
STRING literal (for example, "15:30:00") TIME
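For example, a STRING literal compared against a DATE column is coerced automatically; no explicit cast is needed (the table and column names below are hypothetical):

```sql
SELECT *
FROM `myproject.mydataset.orders`
WHERE order_date = "2008-12-25";  -- the literal is implicitly coerced to DATE
```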

Explicit conversion types

If you want to convert Hive data types that BigQuery doesn't implicitly convert, use the BigQuery CAST(expression AS type) function or either of the DATE and TIMESTAMP conversion functions.
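A short sketch of explicit conversions, assuming a hypothetical table with an INT64 column:

```sql
SELECT
  CAST(int_col AS STRING) AS str_col,                    -- explicit numeric-to-string cast
  CAST("2008-12-25 15:30:00" AS TIMESTAMP) AS ts_col     -- explicit string-to-timestamp cast
FROM `myproject.mydataset.t`;
```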

Functions

This section covers common functions used in Hive and BigQuery.

Aggregate functions

The following table shows mappings between common Hive aggregate, statistical aggregate, and approximate aggregate functions with their BigQuery equivalents:

Hive BigQuery
count(DISTINCT expr[, expr...]) count(DISTINCT expr[, expr...])
percentile_approx(DOUBLE col, array(p1 [, p2]...) [, B]) WITHIN GROUP (ORDER BY expression) APPROX_QUANTILES(expression, 100)[OFFSET(CAST(TRUNC(percentile * 100) as INT64))]

BigQuery doesn't support the rest of the arguments that Hive defines.

AVG AVG
X | Y BIT_OR / X | Y
X ^ Y BIT_XOR / X ^ Y
X & Y BIT_AND / X & Y
COUNT COUNT
COLLECT_LIST(col), COLLECT_SET(col) ARRAY_AGG(col); for COLLECT_SET, use ARRAY_AGG(DISTINCT col)
MAX MAX
MIN MIN
REGR_AVGX AVG(IF(dep_var_expr IS NULL OR ind_var_expr IS NULL, NULL, ind_var_expr))
REGR_AVGY AVG(IF(dep_var_expr IS NULL OR ind_var_expr IS NULL, NULL, dep_var_expr))
REGR_COUNT SUM(IF(dep_var_expr IS NULL OR ind_var_expr IS NULL, NULL, 1))
REGR_INTERCEPT AVG(dep_var_expr) - AVG(ind_var_expr) * (COVAR_SAMP(ind_var_expr, dep_var_expr) / VARIANCE(ind_var_expr))
REGR_R2 (COUNT(dep_var_expr) * SUM(ind_var_expr * dep_var_expr) - SUM(dep_var_expr) * SUM(ind_var_expr)) / SQRT((COUNT(ind_var_expr) * SUM(POWER(ind_var_expr, 2)) * POWER(SUM(ind_var_expr), 2)) * (COUNT(dep_var_expr) * SUM(POWER(dep_var_expr, 2)) * POWER(SUM(dep_var_expr), 2)))
REGR_SLOPE COVAR_SAMP(ind_var_expr, dep_var_expr) / VARIANCE(ind_var_expr)
REGR_SXX SUM(POWER(ind_var_expr, 2)) - COUNT(ind_var_expr) * POWER(AVG(ind_var_expr), 2)
REGR_SXY SUM(ind_var_expr * dep_var_expr) - COUNT(ind_var_expr) * AVG(ind_var_expr) * AVG(dep_var_expr)
REGR_SYY SUM(POWER(dep_var_expr, 2)) - COUNT(dep_var_expr) * POWER(AVG(dep_var_expr), 2)
ROLLUP ROLLUP
STDDEV_POP STDDEV_POP
STDDEV_SAMP STDDEV_SAMP, STDDEV
SUM SUM
VAR_POP VAR_POP
VAR_SAMP VAR_SAMP, VARIANCE
CONCAT_WS STRING_AGG
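For example, Hive's CONCAT_WS(', ', COLLECT_LIST(name)) can be sketched in BigQuery as follows (table and column names are hypothetical):

```sql
SELECT
  department,
  STRING_AGG(name, ', ') AS members  -- concatenates aggregated values with a separator
FROM `myproject.mydataset.employees`
GROUP BY department;
```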

Analytical functions

The following table shows mappings between common Hive analytical functions with their BigQuery equivalents:

Hive BigQuery
AVG AVG
COUNT COUNT
COVAR_POP COVAR_POP
COVAR_SAMP COVAR_SAMP
CUME_DIST CUME_DIST
DENSE_RANK DENSE_RANK
FIRST_VALUE FIRST_VALUE
LAST_VALUE LAST_VALUE
LAG LAG
LEAD LEAD
COLLECT_LIST, COLLECT_SET ARRAY_AGG, ARRAY_CONCAT_AGG
MAX MAX
MIN MIN
NTILE NTILE(constant_integer_expression)
PERCENT_RANK PERCENT_RANK
RANK RANK
ROW_NUMBER ROW_NUMBER
STDDEV_POP STDDEV_POP
STDDEV_SAMP STDDEV_SAMP, STDDEV
SUM SUM
VAR_POP VAR_POP
VAR_SAMP VAR_SAMP, VARIANCE
VARIANCE VARIANCE
WIDTH_BUCKET A user-defined function (UDF) can be used.

Date and time functions

The following table shows mappings between common Hive date and time functions and their BigQuery equivalents:

Hive BigQuery
DATE_ADD DATE_ADD(date_expression, INTERVAL int64_expression date_part)
DATE_SUB DATE_SUB(date_expression, INTERVAL int64_expression date_part)
CURRENT_DATE CURRENT_DATE
CURRENT_TIME CURRENT_TIME
CURRENT_TIMESTAMP CURRENT_DATETIME is recommended, because this value is timezone-free and synonymous with CURRENT_TIMESTAMP in Hive.
EXTRACT(field FROM source) EXTRACT(part FROM datetime_expression)
LAST_DAY DATE_SUB(DATE_TRUNC(DATE_ADD(date_expression, INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY)

MONTHS_BETWEEN DATE_DIFF(date_expression, date_expression, MONTH)
NEXT_DAY DATE_ADD(DATE_TRUNC(date_expression, WEEK(day_value)), INTERVAL 1 WEEK)

TO_DATE PARSE_DATE
FROM_UNIXTIME TIMESTAMP_SECONDS (combine with FORMAT_TIMESTAMP to produce a formatted string)
FROM_UNIXTIMESTAMP FORMAT_TIMESTAMP
YEAR, QUARTER, MONTH, HOUR, MINUTE, SECOND, WEEKOFYEAR EXTRACT
DATEDIFF DATE_DIFF

BigQuery offers additional date and time functions, such as DATE_TRUNC, FORMAT_DATE, and GENERATE_DATE_ARRAY.

String functions

The following table shows mappings between Hive string functions and their BigQuery equivalents:

Hive BigQuery
ASCII TO_CODE_POINTS(string_expr)[OFFSET(0)]
HEX TO_HEX
LENGTH CHAR_LENGTH, CHARACTER_LENGTH
CHR CODE_POINTS_TO_STRING
CONCAT CONCAT
LOWER LOWER
LPAD LPAD
LTRIM LTRIM
REGEXP_EXTRACT REGEXP_EXTRACT
REGEXP_REPLACE REGEXP_REPLACE
REPLACE REPLACE
REVERSE REVERSE
RPAD RPAD
RTRIM RTRIM
SOUNDEX SOUNDEX
SPLIT SPLIT(instring, delimiter)[ORDINAL(tokennum)]
SUBSTR, SUBSTRING SUBSTR
TRANSLATE TRANSLATE
TRIM TRIM
UPPER UPPER

BigQuery offers additional string functions, such as BYTE_LENGTH, FORMAT, NORMALIZE, and SAFE_CONVERT_BYTES_TO_STRING.

Math functions

The following table shows mappings between Hive math functions and their BigQuery equivalents:

Hive BigQuery
ABS ABS
ACOS ACOS
ASIN ASIN
ATAN ATAN
CEIL CEIL
CEILING CEILING
COS COS
FLOOR FLOOR
GREATEST GREATEST
LEAST LEAST
LN LN
LNNVL Not supported. Use an IS NULL condition instead.
LOG LOG
MOD (% operator) MOD
POWER POWER, POW
RAND RAND
ROUND ROUND
SIGN SIGN
SIN SIN
SQRT SQRT
HASH FARM_FINGERPRINT, MD5, SHA1, SHA256, SHA512
STDDEV_POP STDDEV_POP
STDDEV_SAMP STDDEV_SAMP
TAN TAN
TRUNC TRUNC
NVL IFNULL(expr, 0), COALESCE(expr, 0)

BigQuery offers additional math functions, such as SAFE_DIVIDE, IEEE_DIVIDE, SAFE_MULTIPLY, and RANGE_BUCKET.

Logical and conditional functions

The following table shows mappings between Hive logical and conditional functions and their BigQuery equivalents:

Hive BigQuery
CASE CASE
COALESCE COALESCE
NVL IFNULL(expr, 0), COALESCE(expr, 0)
NULLIF NULLIF
IF IF(expr, true_result, else_result)
ISNULL IS NULL
ISNOTNULL IS NOT NULL

UDFs and UDAFs

BigQuery supports UDFs but not user-defined aggregate functions (UDAFs).
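A Hive UDF implemented as a Java class often has to be rewritten as a SQL (or JavaScript) UDF in BigQuery. A minimal SQL UDF sketch, with a hypothetical function name:

```sql
-- Temporary SQL UDF, visible only for the duration of the query.
CREATE TEMP FUNCTION multiply_by_three(x INT64)
RETURNS INT64
AS (x * 3);

SELECT multiply_by_three(n) AS result
FROM UNNEST([1, 2, 3]) AS n;
```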

DML syntax

This section addresses differences in data manipulation language (DML) syntax between Hive and BigQuery.

INSERT statement

Most Hive INSERT statements are compatible with BigQuery. The following table shows exceptions:

Hive BigQuery
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...] INSERT INTO table (...) VALUES (...);

Note: In BigQuery, omitting column names in the INSERT statement only works if values for all columns in the target table are included, in the order of their ordinal positions.

INSERT OVERWRITE [LOCAL] DIRECTORY directory1

[ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)

SELECT ... FROM ...

BigQuery doesn't support insert-overwrite operations. You can migrate this Hive syntax to separate TRUNCATE TABLE and INSERT statements.

BigQuery imposes DML quotas that restrict the number of DML statements that you can execute daily. To make the best use of your quota, consider the following approaches:

  • Combine multiple rows in a single INSERT statement, instead of one row for each INSERT operation.

  • Combine multiple DML statements (including INSERT) by using a MERGE statement.

  • Use CREATE TABLE ... AS SELECT to create and populate new tables.
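The first approach can be sketched as follows (table and column names are hypothetical):

```sql
-- One statement, many rows: consumes a single DML operation
-- instead of three separate INSERT statements.
INSERT INTO `myproject.mydataset.t` (id, name)
VALUES (1, 'a'),
       (2, 'b'),
       (3, 'c');
```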

UPDATE statement

Most Hive UPDATE statements are compatible with BigQuery. The following table shows exceptions:

Hive BigQuery
UPDATE tablename SET column = value [, column = value ...] [WHERE expression] UPDATE table

SET column = expression [,...]

[FROM ...]

WHERE TRUE

Note: All UPDATE statements in BigQuery require a WHERE keyword, followed by a condition.

DELETE and TRUNCATE statements

You can use DELETE or TRUNCATE statements to remove rows from a table without affecting the table schema or indexes.

In BigQuery, the DELETE statement must have a WHERE clause. For more information about DELETE in BigQuery, see DELETE examples.

Hive BigQuery
DELETE FROM tablename [WHERE expression] DELETE FROM table_name WHERE TRUE

BigQuery DELETE statements require a WHERE clause.

TRUNCATE [TABLE] table_name [PARTITION partition_spec]; TRUNCATE TABLE [[project_name.]dataset_name.]table_name

MERGE statement

The MERGE statement combines INSERT, UPDATE, and DELETE operations into a single upsert statement. The MERGE operation must match at most one source row for each target row.

Hive BigQuery
MERGE INTO target AS T
USING source AS S
ON merge_condition

WHEN MATCHED [AND condition] THEN UPDATE SET ...

WHEN MATCHED [AND condition] THEN DELETE

WHEN NOT MATCHED [AND condition] THEN INSERT VALUES ...

MERGE target USING source
ON target.key = source.key
WHEN MATCHED AND source.filter = 'filter_exp' THEN
UPDATE SET
target.col1 = source.col1,
target.col2 = source.col2,
...

Note: You must list all columns that need to be updated.

ALTER statement

The following table provides details about converting ALTER statements from Hive to BigQuery:

Function Hive BigQuery
Rename table ALTER TABLE table_name RENAME TO new_table_name; Not supported. A workaround is to use a copy job with the name that you want as the destination table, and then delete the old one.

bq copy project.dataset.old_table project.dataset.new_table

bq rm --table project.dataset.old_table

Table properties ALTER TABLE table_name SET TBLPROPERTIES table_properties;

table_properties:

: (property_name = property_value, property_name = property_value, ... )

Table Comment: ALTER TABLE table_name SET TBLPROPERTIES ('comment' = new_comment);

{ALTER TABLE | ALTER TABLE IF EXISTS}

table_name

SET OPTIONS(table_set_options_list)

SerDe properties (Serialize and deserialize) ALTER TABLE table_name [PARTITION partition_spec] SET SERDE serde_class_name [WITH SERDEPROPERTIES serde_properties];

ALTER TABLE table_name [PARTITION partition_spec] SET SERDEPROPERTIES serde_properties;

serde_properties:

: (property_name = property_value, property_name = property_value, ... )

Serialization and deserialization is managed by the BigQuery service and isn't user configurable.

To learn how to let BigQuery read data from CSV, JSON, AVRO, PARQUET, or ORC files, see Create Cloud Storage external tables.

Supports CSV, JSON, AVRO, and PARQUET export formats. For more information, see Export formats and compression types.

Table storage properties ALTER TABLE table_name CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name, ...)] INTO num_buckets BUCKETS; Not supported for the ALTER statements.
Skewed table Skewed: ALTER TABLE table_name SKEWED BY (col_name1, col_name2, ...) ON ([(col_name1_value, col_name2_value, ...) [, (col_name1_value, col_name2_value), ...]

[STORED AS DIRECTORIES];

Not Skewed: ALTER TABLE table_name NOT SKEWED;

Not Stored as Directories: ALTER TABLE table_name NOT STORED AS DIRECTORIES;

Skewed Location: ALTER TABLE table_name SET SKEWED LOCATION (col_name1="location1" [, col_name2="location2", ...] );

Balancing storage for performance queries is managed by the BigQuery service and isn't configurable.
Table constraints ALTER TABLE table_name ADD CONSTRAINT constraint_name PRIMARY KEY (column, ...) DISABLE NOVALIDATE; ALTER TABLE table_name ADD CONSTRAINT constraint_name FOREIGN KEY (column, ...) REFERENCES table_name(column, ...) DISABLE NOVALIDATE RELY;

ALTER TABLE table_name DROP CONSTRAINT constraint_name;

ALTER TABLE [[project_name.]dataset_name.]table_name
ADD [CONSTRAINT [IF NOT EXISTS] [constraint_name]] constraint NOT ENFORCED;
ALTER TABLE [[project_name.]dataset_name.]table_name
ADD PRIMARY KEY(column_list) NOT ENFORCED;

For more information, see ALTER TABLE ADD PRIMARY KEY statement.

Add partition ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location'][, PARTITION partition_spec [LOCATION 'location'], ...];

partition_spec:

: (partition_column = partition_col_value, partition_column = partition_col_value, ...)

Not supported. Additional partitions are added as needed when data with new values in the partition columns is loaded.

For more information, see Managing partitioned tables.

Rename partition ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec; Not supported.
Exchange partition -- Move partition from table_name_1 to table_name_2

ALTER TABLE table_name_2 EXCHANGE PARTITION (partition_spec) WITH TABLE table_name_1; -- multiple partitions

ALTER TABLE table_name_2 EXCHANGE PARTITION (partition_spec, partition_spec2, ...) WITH TABLE table_name_1;

Not supported.
Recover partition MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS]; Not supported.
Drop partition ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...] [IGNORE PROTECTION] [PURGE]; Supported using the following methods:
  • bq rm 'mydataset.table_name$partition_id'
  • DELETE from table_name$partition_id WHERE 1=1

For more information, see Delete a partition.

(Un)Archive partition ALTER TABLE table_name ARCHIVE PARTITION partition_spec; ALTER TABLE table_name UNARCHIVE PARTITION partition_spec; Not supported.
Table and partition file format ALTER TABLE table_name [PARTITION partition_spec] SET FILEFORMAT file_format; Not supported.
Table and partition location ALTER TABLE table_name [PARTITION partition_spec] SET LOCATION "new location"; Not supported.
Table and partition touch ALTER TABLE table_name TOUCH [PARTITION partition_spec]; Not supported.
Table and partition protection ALTER TABLE table_name [PARTITION partition_spec] ENABLE|DISABLE NO_DROP [CASCADE];

ALTER TABLE table_name [PARTITION partition_spec] ENABLE|DISABLE OFFLINE;

Not supported.
Table and partition compact ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])] COMPACT 'compaction_type'[AND WAIT]

[WITH OVERWRITE TBLPROPERTIES ("property"="value" [, ...])];

Not supported.
Table and partition concatenate ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])] CONCATENATE; Not supported.
Table and partition columns ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])] UPDATE COLUMNS; Not supported for the ALTER TABLE statements.
Column name, type, position, and comment ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT]; Not supported.

DDL syntax

This section addresses differences in Data Definition Language (DDL) syntax between Hive and BigQuery.

CREATE TABLE and DROP TABLE statements

The following table provides details about converting CREATE TABLE statements from Hive to BigQuery:

Type Hive BigQuery
Managed tables create table table_name (

id int,

dtDontQuery string,

name string

)

CREATE TABLE `myproject`.mydataset.table_name (

id INT64,

dtDontQuery STRING,

name STRING

)

Partitioned tables create table table_name (

id int,

name string

)

partitioned by (dt string)

CREATE TABLE `myproject`.mydataset.table_name (

id INT64,

dt DATE,

name STRING

)

PARTITION BY dt

OPTIONS(

partition_expiration_days=3,

description="a table partitioned by date_col"

)

Create table as select (CTAS) CREATE TABLE new_key_value_store

ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"

STORED AS RCFile

AS

SELECT (key % 1024) new_key, concat(key, value) key_value_pair, dt

FROM key_value_store

SORT BY new_key, key_value_pair;

CREATE TABLE `myproject`.mydataset.new_key_value_store

-- When partitioning by date, uncomment the following:
-- PARTITION BY dt

OPTIONS(

description="Table Description",

-- When the table is partitioned, using require_partition_filter is recommended.
-- Uncomment the following:
-- require_partition_filter=TRUE

) AS

SELECT (key % 1024) new_key, concat(key, value) key_value_pair, dt

FROM key_value_store

SORT BY new_key, key_value_pair;

Create Table Like:

The LIKE form of CREATE TABLE lets you copy an existing table definition exactly.

CREATE TABLE empty_key_value_store

LIKE key_value_store [TBLPROPERTIES (property_name=property_value, ...)];

Not supported.
Bucketed sorted tables (clustered in BigQuery terminology) CREATE TABLE page_view(

viewTime INT,

userid BIGINT,

page_url STRING,

referrer_url STRING,

ip STRING COMMENT 'IP Address of the User'

)

COMMENT 'This is the page view table'

PARTITIONED BY(dt STRING, country STRING)

CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS

ROW FORMAT DELIMITED

FIELDS TERMINATED BY '\001'

COLLECTION ITEMS TERMINATED BY '\002'

MAP KEYS TERMINATED BY '\003'

STORED AS SEQUENCEFILE;

CREATE TABLE `myproject`.mydataset.page_view (

viewTime INT,

dt DATE,

userId BIGINT,

page_url STRING,

referrer_url STRING,

ip STRING OPTIONS (description="IP Address of the User")

)

PARTITION BY dt

CLUSTER BY userId

OPTIONS (

partition_expiration_days=3,

description="This is the page view table",

require_partition_filter=TRUE

)

For more information, see Create and use clustered tables.

Skewed tables (tables where one or more columns have skewed values) CREATE TABLE list_bucket_multiple (col1 STRING, col2 int, col3 STRING)

SKEWED BY (col1, col2) ON (('s1',1), ('s3',3), ('s13',13), ('s78',78)) [STORED AS DIRECTORIES];

Not supported.
Temporary tables CREATE TEMPORARY TABLE list_bucket_multiple (

col1 STRING,

col2 int,

col3 STRING);

You can achieve this using expiration time as follows:

CREATE TABLE mydataset.newtable

(

col1 STRING OPTIONS(description="An optional INTEGER field"),

col2 INT64,

col3 STRING

)

PARTITION BY DATE(_PARTITIONTIME)

OPTIONS(

expiration_timestamp=TIMESTAMP "2020-01-01 00:00:00 UTC",

partition_expiration_days=1,

description="a table that expires in 2020, with each partition living for 24 hours",

labels=[("org_unit", "development")]

)

Transactional tables CREATE TRANSACTIONAL TABLE transactional_table_test(key string, value string) PARTITIONED BY(ds string) STORED AS ORC; All table modifications in BigQuery are ACID (atomicity, consistency, isolation, durability) compliant.
Drop table DROP TABLE [IF EXISTS] table_name [PURGE]; {DROP TABLE | DROP TABLE IF EXISTS}

table_name

Truncate table TRUNCATE TABLE table_name [PARTITION partition_spec];

partition_spec:

: (partition_column = partition_col_value, partition_column = partition_col_value, ...)

Supported using TRUNCATE TABLE [[project_name.]dataset_name.]table_name, which doesn't accept a partition_spec. The following workarounds are also available:

  • Drop and create the table again with the same schema.
  • Set write disposition for table to WRITE_TRUNCATE if the truncate operation is a common use case for the given table.
  • Use the CREATE OR REPLACE TABLE statement.
  • Use the DELETE from table_name WHERE 1=1 statement.

Note: Specific partitions can also be truncated. For more information, see Delete a partition.
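The workarounds above can be sketched as follows (the table name is hypothetical):

```sql
-- Recreate the table with the same schema, keeping zero rows.
CREATE OR REPLACE TABLE `myproject.mydataset.t` AS
SELECT * FROM `myproject.mydataset.t` WHERE FALSE;

-- DML alternative; counts against daily DML limits.
DELETE FROM `myproject.mydataset.t` WHERE TRUE;
```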

CREATE EXTERNAL TABLE and DROP EXTERNAL TABLE statements

For external table support in BigQuery, see Introduction to external data sources.

CREATE VIEW and DROP VIEW statements

The following table provides details about converting CREATE VIEW statements from Hive to BigQuery:

Hive BigQuery
CREATE VIEW [IF NOT EXISTS] [db_name.]view_name [(column_name [COMMENT column_comment], ...) ]

[COMMENT view_comment]

[TBLPROPERTIES (property_name = property_value, ...)]

AS SELECT ...;

{CREATE VIEW | CREATE VIEW IF NOT EXISTS | CREATE OR REPLACE VIEW}

view_name

[OPTIONS(view_option_list)]

AS query_expression

CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]materialized_view_name

[DISABLE REWRITE]

[COMMENT materialized_view_comment]

[PARTITIONED ON (col_name, ...)]

[

[ROW FORMAT row_format]

[STORED AS file_format]

| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]

]

[LOCATION hdfs_path]

[TBLPROPERTIES (property_name=property_value, ...)]

AS

;

CREATE MATERIALIZED VIEW [IF NOT EXISTS] [project_id].[dataset_id].materialized_view_name

-- cannot disable rewrites in BigQuery

[OPTIONS(

[description="materialized_view_comment",] [other materialized_view_option_list]

)]

[PARTITION BY (col_name)] -- same as source table

CREATE FUNCTION and DROP FUNCTION statements

The following table provides details about converting functions from Hive to BigQuery:

Hive BigQuery
CREATE TEMPORARY FUNCTION function_name AS class_name; CREATE { TEMPORARY | TEMP } FUNCTION function_name ([named_parameter[, ...]])

[RETURNS data_type]

AS (sql_expression)

named_parameter:

param_name param_type

DROP TEMPORARY FUNCTION [IF EXISTS] function_name; Not supported.
CREATE FUNCTION [db_name.]function_name AS class_name

[USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];

Supported for allowlisted projects as an alpha feature.

CREATE { FUNCTION | FUNCTION IF NOT EXISTS | OR REPLACE FUNCTION }

function_name ([named_parameter[, ...]])

[RETURNS data_type]

AS (expression);

named_parameter:

param_name param_type

DROP FUNCTION [IF EXISTS] function_name; DROP FUNCTION [ IF EXISTS ] function_name
RELOAD FUNCTION; Not supported.

CREATE MACRO and DROP MACRO statements

The following table provides details about converting the procedural SQL statements that Hive uses to create macros, including variable declaration and assignment, to BigQuery:

Hive BigQuery
CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; Not supported. In some cases, this can be substituted with a UDF.
DROP TEMPORARY MACRO [IF EXISTS] macro_name; Not supported.

Error codes and messages

Hive error codes and BigQuery error codes are different. If your application logic catches errors, eliminate the source of the error, because BigQuery doesn't return the same error codes.

In BigQuery, it's common to use the INFORMATION_SCHEMA views or audit logging to examine errors.

Consistency guarantees and transaction isolation

Both Hive and BigQuery support transactions with ACID semantics. Transactions are enabled by default in Hive 3.

ACID semantics

Hive supports snapshot isolation. When you execute a query, the query is provided with a consistent snapshot of the database, which it uses until the end of its execution. Hive provides full ACID semantics at the row level, letting one application add rows when another application reads from the same partition without interfering with each other.

BigQuery provides optimistic concurrency control (first to commit wins) with snapshot isolation, in which a query reads the last committed data before the query starts. This approach guarantees the same level of consistency for each row and mutation, and across rows within the same DML statement, while avoiding deadlocks. For multiple DML updates to the same table, BigQuery switches to pessimistic concurrency control. Load jobs can run independently and append tables; however, BigQuery doesn't provide an explicit transaction boundary or session.

Transactions

Hive doesn't support multi-statement transactions. It doesn't support the BEGIN, COMMIT, and ROLLBACK statements. In Hive, every operation is automatically committed.

BigQuery supports multi-statement transactions inside a single query or across multiple queries when you use sessions. A multi-statement transaction lets you perform mutating operations, such as inserting or deleting rows from one or more tables and either committing or rolling back the changes. For more information, see Multi-statement transactions.
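A minimal sketch of a BigQuery multi-statement transaction (the table and columns are hypothetical):

```sql
BEGIN TRANSACTION;

INSERT INTO `myproject.mydataset.accounts` (id, balance)
VALUES (1, 100);

UPDATE `myproject.mydataset.accounts`
SET balance = balance - 10
WHERE id = 1;

-- Both statements commit atomically; ROLLBACK TRANSACTION would undo both.
COMMIT TRANSACTION;
```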