Override Google BigQuery Table Clustering
By default, tables translated using Hyper-Q and qShift use the Teradata primary index columns as clustering columns in BigQuery to provide a tuning baseline. Tables may require additional tuning, such as adding partitions or customizing clustering, to achieve optimal query performance. You can override the creation of clustering columns using the Hyper-Q hint bigquery_clustering_cols_override
.
In a clustered table, BigQuery automatically sorts the data based on the values in the clustering columns and organizes them in optimally sized storage blocks. You can achieve more finely grained sorting by creating a table that is clustered and partitioned. To learn more about clustering columns for use with BigQuery, see Introduction to clustered tables in the Google BigQuery documentation.
In addition to specifying what columns to use for clustering on BigQuery, you can also partition tables in BigQuery using the Hyper-Q hint bigquery_partitions_override
. This hint allows you to specify how BigQuery partitions tables, which can make it more cost effective and performant to manage and query your data. To learn more, see Override Google BigQuery Partitions.
The syntax for bigquery_clustering_cols_override
is:
bigquery_clustering_cols_override = column_name_1,column_name_2,column_name_3,...
where:
Parameter | Description |
---|---|
| The name of the column or columns by which you want to cluster. You can specify a comma separated list of column names by which to cluster if you want to use more than one column. |
You can use the bigquery_clustering_cols_override
hint as a statement (or query) level hint. Hints at the statement scope apply to the entire SQL statement to which they are associated. You cannot use the bigquery_clustering_cols_override
hint at the session or application scope level. To learn more about Hyper-Q SQL hints, and how to use them within a SQL query, see About Hyper-Q SQL Hints.
You can specify bigquery_clustering_cols_override
as a query level hint. This example specifies the hint within a Teradata CREATE TABLE statement. The column labeled column3
is specified as the column to use for clustering. This is in contrast to the three columns identified as having a primary index: column2
, column3
, and column4
. If you do not specify a specific column BigQuery should use for clustering, it would use all three primary index columns as clustering columns.
CREATE TABLE table_name
-- @dtm_bigquery_clustering_cols_override=column3
(
column1 INT,
column2 INT,
column3 VARCHAR(10),
column4 VARCHAR(1)
) PRIMARY INDEX (column2, column3, column4) PARTITION BY (column2, column3, column4);
The Hyper-Q translated CREATE TABLE statement on BigQuery is as shown below. Note that the CLUSTER BY clause specifies clustering using column3
.
CREATE TABLE `project_name`.`dataset_name`.`table_name` ( `column1` INT64 , `column2` INT64 , `column3` STRING , `column4` STRING ) CLUSTER BY `column3` ;
This example specifies two columns as clustering columns: column3
and column4
. As in the previous example, the Teradata CREATE TABLE statement identifies three columns as having a primary index: column2
, column3
, and column4
. If you did not specify a specific set of columns BigQuery should use for clustering, it would use all three of these columns as clustering columns.
CREATE TABLE table_name
-- @dtm_bigquery_clustering_cols_override=column3,column4
(
column1 INT,
column2 INT,
column3 VARCHAR(10),
column4 VARCHAR(1)
) PRIMARY INDEX (column2, column3, column4) PARTITION BY (column2, column3, column4);
The translated CREATE TABLE statement on BigQuery is as shown below. Note that the CLUSTER BY clause specifies clustering using column3
and column4
.
CREATE TABLE `project_name`.`dataset_name`.`table_name` ( `column1` INT64 , `column2` INT64 , `column3` STRING , `column4` STRING ) CLUSTER BY `column3`, `column4` ;