
Gateway Configuration Parameters

A gateway, defined in the [gateways] section of the Hyper-Q configuration file (dtm.ini), specifies the connection to the target data warehouse, for example Azure Synapse Analytics or Google BigQuery:

[gateways]
"gateway".name = "gateway_bigquery"
"gateway".connection_string = "Driver=/opt/simba/googlebigqueryodbc/lib/64/libgooglebigqueryodbc_sb64.so;Server=__HOST__;Port=__PORT__;Database=__DATABASE__;Uid=__USER__;Pwd=__PASSWORD__"
"gateway".host = "servername"
"gateway".port = 1433
"gateway".Ipv = 4
"gateway".type = bigquery
"gateway".version = "100000"
"gateway".default_table_type = "multiset"
"gateway".db_interface = rest_api
"gateway".parallel_conversion_available_memory_size = 3200MB
"gateway".max_tdf_size = 100MB
"gateway".max_tdf_rows = 10000000
Table 5. Gateway Parameters

Parameter

Type

Description

allow_index_commands

Boolean

If set to true, Hyper-Q generates an error indicating that indexes are not supported on the backend system. If set to false (the default for use with BigQuery), SQL statements that create or drop an index are ignored.

"gateway".allow_index_commands=false

To learn more, see Enable or Disable Index Not Supported Error Messages.

auth_backend

String

Specifies the authentication method to use. When set to bigquery_oauth, authentication is performed using a user account with the supplied username and password: the username is the client_id and the password is client_secret||||refresh_token.
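For example, a setting of the following form selects OAuth authentication (a sketch; the quoting shown follows the other string-valued examples in this table):

"gateway".auth_backend = "bigquery_oauth"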

bigquery_dbc_view_chunk_size

String

To emulate the Teradata catalog in BigQuery, Hyper-Q constructs a UNION of the values that make up the Teradata DBC views (for example, DBC.COLUMNSV and DBC.TABLESV). As the number of objects in the database grows, so does the number of UNION operations in the SQL output. BigQuery limits the number of objects that can be referenced in a query. To accommodate this, Hyper-Q breaks the large UNION operation apart and materializes the pieces into temporary tables. You specify the chunk size of these tables using the bigquery_dbc_view_chunk_size parameter.

If this parameter is set too high, the row chunks in each UNION operation will be quite large. This results in fewer UNION operations, but the larger a UNION operation is, the slower it executes. If bigquery_dbc_view_chunk_size is set too low, each UNION operation will consist of very few rows, leading to UNIONs over a large number of temporary tables, which may slow the overall operation.

For these reasons, the optimal value for your BigQuery environment lies somewhere between these extremes and differs for each workload and the data in your system. Datometry recommends experimenting with different chunk size settings to determine the optimal value for your environment. To begin tuning your BigQuery environment, try the following values and observe the performance of your workload: 10, 25, 50, 100, 200, 300, 450. The value to use for a given workload will fall somewhere within this range.

To use bigquery_dbc_view_chunk_size, include it in the [gateways] section of the Hyper-Q configuration file (dtm.ini) and specify the chunk size to use for the temporary tables. For example:

"gateway".bigquery_dbc_view_chunk_size=50

bigquery_use_param_data_types

Boolean

Specifies that Hyper-Q use parameterized data types on columns for the data types STRING, BYTES, NUMERIC, and BIGNUMERIC.

A data type that is declared with parameters is called a parameterized data type. You can only use parameterized data types with columns. A column with a parameterized data type is a parameterized column. Parameterized type constraints are enforced when writing a value to a parameterized column. To learn more, see Parameterized data types in the Google BigQuery documentation.

To enable bigquery_use_param_data_types, set it to true. For example:

"gateway".bigquery_use_param_data_types = true

If enabled, bigquery_use_param_data_types translates the following Teradata query on BigQuery as shown below it.

CREATE TABLE table_name (column1 CHAR(10), column2 NUMERIC, column3 NUMERIC(32,1), column4 BYTE(100)); 
CREATE TABLE
    `project_id`.`dataset_id`.`table_name` (
        `column1` STRING (10) ,
        `column2` NUMERIC (5, 0) ,
        `column3` BIGNUMERIC (32, 1) ,
        `column4` BYTES (100) 
    ) 

If bigquery_use_param_data_types is set to false, the same Teradata query is translated as follows on BigQuery.

CREATE TABLE
    `project_id`.`dataset_id`.`table_name` (
        `column1` STRING ,
        `column2` NUMERIC ,
        `column3` BIGNUMERIC ,
        `column4` BYTES 
    ) 

Note

Hyper-Q only serializes a column data type. Currently, BigQuery does not support parameterized data types in CAST expressions. For example, the following SQL query returns an error:

SELECT CAST ('''12.3''' AS NUMERIC(3,1));
Parameterized types are not allowed in CAST expressions.

Important

Datometry recommends that new Hyper-Q deployments from version 3.87 onwards enable bigquery_use_param_data_types. For data warehouses replatformed using earlier versions of Hyper-Q, however, Datometry recommends setting bigquery_use_param_data_types to false to ensure compatibility.

bq_names_case_sensitivity

String

Specifies the name case Hyper-Q uses when creating or looking up metadata objects. The bq_names_case_sensitivity parameter accepts the following values:

  • "preserve" Names are kept as they appear in the statement. This is the default value.

  • "lower" Lower case is used to create and lookup objects.

  • "upper" Upper case is used to create and lookup objects.

Important

You must enclose the values you assign to bq_names_case_sensitivity within double quotes ("). For example:

"gateway".bq_names_case_sensitivity="preserve"

bulk_load_bucket_name

String

Name of the Google Cloud Storage bucket from which to bulk load data. For example:

"gateway".bulk_load_bucket_name = hyperq-load-bucket

To learn more about Google Cloud Storage buckets and loading data into BigQuery, see the Google Cloud documentation.

bulk_load_chunk_size

Integer

Specifies the maximum file size of each intermediate Teradata MultiLoad or FastLoad data file. When the specified file size is reached, the file is uploaded to the database and a new file is created. This value determines the number of times Hyper-Q issues the upload command to the BigQuery upload tool and may affect job performance. If the value is too small, Hyper-Q issues many commands, uploading a small file each time. If the value is too large, Hyper-Q waits a long time before any data is uploaded to the destination database.

Accepted values are composed of a number and a unit. Acceptable units are TB, GB, MB, KB, and B. For example, 10MB indicates a maximum intermediate file size of 10 MB. The default value is 1GB.

"gateway".bulk_load_chunk_size = 1GB

bulk_load_err_gen2

String

To enable the generation 2 version of bulk load error handling, specify the configuration parameter bulk_load_err_gen2=true. If generation 2 bulk load processing encounters an error, it falls back to generation 1 error handling for errors on DML operations other than INSERT.

"gateway".bulk_load_err_gen2=true

To disable generation 2 bulk load error handling, specify false.

"gateway".bulk_load_err_gen2=false

When set to force, Hyper-Q performs a hard abort that interrupts bulk loader execution if it encounters an error.

"gateway".bulk_load_err_gen2=force

To learn more, see Enable Generation 2 Bulk Load Error Handling.

bulk_load_field_sep

String

Specifies the field separator to use during bulk loading. In the following example, a comma (,) terminates each field.

"gateway".bulk_load_field_sep = ","

Also see the configuration parameter bulk_load_row_sep.

bulk_load_retry_max_error_rows

Integer

Specifies the maximum number of times Hyper-Q will attempt to identify a row causing an error. The default value is 0.

Hyper-Q uses a slow error-handling approach that may require executing a query several times to discover the exact row causing an error. For databases with a large number of rows, this may cause performance issues due to excessive query logging. If detailed error reporting is not required, specify a value of 0.

gateway".bulk_load_retry_max_error_rows = 0

Hyper-Q 3.93 introduced bulk_load_err_gen2, the second generation of bulk load error handling. Generation 2 bulk load error handling improves performance and provides more information about errors than the previous version. Datometry recommends using generation 2 bulk load error handling instead of bulk_load_retry_max_error_rows. To learn more, see the configuration parameter bulk_load_err_gen2.

bulk_load_row_sep

String

Specifies the row separator to use during bulk loading. In the following example, \n specifies a newline character as the row terminator.

"gateway".bulk_load_row_sep = "\n"

Also see the configuration parameter bulk_load_field_sep.

bulk_load_utility

String

Specifies the method Hyper-Q uses to upload data to BigQuery. The options for bulk loading data are bq_load and bq_load_api.

The bq_load option specifies that Hyper-Q use a file-based loading utility: Hyper-Q writes data files locally, then the bulk load utility uploads the data files from local file storage into the work table. The following example specifies that Hyper-Q use the Google BigQuery bulk load utility.

"gateway".bulk_load_utility = bq_load

The bq_load_api option uploads data directly to Google Cloud Storage without using local disk storage as an intermediary. Data is first uploaded to Google Cloud Storage and then copied from Google Cloud Storage into the work table. The following example specifies that Hyper-Q use the Google BigQuery API.

"gateway".bulk_load_utility = bq_load_api

connection_string

String

Specifies the database connection string. For example:

"gateway".connection_string = "Driver=__Driver__; Server=__HOST__;Port=__PORT__;
Database=__DATABASE__;Uid=__USER__; Pwd=__PASSWORD__"

db_interface = python | rest_api

String

Important

Beginning with Hyper-Q 3.104, the db_interface parameter is deprecated. Hyper-Q now only uses ODBC. To learn more, see Configure Hyper-Q to Use the Magnitude Simba ODBC Driver.

Specifies whether Hyper-Q accesses Google BigQuery directly using the Google REST API or through an external Python application using the Google Python API. The db_interface parameter accepts the following values:

python Specifies that Hyper-Q use an external Python program and the Google Python API to access the BigQuery database.

rest_api Specifies that Hyper-Q use the Google REST API directly to access the database, running SQL queries and fetching results through the BigQuery REST interface. This is the default value if db_interface is not specified.
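For example, to select the REST interface explicitly (matching the sample configuration at the top of this page):

"gateway".db_interface = rest_api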

Note

Currently, Hyper-Q does not support multi-statement requests or parameterized queries through the Google REST API. In these cases Hyper-Q uses the external Python program and API.

default_table_type=multiset

String

Specifies that Hyper-Q emulate either Teradata MULTISET or SET tables for CREATE TABLE statements that do not specify a table type. The default is to create MULTISET tables.

The following example specifies that Hyper-Q emulate Teradata SET tables.

"gateway".default_table_type=set

host = hostname

String

Host name of the cloud-based data warehouse server.
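For example, matching the sample configuration at the top of this page:

"gateway".host = "servername"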

ignore_bulkload_errors

Boolean

Specifies that Hyper-Q only create error tables if and when bulk load errors are encountered.

By default, Hyper-Q creates error tables and drops them if they remain empty when the bulk load job completes (in which case no errors were encountered during bulk loading). When ignore_bulkload_errors is set to true, Hyper-Q does not create error tables unless an error is encountered.

"gateway".ignore_bulkload_errors = true

ignore_stats_on_missing_named_index

Boolean

Specifies that Hyper-Q ignore statistics on missing named indexes. The ignored indexes are primary indexes; secondary indexes are recorded correctly.

To enable ignore_stats_on_missing_named_index, set it to true. The default value is false.

"gateway".ignore_stats_on_missing_named_index = true

To learn more, see Ignore Statistics for Missing Named Indexes.

ipv

Integer

Specifies the IP version to use with the cloud-based data warehouse: 4 for IPv4 or 6 for IPv6.
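For example, to use IPv4, matching the sample configuration at the top of this page:

"gateway".ipv = 4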

max_tdf_rows

Integer

Hyper-Q receives the result from the backend and splits it into tabular data format (TDF) chunks, which are sent for conversion. Specify the chunk size as a number of rows. For example:

"gateway".max_tdf_rows = 10000000

Caution

Before modifying this parameter, contact Datometry Support to see if it is compatible with your environment.

max_tdf_size

Integer

Hyper-Q receives the result from the backend and splits it into tabular data format (TDF) chunks, which are sent for conversion. Specify the chunk size in bytes. For example:

"gateway".max_tdf_size = 100MB

Caution

Before modifying this parameter, contact Datometry Support to see if it is compatible with your environment.

name

String

Specifies the name by which other objects in the Hyper-Q configuration file refer to the gateway.
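For example, matching the sample configuration at the top of this page:

"gateway".name = "gateway_bigquery"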

numeric_to_int_emulation

Boolean

Enables the mapping of the Teradata NUMERIC data type to the BigQuery INT64 data type.

To enable the mapping of Teradata NUMERIC to BigQuery INT64, set numeric_to_int_emulation to true. For example:

"gateway".numeric_to_int_emulation = true

Caution

Data Definition Language (DDL) commands in SQL create the database schema and define the type and structure of the data stored in the database. For this reason, once you execute DDL commands on your system, you cannot enable and later disable numeric_to_int_emulation. You must use the same setting at all times once you have created your database.

To learn more about numeric_to_int_emulation, see Map Teradata NUMERIC to BigQuery INT64 Data Type.

parallel_conversion_available_memory_size

Integer

Specifies the maximum memory the Hyper-Q parallel conversion process can use per session, which controls the degree of parallelism. The default value is 1 MB. To calculate the recommended value for your deployment, multiply the value of the max_tdf_size parameter by the number of cores, then multiply by 2:

max_tdf_size * number of cores * 2 = maximum memory

Supported memory units are B (bytes), KB (kilobytes), MB (megabytes), GB (gigabytes), and TB (terabytes). For example: 3200MB
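As a worked example, with the sample max_tdf_size of 100MB on a host with 16 cores (an illustrative core count), the formula yields the 3200MB shown in the sample configuration:

100MB * 16 * 2 = 3200MB

"gateway".parallel_conversion_available_memory_size = 3200MB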

Caution

Before modifying this parameter, contact Datometry Support to see if it is compatible with your environment.

port

Integer

Specifies the network port of the cloud-based data warehouse server.
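For example, matching the sample configuration at the top of this page:

"gateway".port = 1433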

temp_table_schema

String

Specifies the schema in which to create regular tables that emulate temporary tables.

BigQuery supports datasets (schemas) with an auto-expiration option. You must create the schema with the appropriate configuration before specifying it with temp_table_schema, because the specified schema stores regular tables that expire after the expiration time passes.

To learn more, see Updating default table expiration times in the Google documentation.
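For example, assuming a dataset named temp_dataset has already been created with an appropriate default table expiration (the dataset name here is illustrative):

"gateway".temp_table_schema = "temp_dataset"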

timestamp_to_datetime_emulation

Boolean

Specifies that Hyper-Q map Teradata TIMESTAMP to BigQuery DATETIME instead of BigQuery TIMESTAMP. By default, this configuration parameter is set to false (turned off).

TIMESTAMP WITH TIME ZONE is not affected by the timestamp_to_datetime_emulation configuration parameter.

You must use this setting across the entirety of your BigQuery project, and it must remain unchanged once you decide to convert Teradata TIMESTAMP to BigQuery DATETIME. You cannot apply the conversion to some SQL statements and not to others. For example, if you turned on Teradata TIMESTAMP to BigQuery DATETIME emulation for table creation and then turned it off to insert data, an error would be returned due to a data type mismatch.
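For example, to turn on the mapping (a sketch following the format of the other Boolean parameters in this table):

"gateway".timestamp_to_datetime_emulation = true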

type

String

Specifies the cloud-based data warehouse vendor. Acceptable values are: bigquery and msadw.
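For example, matching the sample configuration at the top of this page:

"gateway".type = bigquery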

version

Integer

Specifies the version of the target data warehouse. For example:

"gateway".version = "100000"