Features and Configurations
In this section, we provide documentation about some of the features available for dbt with ClickHouse.
profiles.yml configurations
To connect to ClickHouse from dbt, you'll need to add a profile to your profiles.yml file. A ClickHouse profile conforms to the following syntax:
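A minimal sketch is shown below; the host, credentials, and database names are placeholders, and the full set of connection options is documented in the dbt-clickhouse adapter README:

```yaml
my_clickhouse_project:
  target: dev
  outputs:
    dev:
      type: clickhouse
      schema: analytics              # the ClickHouse database to build models in
      host: my-host.example.com      # placeholder hostname
      port: 8443                     # 8443 for secure HTTP, 9440 for secure native
      user: dbt_user
      password: "<password>"
      secure: True                   # use TLS
      # optional
      cluster: my_cluster            # enables ON CLUSTER DDL (see the cluster section below)
      threads: 4
      custom_settings: {}            # extra ClickHouse settings applied to dbt queries
```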
Schema vs Database
The dbt model relation identifier database.schema.table is not compatible with ClickHouse because ClickHouse does not support a schema. We therefore use the simplified approach schema.table, where schema is the ClickHouse database. Using the default database is not recommended.
SET Statement Warning
In many environments, using the SET statement to persist a ClickHouse setting across all dbt queries is not reliable and can cause unexpected failures. This is particularly true when using HTTP connections through a load balancer that distributes queries across multiple nodes (such as ClickHouse Cloud), although in some circumstances this can also happen with native ClickHouse connections. Accordingly, we recommend configuring any required ClickHouse settings in the custom_settings property of the dbt profile as a best practice, instead of relying on a pre-hook SET statement as has occasionally been suggested.
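For example, instead of a pre-hook SET statement, the setting can be declared once in the profile (the setting name below is only illustrative; any ClickHouse setting name/value pair works here):

```yaml
# profiles.yml, inside the output/target definition
custom_settings:
  join_use_nulls: 1   # illustrative setting applied to all dbt queries
```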
Setting quote_columns
To prevent a warning, make sure to explicitly set a value for quote_columns in your dbt_project.yml. See the doc on quote_columns for more information.
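For example, in dbt_project.yml (the value shown is a sketch; choose true or false depending on whether your seed CSV headers need quoting):

```yaml
# dbt_project.yml
seeds:
  +quote_columns: false   # or true if your CSV column headers contain spaces
```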
About the ClickHouse Cluster
When using a ClickHouse cluster, you need to consider two things:
- Setting the cluster setting.
- Ensuring read-after-write consistency, especially if you are using more than one thread.
Cluster Setting
The cluster setting in profile enables dbt-clickhouse to run against a ClickHouse cluster. If cluster is set in the profile, all models will be created with the ON CLUSTER clause by default—except for those using a Replicated engine. This includes:
- Database creation
- View materializations
- Table and incremental materializations
- Distributed materializations
Replicated engines will not include the ON CLUSTER clause, as they are designed to manage replication internally.
To opt out of cluster-based creation for a specific model, add the disable_on_cluster config:
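For example, in the model's config block (the engine and materialization shown here are illustrative):

```sql
{{ config(
    materialized='table',
    engine='MergeTree()',
    disable_on_cluster=true
) }}
```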
Table and incremental materializations with a non-replicated engine will not be affected by the cluster setting (the model will be created on the connected node only).
Compatibility
If a model has been created without a cluster setting, dbt-clickhouse will detect the situation and run all DDL/DML without the ON CLUSTER clause for this model.
Read-after-write Consistency
dbt relies on a read-after-insert consistency model. This is not compatible with ClickHouse clusters that have more than one replica if you cannot guarantee that all operations will go to the same replica. You may not encounter problems in your day-to-day usage of dbt, but there are some strategies depending on your cluster to have this guarantee in place:
- If you are using a ClickHouse Cloud cluster, you only need to set select_sequential_consistency: 1 in your profile's custom_settings property, as shown in the example after this list. You can find more information about this setting here.
- If you are using a self-hosted cluster, make sure all dbt requests are sent to the same ClickHouse replica. If you have a load balancer on top of it, try using a replica-aware routing / sticky sessions mechanism so that you always reach the same replica. Adding the setting select_sequential_consistency = 1 in clusters outside ClickHouse Cloud is not recommended.
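For ClickHouse Cloud, the profile change is a one-liner; a sketch of the custom_settings block:

```yaml
# profiles.yml (ClickHouse Cloud target)
custom_settings:
  select_sequential_consistency: 1
```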
Additional ClickHouse macros
Model materialization utility macros
The following macros are included to facilitate creating ClickHouse specific tables and views:
- engine_clause -- Uses the engine model configuration property to assign a ClickHouse table engine. dbt-clickhouse uses the MergeTree engine by default.
- partition_cols -- Uses the partition_by model configuration property to assign a ClickHouse partition key. No partition key is assigned by default.
- order_cols -- Uses the order_by model configuration to assign a ClickHouse order by / sorting key. If not specified, ClickHouse will use an empty tuple() and the table will be unsorted.
- primary_key_clause -- Uses the primary_key model configuration property to assign a ClickHouse primary key. By default, the primary key is set and ClickHouse will use the order by clause as the primary key.
- on_cluster_clause -- Uses the cluster profile property to add an ON CLUSTER clause to certain dbt operations: distributed materializations, view creation, database creation.
- ttl_config -- Uses the ttl model configuration property to assign a ClickHouse table TTL expression. No TTL is assigned by default.
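These macros read the corresponding model configuration properties. A sketch of a model config that exercises them (the column names and expressions are placeholders):

```sql
-- engine -> engine_clause, order_by -> order_cols, partition_by -> partition_cols,
-- primary_key -> primary_key_clause, ttl -> ttl_config
{{ config(
    materialized='table',
    engine='MergeTree()',
    order_by=['event_date', 'id'],
    partition_by='toYYYYMM(event_date)',
    primary_key='id',
    ttl='event_date + INTERVAL 12 MONTH'
) }}

select 1 as id, today() as event_date
```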
s3Source helper macro
The s3source macro simplifies the process of selecting ClickHouse data directly from S3 using the ClickHouse S3 table function. It works by populating the S3 table function parameters from a named configuration dictionary (the name of the dictionary must end in s3). The macro first looks for the dictionary in the profile vars, and then in the model configuration. The dictionary can contain any of the following keys, used to populate the parameters of the S3 table function:
| Argument Name | Description |
|---|---|
| bucket | The bucket base URL, such as https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi. https:// is assumed if no protocol is provided. |
| path | The S3 path to use for the table query, such as /trips_4.gz. S3 wildcards are supported. |
| fmt | The expected ClickHouse input format (such as TSV or CSVWithNames) of the referenced S3 objects. |
| structure | The column structure of the data in the bucket, as a list of name/datatype pairs, such as ['id UInt32', 'date DateTime', 'value String']. If not provided, ClickHouse will infer the structure. |
| aws_access_key_id | The S3 access key id. |
| aws_secret_access_key | The S3 secret key. |
| role_arn | The ARN of a ClickhouseAccess IAM role to use to securely access the S3 objects. See this documentation for more information. |
| compression | The compression method used with the S3 objects. If not provided ClickHouse will attempt to determine compression based on the file name. |
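A sketch of how this might look is shown below. The dictionary name taxi_s3 and its values are illustrative, and the exact macro invocation should be verified against the linked S3 test file:

```yaml
# dbt_project.yml vars -- the dictionary name must end in s3
vars:
  taxi_s3:
    bucket: "https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi"
    path: "/trips_4.gz"
    fmt: "TSVWithNames"
```

```sql
-- models/taxi_trips.sql (illustrative model name)
select * from {{ s3source('taxi_s3') }}
```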
See the S3 test file for examples of how to use this macro.
Cross database macro support
dbt-clickhouse supports most of the cross database macros now included in dbt Core, with the following exceptions:
- The split_part SQL function is implemented in ClickHouse using the splitByChar function. This function requires using a constant string for the "split" delimiter, so the delimiter parameter used for this macro will be interpreted as a string, not a column name.
- Similarly, the replace SQL function in ClickHouse requires constant strings for the old_chars and new_chars parameters, so those parameters will be interpreted as strings rather than column names when invoking this macro.
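A sketch of split_part usage illustrating this behavior (the model and column names are hypothetical); note that the delimiter is passed as a quoted string literal, not a column reference:

```sql
-- 'users' and full_name are hypothetical
select
  {{ dbt.split_part(string_text='full_name', delimiter_text="' '", part_number=1) }} as first_name
from {{ ref('users') }}
```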
Catalog Support
dbt Catalog Integration Status
dbt Core v1.10 introduced catalog integration support, which allows adapters to materialize models into external catalogs that manage open table formats like Apache Iceberg. This feature is not yet natively implemented in dbt-clickhouse. You can track the progress of this feature implementation in GitHub issue #489.
ClickHouse Catalog Support
ClickHouse recently added native support for Apache Iceberg tables and data catalogs. Most of the features are still experimental, but you can already use them if you use a recent ClickHouse version.
- You can use ClickHouse to query Iceberg tables stored in object storage (S3, Azure Blob Storage, Google Cloud Storage) using the Iceberg table engine and the iceberg table function.
- Additionally, ClickHouse provides the DataLakeCatalog database engine, which enables connection to external data catalogs including AWS Glue Catalog, Databricks Unity Catalog, Hive Metastore, and REST Catalogs. This allows you to query open table format data (Iceberg, Delta Lake) directly from external catalogs without data duplication.
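For example, a quick sanity check from clickhouse-client might look like this (the bucket path and credentials are placeholders; see the ClickHouse Iceberg documentation for the full engine and table function syntax):

```sql
-- Read an Iceberg table stored on S3 via the iceberg table function
SELECT count()
FROM iceberg('https://my-bucket.s3.amazonaws.com/warehouse/my_table/', 'ACCESS_KEY_ID', 'SECRET_ACCESS_KEY');
```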
Workarounds for Working with Iceberg and Catalogs
You can read data from Iceberg tables or catalogs from your dbt project if you have already defined them in your ClickHouse cluster using the tools described above. You can leverage the source functionality in dbt to reference these tables in your dbt projects. For example, if you want to access your tables in a REST Catalog, you can:
- Create a database pointing to an external catalog:
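A rough sketch, assuming an Iceberg REST catalog backed by S3-compatible storage (the endpoint, credentials, and setting values are placeholders; check the DataLakeCatalog documentation for the exact options for your catalog type):

```sql
CREATE DATABASE iceberg_catalog
ENGINE = DataLakeCatalog('http://rest-catalog:8181/v1', 'ACCESS_KEY_ID', 'SECRET_ACCESS_KEY')
SETTINGS catalog_type = 'rest', warehouse = 'demo', storage_endpoint = 'http://minio:9000/warehouse'
```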
- Define the catalog database and its tables as sources in dbt (remember that the tables should already be available in ClickHouse):
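A sketch of the corresponding sources definition (the names are hypothetical; the schema must match the DataLakeCatalog database created above):

```yaml
# models/sources.yml
sources:
  - name: iceberg_catalog
    schema: iceberg_catalog   # the DataLakeCatalog database created above
    tables:
      - name: my_table        # must already be visible in ClickHouse under that database
```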
- Use the catalog tables in your dbt models:
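And a model that reads from that source (again, just a sketch):

```sql
-- models/iceberg_example.sql (hypothetical model name)
{{ config(materialized='view') }}

select *
from {{ source('iceberg_catalog', 'my_table') }}
```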
Notes on the Workarounds
The advantages of these workarounds are:
- You'll have immediate access to different external table types and external catalogs without waiting for native dbt catalog integration.
- You'll have a seamless migration path when native catalog support becomes available.
But there are currently some limitations:
- Manual setup: Iceberg tables and catalog databases must be created manually in ClickHouse before they can be referenced in dbt.
- No catalog-level DDL: dbt cannot manage catalog-level operations like creating or dropping Iceberg tables in external catalogs, so you cannot currently create them from the dbt adapter. Support for creating tables with the Iceberg() engine may be added in the future.
- Write operations: Currently, writing into Iceberg/Data Catalog tables is limited. Check the ClickHouse documentation to understand which options are available.