Skip to main content

Multiple Catalog

Summary

Allow users to maintain multiple catalogs for the databend.

Motivation

Databend supports multiple catalogs now, but only in a static way.

To allow accessing the hive catalog, users need to configure hive inside databend-query.toml in this way:

[catalog]
meta_store_address = "127.0.0.1:9083"
protocol = "binary"

Users can't add/alter/remove the catalogs during runtime.

By allowing users to maintain multiple catalogs for the databend, we can integrate more catalogs like iceberg more quickly.

Guide-level explanation

After this RFC has been implemented, users can create new catalogs like:

CREATE CATALOG my_hive
TYPE='hive'
CONNECTION = (URL='<hive-meta-store>' THRIFT_PROTOCOL=BINARY);
SELECT * FROM my_hive.DB.table;

Besides, users can alter or drop a catalog:

DROP CATALOG my_hive;

Users can add more catalogs like:

CREATE CATALOG my_iceberg
TYPE='iceberg'
CONNECTION = (URL='s3://my_bucket/path/to/iceberg');
SELECT * FROM my_iceberg.DB.table;

With this feature, users can join data from different catalogs now:

select
my_iceberg.DB.purchase_records.Client_ID,
my_iceberg.DB.purchase_records.Item,
my_iceberg.DB.purchase_records.QTY
from my_hive.DB.vip_info
inner join my_iceberg.DB.purchase_records
on my_hive.DB.vip_info.Client_ID = my_iceberg.DB.purchase_records.Client_ID;

Reference-level explanation

Databend has a framework for multiple catalogs now. The only change for us is to store catalog-related information in metasrv instead.

To make it possible to start a query without metasrv, we will also support configuring catalogs in config like:

[catalogs.my_hive]
meta_store_address = "127.0.0.1:9083"
protocol = "binary"

[catalogs.my_iceberg]
URL = "s3://bucket"

Static catalogs will always be loaded from configs and can't be altered or dropped.

Drawbacks

None.

Rationale and alternatives

None.

Prior art

Presto

Presto is an open-source SQL query engine that's fast, reliable, and efficient at scale. It doesn't have persisted states, so all its connectors will be configured.

Take iceberg as an example:

connector.name=iceberg
hive.metastore.uri=hostname:port
iceberg.catalog.type=hive

While using:

USE iceberg.tpch;
CREATE TABLE IF NOT EXISTS ctas_nation AS (SELECT * FROM nation);
DESCRIBE ctas_nation;

Unresolved questions

None.

Future possibilities

Iceberg Catalog

Discussed in RFC Iceberg External Table

Delta Sharing Catalog

Discussed in Tracking issues of integration with delta sharing