Concepts

Connectors

Trino is a tool designed to efficiently query vast amounts of data using distributed queries. It is not a database with its own storage but rather interacts with many different data stores. Trino connects to these data stores - or data sources - via connectors. Each connector enables access to a specific underlying data source such as a Hive warehouse, a PostgreSQL database or a Druid instance.

A Trino cluster comprises two roles: the Coordinator, responsible for managing and monitoring workloads, and the Workers, which execute the specific tasks that together make up a workload. The workers fetch data via the connectors, execute tasks and exchange intermediate results. The coordinator collects and consolidates these results for the end user.

Catalogs

An instance of a connector is called a catalog. Consider a setup with a large Hive warehouse running on HDFS: there may be two different catalogs, e.g. warehouse_1 and warehouse_2, each specifying the same hive connector with its own configuration.
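For illustration, two such catalogs could be declared as separate TrinoCatalog resources that both use the hive connector. This is only a sketch: the exact spec fields (for example the metastore ConfigMap reference) and the ConfigMap names are assumptions and may differ between operator versions:

```yaml
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  name: warehouse-1   # exposed in Trino as the catalog "warehouse-1"
spec:
  connector:
    hive:
      metastore:
        configMap: hive-metastore-1  # hypothetical ConfigMap for the first metastore
---
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  name: warehouse-2   # same connector, different underlying warehouse
spec:
  connector:
    hive:
      metastore:
        configMap: hive-metastore-2  # hypothetical ConfigMap for the second metastore
```

Note that Kubernetes resource names cannot contain underscores, so catalogs named along the lines of warehouse_1 and warehouse_2 would appear as warehouse-1 and warehouse-2 here.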

Currently, the following connectors are supported:

Catalog references

In Stackable, a TrinoCatalog consists of one or more components (mandatory or optional) that are specific to that catalog. A catalog should be reusable across multiple Trino clusters. Catalogs are referenced by Trino clusters via labels and label selectors: this is consistent with the Kubernetes paradigm and keeps the definitions simple and flexible.
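As a sketch of this mechanism, a TrinoCluster might select its catalogs as shown below. The field names (in particular catalogLabelSelector) are assumptions based on the Stackable CRDs and may vary between operator versions:

```yaml
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: simple-trino
spec:
  clusterConfig:
    # Every TrinoCatalog whose metadata.labels contain
    # "trino: simple-trino" is made available in this cluster.
    catalogLabelSelector:
      matchLabels:
        trino: simple-trino
```

With this selector in place, adding a catalog to the cluster is just a matter of applying the matching label to the TrinoCatalog resource; no change to the cluster definition itself is needed.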

The following diagram illustrates this. Two Trino catalogs, each an instance of a particular connector, are declared with labels that are used to match them to a Trino cluster:

A TrinoCluster referencing two catalogs by label matching

A complete example of this is shown here: Catalogs.