Concepts
Connectors
Trino is a tool designed to efficiently query vast amounts of data using distributed queries. It is not a database with its own storage but rather interacts with many different data stores. Trino connects to these data stores - or data sources - via connectors. Each connector enables access to a specific underlying data source such as a Hive warehouse, a PostgreSQL database or a Druid instance.
A Trino cluster comprises two roles: the Coordinator, responsible for managing and monitoring work loads, and the Worker, which is responsible for executing specific tasks that together make up a work load. The workers fetch data from the connectors, execute tasks and share intermediate results. The coordinator collects and consolidates these results for the end-user.
Catalogs
An instance of a connector is called a catalog.
Think of a setup containing a large Hive warehouse running on HDFS.
There may exist two different catalogs called e.g. warehouse_1
and warehouse_2
each specifying the same hive
connector.
Currently, the following connectors are supported:
Catalog references
Within Stackable a TrinoCatalog
consists of one or more (mandatory or optional) components which are specific to that catalog. A catalog should be re-usable within multiple Trino clusters. Catalogs are referenced by Trino clusters with labels and label selectors: this is consistent with the Kubernetes paradigm and keeps the definitions simple and flexible.
The following diagram illustrates this. Two Trino catalogs - each an instance of a particular connector - are declared with labels that used to match them to a Trino cluster:
A complete example of this is shown here: Catalogs.