Usage
Druid requires Apache ZooKeeper and an SQL database to run. If HDFS is used as the backend storage (so-called "deep storage"), the Stackable HDFS operator is required as well.
Setup Prerequisites
SQL Database
Druid requires a MySQL or Postgres database.
For testing purposes, you can spin up a PostgreSQL database with the bitnami PostgreSQL helm chart. Add the bitnami repository:
helm repo add bitnami https://charts.bitnami.com/bitnami
And set up the Postgres database:
helm install druid bitnami/postgresql \
  --version=11 \
  --set auth.username=druid \
  --set auth.password=druid \
  --set auth.database=druid
Creating a Druid Cluster
With the prerequisites fulfilled, the CRD for this operator must be created:
kubectl apply -f /etc/stackable/druid-operator/crd
Then a cluster can be deployed using the example below.
apiVersion: druid.stackable.tech/v1alpha1
kind: DruidCluster
metadata:
  name: simple-druid
spec:
  image:
    productVersion: 24.0.0
    stackableVersion: 0.3.0
  clusterConfig:
    deepStorage:
      hdfs:
        configMapName: simple-hdfs
        directory: /druid
    metadataStorageDatabase:
      dbType: postgresql
      connString: jdbc:postgresql://druid-postgresql/druid
      host: druid-postgresql # this is the name of the Postgres service
      port: 5432
      user: druid
      password: druid
    zookeeperConfigMapName: simple-zk
  brokers:
    roleGroups:
      default:
        replicas: 1
  coordinators:
    roleGroups:
      default:
        replicas: 1
  historicals:
    roleGroups:
      default:
        replicas: 1
  middleManagers:
    roleGroups:
      default:
        replicas: 1
  routers:
    roleGroups:
      default:
        replicas: 1
Please note that you need to specify not only the Druid version you want to roll out, but also a Stackable version, as shown above. The Stackable version is the version of the underlying container image which is used to execute the processes. For a list of available versions please check our image registry. It should generally be safe to simply use the latest image version that is available.
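The zookeeperConfigMapName (simple-zk) and the deep storage configMapName (simple-hdfs) refer to discovery ConfigMaps created by the Stackable ZooKeeper and HDFS operators. As a minimal sketch only (assuming the Stackable ZooKeeper operator is installed and already manages a ZookeeperCluster, here hypothetically named simple-zk-server), a dedicated ZNode producing the simple-zk discovery ConfigMap could look like this:
---
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperZnode
metadata:
  name: simple-zk # the discovery ConfigMap takes this name
spec:
  clusterRef:
    name: simple-zk-server # name of an existing ZookeeperCluster (assumption)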
The Router hosts the web UI. The operator creates a NodePort service to access the web UI. Connect to the simple-druid-router NodePort service and follow the Druid documentation on how to load and query sample data.
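If you cannot reach the NodePort directly, you can instead forward the router service to your local machine with kubectl. This is only a sketch: 8888 is Druid's default plaintext router port, and the actual service port differs if TLS is enabled (the default for Stackable clusters, see Encryption below), so adjust accordingly.
# Forward the router web UI and API to localhost
kubectl port-forward svc/simple-druid-router 8888
The web UI should then be reachable at http://localhost:8888.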
Using S3
The Stackable Platform uses a common set of resource definitions for S3 across all operators. In general, you can configure an S3 connection or bucket inline, or as a reference to a dedicated object.
In Druid, S3 can be used for two things:
- Ingesting data from a bucket
- Using it as a backend for deep storage
You can specify a connection/bucket for only one of these or for both, but Druid supports only a single S3 connection under the hood, so if two connections are specified, they must be identical. The easiest way to ensure this is to use a dedicated S3 connection resource - not defined inline but as a dedicated object.
TLS for S3 is not yet supported.
S3 for ingestion
To ingest data from S3, you need to specify at least a host to connect to, but further settings are available:
spec:
  clusterConfig:
    ingestion:
      s3connection:
        host: yourhost.com (1)
        port: 80 # optional (2)
        credentials: # optional (3)
          ...
1 | The S3 host, not optional |
2 | Port, optional, defaults to 80 |
3 | Credentials to use. Since these might be bucket-dependent, they can instead be given in the ingestion job. Specifying the credentials here is explained below. |
S3 deep storage
Druid can use S3 as a backend for deep storage:
spec:
  clusterConfig:
    deepStorage:
      s3:
        bucket:
          inline:
            bucketName: my-bucket (1)
            connection:
              inline:
                host: test-minio (2)
                port: 9000 (3)
                credentials: (4)
                  ...
1 | Bucket name. |
2 | Bucket host. |
3 | Optional bucket port. |
4 | Credentials explained below. |
It is also possible to configure the bucket connection details as a separate Kubernetes resource and only refer to that object from the DruidCluster, like this:
spec:
  clusterConfig:
    deepStorage:
      s3:
        bucket:
          reference: my-bucket-resource (1)
1 | Name of the bucket resource with connection details. |
The resource named my-bucket-resource is then defined as shown below:
---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Bucket
metadata:
  name: my-bucket-resource
spec:
  bucketName: my-bucket-name
  connection:
    inline:
      host: test-minio
      port: 9000
      credentials:
        ... (explained below)
This has the advantage that bucket configuration can be shared across DruidClusters (and other Stackable CRDs) and reduces the cost of updating these details.
S3 Credentials
No matter if a connection is specified inline or as a separate object, the credentials are always specified in the same way. You will need a Secret containing the access key ID and secret access key, a SecretClass, and a reference to this SecretClass wherever you want to specify the credentials.
The Secret:
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
  labels:
    secrets.stackable.tech/class: s3-credentials-class (1)
stringData:
  accessKey: YOUR_VALID_ACCESS_KEY_ID_HERE
  secretKey: YOUR_SECRET_ACCESS_KEY_THAT_BELONGS_TO_THE_KEY_ID_HERE
1 | This label connects the Secret to the SecretClass . |
The SecretClass:
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: s3-credentials-class
spec:
  backend:
    k8sSearch:
      searchNamespace:
        pod: {}
Referencing it:
...
credentials:
  secretClass: s3-credentials-class
...
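Putting the pieces together, a deep storage definition that combines the inline connection from above with the s3-credentials-class could look like this (host, port and bucket name are the example values used earlier):
spec:
  clusterConfig:
    deepStorage:
      s3:
        bucket:
          inline:
            bucketName: my-bucket
            connection:
              inline:
                host: test-minio
                port: 9000
                credentials:
                  secretClass: s3-credentials-class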
HDFS deep storage
Druid can use HDFS as a backend for deep storage:
spec:
  clusterConfig:
    deepStorage:
      hdfs:
        configMapName: simple-hdfs (1)
        directory: /druid (2)
    ...
1 | Name of the HDFS cluster discovery ConfigMap. Can be supplied manually for a cluster not managed by Stackable. Needs to contain the core-site.xml and hdfs-site.xml. |
2 | The directory in which to store the Druid data. |
Security
The Druid cluster can be secured and protected in multiple ways.
Encryption
TLS encryption is supported for internal cluster communication (e.g. between Broker and Coordinator) as well as for external communication (e.g. between the Browser and the Router Web UI).
spec:
  clusterConfig:
    tls:
      serverAndInternalSecretClass: tls (1)
1 | Name of the SecretClass that is used to encrypt internal and external communication. |
A Stackable Druid cluster is always encrypted by default. To disable this default behavior, set spec.clusterConfig.tls.serverAndInternalSecretClass: null.
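A minimal snippet disabling the default encryption as described above:
spec:
  clusterConfig:
    tls:
      serverAndInternalSecretClass: null # disables TLS for internal and external communication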
Authentication
TLS
Access to the Druid cluster can be limited by configuring client authentication (mutual TLS) for all participants. This means that processes acting as internal clients (e.g. a Broker) or external clients (e.g. a browser) have to authenticate themselves with valid certificates in order to communicate with the Druid cluster.
spec:
  clusterConfig:
    authentication:
      - authenticationClass: druid-tls-auth (1)
1 | Name of the AuthenticationClass that is used to encrypt and authenticate communication. |
The AuthenticationClass may or may not have a SecretClass configured:
---
apiVersion: authentication.stackable.tech/v1alpha1
kind: AuthenticationClass
metadata:
  name: druid-mtls-authentication-class
spec:
  provider:
    # Option 1
    tls:
      clientCertSecretClass: druid-mtls (1)
    # Option 2
    tls: {} (2)
1 | If a client SecretClass is provided in the AuthenticationClass (here druid-mtls ), these certificates will be used for encryption and authentication. |
2 | If no client SecretClass is provided in the AuthenticationClass , the spec.clusterConfig.tls.serverAndInternalSecretClass will be used for encryption and authentication. It cannot be explicitly set to null in this case. |
Authorization with Open Policy Agent (OPA)
Druid can connect to an Open Policy Agent (OPA) instance for authorization policy decisions. You need to run an OPA instance to connect to, for which we refer to the OPA Operator docs. How you can write RegoRules for Druid is explained below.
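For illustration only, and assuming the Stackable OPA operator is installed, a minimal OPA instance named simple-opa might look like the sketch below; the exact resource layout and the versions shown are assumptions, so please refer to the OPA Operator docs:
---
apiVersion: opa.stackable.tech/v1alpha1
kind: OpaCluster
metadata:
  name: simple-opa
spec:
  image:
    productVersion: 0.45.0   # placeholder version
    stackableVersion: 0.3.0  # placeholder version
  servers:
    roleGroups:
      default: {}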
Once you have defined your rules, you need to configure the OPA cluster name and endpoint to use for Druid authorization requests. Add a section to the spec for OPA:
spec:
  clusterConfig:
    authorization:
      opa:
        configMapName: simple-opa (1)
        package: my-druid-rules (2)
1 | The name of your OPA cluster (simple-opa in this case) |
2 | The RegoRule package to use for policy decisions. The package should contain an allow rule. This is optional and will default to the name of the Druid cluster. |
Defining RegoRules
For a general explanation of how rules are written, we refer to the OPA documentation. Inside your rules you have access to input from Druid. Druid provides the following data on which to base your policy decisions:
{
  "user": "someUsername", (1)
  "action": "READ", (2)
  "resource": {
    "type": "DATASOURCE", (3)
    "name": "myTable" (4)
  }
}
1 | The authenticated identity of the user that wants to perform the action |
2 | The action type, can be either READ or WRITE . |
3 | The resource type, one of STATE , CONFIG and DATASOURCE . |
4 | In case of a datasource this is the table name, for STATE this will simply be STATE , the same for CONFIG . |
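As a minimal sketch only (not a production-ready policy), an allow rule that grants a hypothetical admin user full access and every other user read access to datasources could look like this. The package name druid used here is a placeholder and must match the package configured in spec.clusterConfig.authorization.opa.package (or the Druid cluster name if no package is set):
package druid

# Deny everything that is not explicitly allowed.
default allow = false

# The "admin" user (hypothetical) may perform any action on any resource.
allow {
    input.user == "admin"
}

# Every authenticated user may read datasources.
allow {
    input.action == "READ"
    input.resource.type == "DATASOURCE"
}
How the Rego rules are deployed to the OPA cluster is described in the OPA Operator docs.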
For more details consult the Druid Authentication and Authorization Model.
Connecting to Druid from other Services
The operator creates a ConfigMap with the name of the cluster, containing connection information. Following the example above (where the cluster is named simple-druid), a ConfigMap named simple-druid will be created containing three keys:
- DRUID_ROUTER with the format <host>:<port>, which points to the router process's HTTP endpoint. Here you can connect to the web UI, or use REST endpoints such as /druid/v2/sql/ to query data (see the example Pod below). More information in the Druid Docs.
- DRUID_AVATICA_JDBC contains a JDBC connect string which can be used together with the Avatica JDBC Driver to connect to Druid and query data. More information in the Druid Docs.
- DRUID_SQALCHEMY contains a connection string used to connect to Druid with SQLAlchemy, for example in Apache Superset.
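For example, another Pod could consume the router address from this ConfigMap as an environment variable. The Pod name, image and command below are purely illustrative; only the ConfigMap name and key come from the example above:
---
apiVersion: v1
kind: Pod
metadata:
  name: druid-client   # hypothetical client Pod
spec:
  containers:
    - name: client
      image: curlimages/curl   # illustrative image
      env:
        - name: DRUID_ROUTER
          valueFrom:
            configMapKeyRef:
              name: simple-druid   # the discovery ConfigMap created by the operator
              key: DRUID_ROUTER
      # Query the Druid status endpoint via the router.
      command: ["sh", "-c", "curl -s http://$DRUID_ROUTER/status"]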
Monitoring
The managed Druid instances are automatically configured to export Prometheus metrics. See Monitoring for more details.
Configuration & Environment Overrides
The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).
Overriding certain properties which are set by the operator (such as the HTTP port) can interfere with the operator and can lead to problems.
Configuration Properties
For a role or role group, at the same level as config, you can specify configOverrides for the runtime.properties file. For example, if you want to set druid.server.http.numThreads for the router to 100, adapt the routers section of the cluster resource like so:
routers:
  roleGroups:
    default:
      config: {}
      configOverrides:
        runtime.properties:
          druid.server.http.numThreads: "100"
      replicas: 1
Just as for the config, it is possible to specify this at role level as well:
routers:
  configOverrides:
    runtime.properties:
      druid.server.http.numThreads: "100"
  roleGroups:
    default:
      config: {}
      replicas: 1
All override property values must be strings.
For a full list of configuration options we refer to the Druid Configuration Reference.
Environment Variables
In a similar fashion, environment variables can be (over)written. For example per role group:
routers:
  roleGroups:
    default:
      config: {}
      envOverrides:
        MY_ENV_VAR: "MY_VALUE"
      replicas: 1
or per role:
routers:
  envOverrides:
    MY_ENV_VAR: "MY_VALUE"
  roleGroups:
    default:
      config: {}
      replicas: 1
Storage for data volumes
Druid uses S3 or HDFS deep storage, so no extra PersistentVolumeClaims have to be specified.
Resource Requests
Stackable operators handle resource requests in a slightly different manner than Kubernetes. Resource requests are defined on role or role group level. See Roles and role groups for details on these concepts. On a role level this means that e.g. all workers will use the same resource requests and limits. This can be further specified on role group level (which takes priority over the role level) to apply different resources.
This is an example on how to specify CPU and memory resources using the Stackable Custom Resources:
---
apiVersion: example.stackable.tech/v1alpha1
kind: ExampleCluster
metadata:
  name: example
spec:
  workers: # role-level
    config:
      resources:
        cpu:
          min: 300m
          max: 600m
        memory:
          limit: 3Gi
    roleGroups: # role-group-level
      resources-from-role: # role-group 1
        replicas: 1
      resources-from-role-group: # role-group 2
        replicas: 1
        config:
          resources:
            cpu:
              min: 400m
              max: 800m
            memory:
              limit: 4Gi
In this case, the role group resources-from-role will inherit the resources specified on the role level, resulting in a maximum of 3Gi memory and 600m CPU resources.
The role group resources-from-role-group has a maximum of 4Gi memory and 800m CPU resources (which overrides the role CPU resources).
For Java products, the heap memory actually used is lower than the specified memory limit, because other processes running in the Container also require memory. Currently, 80% of the specified memory limit is passed to the JVM.
For memory, only a limit can be specified, which will be set as both memory request and limit in the Container. This is to always guarantee a Container the full amount of memory during Kubernetes scheduling.
If no resources are configured explicitly, the Druid operator uses the following defaults:
brokers:
  roleGroups:
    default:
      config:
        resources:
          cpu:
            min: '200m'
            max: "4"
          memory:
            limit: '2Gi'
The default values are most likely not sufficient to run a proper cluster in production. Please adapt according to your requirements.
For more details regarding Kubernetes CPU limits see: Assign CPU Resources to Containers and Pods.
Historical Resources
In addition to the cpu and memory resources described above, historical Pods also accept a storage resource with the following properties:
- segmentCache - used to set the maximum size allowed for the historical segment cache locations. See the Druid documentation regarding druid.segmentCache.locations. The operator creates an emptyDir and sets the max_size of the volume to the value of the capacity property. In addition, Druid is configured to keep 7% of the volume size free. By default, if no segmentCache is configured, the operator will create an emptyDir with a size of 1G and a freePercentage of 5.
Example historical configuration with storage resources:
historicals:
  roleGroups:
    default:
      config:
        resources:
          storage:
            segmentCache:
              # The amount of free space to subtract from the capacity setting below.
              freePercentage: 7
              emptyDir:
                # The maximum size of the volume used to store the segment cache
                capacity: 2g