What is a Kubernetes operator?

Simply put, an operator is an automated Site Reliability Engineer for an application. It is a way to package, run, and maintain an application in a cloud-native way. Operators reduce the management overhead for cluster administrators and make it easier for developers to use essential software components such as databases and storage systems. They mainly target stateful applications, which have non-trivial requirements with regard to storage, networking, and fault tolerance.

An operator is composed of two main components: a controller and a Custom Resource Definition (CRD), which defines the schema for a Custom Resource (CR). The controller is a program that runs in a loop and watches CRs for changes, reconciling the actual state of the cluster with the desired state declared in each CR manifest. The CRD, and consequently the CR, are extensions of the k8s API; they do not exist in a k8s cluster by default.
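To make the reconcile loop more concrete, here is a minimal sketch in Python. It is a toy model, not how a real controller talks to the API server: the `reconcile` function, the in-memory `cluster` dictionary, and the `demo` resource are all hypothetical stand-ins. The idea it tries to show is the one described above: compare the desired state declared in a CR with what actually exists, and act only on the difference.

```python
# Illustrative only: a toy reconcile loop over in-memory state.
# Real controllers watch the Kubernetes API; nothing here calls a cluster.

def reconcile(desired_replicas, cluster, name):
    """Bring the pods recorded for `name` in line with the desired count.

    Returns the list of actions taken so the caller can inspect them.
    """
    pods = cluster.setdefault(name, [])
    actions = []
    # Scale up: create the pods that are missing.
    while len(pods) < desired_replicas:
        pod = f"{name}-{len(pods)}"
        pods.append(pod)
        actions.append(f"create {pod}")
    # Scale down: delete surplus pods, newest first.
    while len(pods) > desired_replicas:
        actions.append(f"delete {pods.pop()}")
    return actions


cluster = {}  # actual state, initially empty
custom_resource = {"name": "demo", "spec": {"replicas": 3}}  # hypothetical CR

first = reconcile(custom_resource["spec"]["replicas"], cluster, custom_resource["name"])
second = reconcile(custom_resource["spec"]["replicas"], cluster, custom_resource["name"])
custom_resource["spec"]["replicas"] = 1  # the user edits the CR
third = reconcile(custom_resource["spec"]["replicas"], cluster, custom_resource["name"])

print(first)   # three create actions
print(second)  # nothing to do: actual state already matches
print(third)   # two delete actions
```

The second call returns an empty list because the actual state already matches the desired state. That idempotence is what allows a controller to run the same function over and over without side effects.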

You can create your own CRD and CRs without deploying them as part of an operator, but that is of little use on its own, since no program will watch your CR and create the resources that correspond to it. The k8s API server will be aware of the new resource type and will accept and store objects of that type, so you will be able to create, read, update, and delete CRs using kubectl. However, no underlying resources (pods, services, configmaps, etc.) will be created as a result.

What benefits does an operator bring?

Now, you might wonder what the specific use cases for an operator are. There is no single precise answer, since every application has its own operational requirements and its own domain knowledge. An operator’s job is to take the burden of administering an application away from end users and to provide them with well-documented, easy-to-use CRs that serve as building blocks.

For example, let’s look at the well-known Prometheus operator to understand why an operator is useful and how it streamlines deployment, configuration and application management:

  1. Simplified deployment - deploying Prometheus manually involves creating various resources like ConfigMaps, StatefulSets, and Service definitions. The Operator abstracts these details, allowing you to deploy Prometheus by simply applying CR manifests.
  2. High Availability - the operator simplifies running Prometheus in HA as it exposes the number of replicas in the Prometheus CR spec.
  3. Ease of Upgrades - upgrading Prometheus and its related components is managed centrally through the operator.
  4. Declarative configuration - all aspects of the monitoring stack are defined declaratively using custom resources. Available CRDs include Prometheus, which describes the Prometheus deployment itself; ServiceMonitor, which specifies how services should be monitored and is used to generate scrape configuration automatically; and PrometheusRule, which defines a set of alerting rules. Each CRD covers a different area of Prometheus, and the controller hides the underlying complexity.

How to deploy them?

Deploying an operator involves several steps, including creating the deployment, adding the custom resource definitions, and configuring the necessary permissions. A management layer called Operator Lifecycle Manager (OLM) was created to automate this. We can think of OLM as an operator responsible for managing other operators, so it has its own CRDs, which are described below:

  1. ClusterServiceVersion - the CSV is the primary metadata resource which describes an operator. It contains general metadata about the operator, operator installation information, CRDs owned by the operator and the CRDs that it is dependent on. Similar to how a deployment contains a pod template, a CSV contains a ‘deployment template’ for the deployment of the operator pod, making sure that the operator is recreated if the pod is deleted by mistake.
  2. CatalogSource - contains information for accessing a repository of Operators.
  3. Subscription - a subscription is created to install and update the Operators that OLM provides. A subscription is made to a channel, which is a stream of Operator versions.
  4. InstallPlan - a subscription creates an InstallPlan which describes the full list of resources that OLM will create to satisfy the CSV’s resource requirements.
  5. OperatorGroup - operator tenancy is controlled through the OperatorGroup, which sets the scope of an operator to either specific namespaces or the whole cluster.

I prefer to install OLM via the operator-sdk CLI.

After you have OLM installed, you will notice a new namespace called olm. This namespace contains all the k8s resources that make up OLM. If you change the context to the olm namespace and list the available CatalogSources, you will see output like the following:

kubectl get catalogsources.operators.coreos.com 
NAME                    DISPLAY               TYPE   PUBLISHER        AGE
operatorhubio-catalog   Community Operators   grpc   OperatorHub.io   18m

OperatorHub is the go-to repository for operators. At the time of writing, it contains 363 of them:

kubectl get packagemanifests.packages.operators.coreos.com --no-headers | wc -l
     363

Let’s install the etcd operator to explore some of the OLM’s CRDs.

The etcd operator will be scoped to the default namespace only. To achieve this, we need to apply the OperatorGroup manifest below:

apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  name: default-og
  namespace: default
spec:
  targetNamespaces:
  - default

A subscription triggers the installation of an operator. In order to create a subscription, you need to know to which channel to subscribe. This information is available by running the command below or by going to the relevant docs.

kubectl describe packagemanifests.packages.operators.coreos.com etcd

We will pick the ‘singlenamespace-alpha’ channel and apply the manifest below:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: etcd-subscription
  namespace: default
spec:
  name: etcd
  source: operatorhubio-catalog
  sourceNamespace: olm
  channel: singlenamespace-alpha

After the subscription is applied, a ClusterServiceVersion resource is created:

kubectl get csv -n default
NAME                  DISPLAY   VERSION   REPLACES              PHASE
etcdoperator.v0.9.4   etcd      0.9.4     etcdoperator.v0.9.2   Succeeded
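
The REPLACES column hints at how a channel works as a stream of versions: each CSV names the version it replaces, and following those links gives the upgrade path. Below is a small Python sketch of that idea. It is a toy model under stated assumptions, not OLM's actual resolver: the `upgrade_path` function is hypothetical, only the v0.9.4 to v0.9.2 link comes from the output above, and the older v0.9.0 entry is assumed purely for illustration.

```python
# Toy model of a channel as a chain of versions linked by "replaces".
# Only the v0.9.4 -> v0.9.2 link comes from the output above; the rest is assumed.
replaces = {
    "etcdoperator.v0.9.4": "etcdoperator.v0.9.2",
    "etcdoperator.v0.9.2": "etcdoperator.v0.9.0",  # assumed for illustration
    "etcdoperator.v0.9.0": None,                   # assumed oldest version
}


def upgrade_path(installed, head, replaces):
    """List the versions to step through to get from `installed` to `head`."""
    path = []
    current = head
    # Walk backwards from the channel head until we reach the installed version.
    while current is not None and current != installed:
        path.append(current)
        current = replaces.get(current)
    if current != installed:
        raise ValueError(f"{installed} is not an ancestor of {head}")
    # The walk was newest-first, so reverse it to get the upgrade order.
    return list(reversed(path))


path = upgrade_path("etcdoperator.v0.9.0", "etcdoperator.v0.9.4", replaces)
print(path)
```

In this model, a cluster running the assumed v0.9.0 would pass through v0.9.2 before reaching v0.9.4, and a cluster already on the channel head gets an empty path.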

In turn, the CSV creates the etcd-operator deployment, as seen below:

kubectl get deployment -n default
NAME            READY   UP-TO-DATE   AVAILABLE   AGE
etcd-operator   1/1     1            1           5m12s

The deployment’s ownerReferences show that it is owned by the CSV:

kubectl get deployment/etcd-operator -n default -o yaml
...
ownerReferences:
- apiVersion: operators.coreos.com/v1alpha1
  blockOwnerDeletion: false
  controller: false
  kind: ClusterServiceVersion
  name: etcdoperator.v0.9.4
...

The pods for the operator itself:

kubectl get pods -n default
NAME                            READY   STATUS    RESTARTS   AGE
etcd-operator-b3or989rf-it24b   3/3     Running   0          7m14s

To remove the operator deployment, delete the CSV that owns it:

kubectl delete csv/etcdoperator.v0.9.4 -n default
clusterserviceversion.operators.coreos.com "etcdoperator.v0.9.4" deleted
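
Deleting the CSV removes the deployment because Kubernetes garbage-collects objects whose owner no longer exists. The Python sketch below is a toy model of that cascade, not the Kubernetes implementation: the `delete_with_dependents` function and the `owners` dictionary are hypothetical, and the ReplicaSet name is an assumption derived from the pod name shown earlier.

```python
# Toy model of owner-based garbage collection; not the Kubernetes implementation.
# Each object maps to the name of its owner (None means it has no owner).
owners = {
    "csv/etcdoperator.v0.9.4": None,
    "deployment/etcd-operator": "csv/etcdoperator.v0.9.4",
    "replicaset/etcd-operator-b3or989rf": "deployment/etcd-operator",  # name assumed
    "pod/etcd-operator-b3or989rf-it24b": "replicaset/etcd-operator-b3or989rf",
    "subscription/etcd-subscription": None,
}


def delete_with_dependents(objects, target):
    """Remove `target` and everything that transitively names it as owner."""
    removed = [target]
    del objects[target]
    # Keep sweeping until no remaining object points at a removed owner.
    changed = True
    while changed:
        changed = False
        for name, owner in list(objects.items()):
            if owner in removed:
                removed.append(name)
                del objects[name]
                changed = True
    return removed


removed = delete_with_dependents(owners, "csv/etcdoperator.v0.9.4")
print(removed)
print(sorted(owners))
```

In this model the subscription is not owned by the CSV, so it survives the cascade and has to be removed separately.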

You will also need to delete the subscription, so that OLM no longer tracks the operator for installation and updates:

kubectl delete subscription/etcd-subscription -n default
subscription.operators.coreos.com "etcd-subscription" deleted

Hopefully this blog post clarified what an operator is, what its purpose is, and how to deploy one. A future post will focus on writing an operator using the operator-sdk framework. Happy learning!