Introduction

GitHub Actions is a powerful and flexible CI/CD platform that integrates seamlessly with GitHub repositories. The main benefits of GitHub Actions are listed below:

  • Native integration: GitHub Actions is built directly into GitHub, providing a seamless experience for setting up and managing workflows without needing external CI/CD tools.
  • Event-driven workflows: Workflows can be triggered by various GitHub events such as pushes, pull requests, releases, and more, making it easy to automate processes around your development workflow.
  • Large community of custom actions: The GitHub Actions Marketplace offers a wide range of pre-built actions created by the community, which you can integrate into your workflows to extend functionality without starting from scratch.
  • Matrices: You can use matrices to run your tests and deployments across multiple environments, configurations, or versions, ensuring comprehensive testing coverage (see the sketch after this list).
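
To illustrate the matrix feature, here is a minimal sketch of a workflow that runs the same test job across several Node.js versions. The project layout and the npm scripts are assumptions for the example, not something from this post:

name: Matrix tests

on: [push]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # The job below runs once per entry in this list
        node-version: [18, 20, 22]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      # Assumes an npm project with a test script defined
      - run: npm ci && npm test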

GitHub Actions offers hosted runners that are easy to use and configure. However, there are several cases where they fall short: you may need private connectivity to internal workloads, have strict data residency requirements, or need an OS that GitHub-hosted runners do not offer. In such scenarios, self-hosted runners are an excellent alternative. Another advantage of self-hosted runners is that there are no licensing-related costs; you only pay for the infrastructure you host your runners on.

Deploying the runners

You can host your runners on either VMs or containers. There is an open-source project maintained by Philips which deploys runners on VMs and scales them to zero through a serverless control plane. However, this requires deploying several ancillary services on AWS as part of a Terraform module.

In my opinion, a simpler approach is to deploy the runners as containers on a Kubernetes cluster and manage them via the Actions Runner Controller (ARC) developed by GitHub. The ARC controller and listener pods expose metrics which can be scraped by Prometheus, so when we deploy the Helm charts we will pass metrics-specific values as well.

Let’s run through this process using the steps below:

  1. You need a running Kubernetes cluster. You can use kind, minikube or whatever tool you prefer for quickly deploying a local cluster; I am using my CLI tool for bootstrapping a local cluster on Multipass VMs. You also need to install Helm, which will be used later to deploy the ARC controller and the runner scale set.
  2. You can set up runners at a repository or organization level. I am setting up these runners at the repository level to run a workflow which updates this website with new posts. For repository authentication you need to create a classic personal access token (PAT) with all the repo permissions enabled.
  3. Install the ARC controller using the command and the Helm values below:
helm install arc \
    --namespace actions-runner-system \
    --create-namespace \
    -f values-controller.yaml \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller

Contents of values-controller.yaml:

metrics:
  controllerManagerAddr: ":8080"
  listenerAddr: ":8080"
  listenerEndpoint: "/metrics"

podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/path: "/metrics"
  prometheus.io/port: "8080"
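
Before moving on, it is worth checking that the controller came up and that the metrics port configured above actually responds. A quick check along these lines should do; the label selector is the same one used later for the controller PodMonitor:

# The controller pod should be Running in the namespace created above
kubectl get pods -n actions-runner-system

# Optional: confirm the /metrics endpoint answers on port 8080
CONTROLLER_POD=$(kubectl get pods -n actions-runner-system \
  -l app.kubernetes.io/part-of=gha-rs-controller -o jsonpath='{.items[0].metadata.name}')
kubectl -n actions-runner-system port-forward "${CONTROLLER_POD}" 8080:8080 &
sleep 2
curl -s localhost:8080/metrics | head
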
  4. Before you install the ARC runner scale set, you need to create the secret which stores the PAT created in step 2. The secret name will be the Helm value for githubConfigSecret. You can use the script below:
GITHUB_PAT="YOUR_PAT_TOKEN"
SCALE_SET_NAMESPACE="actions-runner-set"
kubectl create namespace "${SCALE_SET_NAMESPACE}"
kubectl create secret generic arc-scale-set-secret \
   --namespace "${SCALE_SET_NAMESPACE}" \
   --from-literal=github_token="${GITHUB_PAT}"
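
Before installing the scale set, you can quickly confirm the secret landed in the right namespace; describe prints the key names and sizes without revealing the token value:

# The secret should expose a single github_token key
kubectl describe secret arc-scale-set-secret --namespace "${SCALE_SET_NAMESPACE}"
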
  5. Now you can install the ARC runner scale set using the command and the Helm values below:
helm install arc-runner-set \
    --namespace "${SCALE_SET_NAMESPACE}" \
    --create-namespace \
    -f values-scale-set.yaml \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set

Contents of values-scale-set.yaml:

githubConfigSecret: arc-scale-set-secret
githubConfigUrl: YOUR_REPO_LINK
minRunners: 1
template:
  spec:
    containers:
    - name: runner
      image: docker472874829/github-runners:v1
      command: ["/home/runner/run.sh"]
listenerTemplate:
  metadata:
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/path: "/metrics"
      prometheus.io/port: "8080"
  spec:
    containers:
    - name: listener

The full list of Helm values for the scale set can be found here. One thing you need to be careful about is the listenerTemplate.spec.containers[0] value. Even if you do not need to add any customizations, you must at least declare the name of the listener container. If you do not, the listener and, consequently, the runners will not be created.
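
If you want to confirm everything came up before looking at the GitHub UI, you can list the pods in both namespaces; the listener is created next to the controller, while the runner pods live in the scale set namespace (pod names are generated, so yours will differ):

# Listener pod, running alongside the controller
kubectl get pods -n actions-runner-system

# Runner pods; with minRunners set to 1 there should be one idle runner
kubectl get pods -n actions-runner-set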

Because I set the minimum number of idle runners to 1, I can see the pod running and waiting to pick up a job:

[Screenshot: idle runner pod waiting to pick up a job]

The runner and the scale set are also displayed in the GitHub repo:

[Screenshot: runner and scale set listed in the GitHub repository settings]

Testing the runners

As shown in the screenshot above, the name of the runner scale set coincides with the name of the Helm release from step 5. In order to run a job on the self-hosted runners, you need to reference the scale set name in the runs-on field of the job declaration.

You can find below a sample workflow file used for deploying the infrastructure hosting this website:

name: Infrastructure deployment workflow

on:
    workflow_dispatch:

permissions:
    id-token: write
    contents: read

env:
    AWS_REGION: "eu-west-1"
    TF_VERSION: "1.8.1"

jobs:
    infra-deployment:
        name: Infra deployment
        runs-on: arc-runner-set
        defaults:
          run:
            working-directory: ./infra

        steps:
        
        - name: Checkout repo
          uses: actions/checkout@v4

        - name: Configure AWS credentials
          uses: aws-actions/configure-aws-credentials@v3
          with:
            role-to-assume: arn:aws:iam::471112989739:role/website-deployment-role
            aws-region: ${{ env.AWS_REGION }}

        - name: Install Node.js (needed because we run this on a self-hosted runner)
          uses: actions/setup-node@v4
          with:
            node-version: latest
            
        - name: Configure Terraform
          uses: hashicorp/setup-terraform@v3
          with:
            terraform_version: ${{ env.TF_VERSION }}

        - name: Terraform init
          run: terraform init

        - name: Terraform plan
          run: terraform plan

        - name: Terraform apply
          run: terraform apply -auto-approve
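
Since the workflow only defines a workflow_dispatch trigger, you can start it from the Actions tab or, if you use the GitHub CLI, with something along these lines (the workflow file name below is an assumption; use the name of your own workflow file):

# Start the manual workflow on the default branch
gh workflow run infra-deployment.yaml --ref main

# Follow the run that was just queued
gh run watch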
            

In the screenshot below you can see a successful run:

[Screenshot: successful workflow run]

Monitoring the runners

Metrics for the controller and listener pods were enabled earlier. In order to scrape and visualize them, we can deploy and configure kube-prometheus-stack, which installs Prometheus and Grafana for us.

You can use the script below to install kube-prometheus-stack, expose Prometheus and Grafana as NodePort services and add a custom podMonitorSelector.

#!/usr/bin/env bash

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install kube-prometheus-stack \
    --namespace monitoring \
    --create-namespace \
    -f values-prometheus.yaml \
    prometheus-community/kube-prometheus-stack

sleep 5
kubectl expose service --namespace monitoring kube-prometheus-stack-prometheus --type=NodePort --target-port=9090 --name=prometheus-node-port-service
kubectl expose service --namespace monitoring kube-prometheus-stack-grafana --type=NodePort --target-port=3000 --name=grafana-node-port-service
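
To actually log in to Grafana through the NodePort service, you will need the assigned node port and the admin password generated by the chart (the secret name below follows from the release name used above and the chart defaults):

# Show the node ports assigned to the two exposed services
kubectl get svc -n monitoring prometheus-node-port-service grafana-node-port-service

# Retrieve the auto-generated Grafana admin password (the user is admin by default)
kubectl get secret -n monitoring kube-prometheus-stack-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo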

The contents of values-prometheus.yaml:

prometheus:
  prometheusSpec:
    podMonitorSelector:
      matchLabels:
        prometheus: "true"

Now that the Prometheus-specific CRDs are deployed, we can create the PodMonitors which will register the controller and listener pods as Prometheus targets. Apply the manifests below:

---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: gha-rs-controller
  namespace: actions-runner-system
  labels:
    prometheus: "true"
spec:
  podMetricsEndpoints:
  - interval: 30s
    targetPort: 8080
    path: /metrics
  namespaceSelector:
    matchNames:
    - actions-runner-system
  selector:
    matchLabels:
      app.kubernetes.io/part-of: gha-rs-controller
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: gha-runner-scale-set
  namespace: actions-runner-system
  labels:
    prometheus: "true"
spec:
  podMetricsEndpoints:
  - interval: 30s
    targetPort: 8080
    path: /metrics
  namespaceSelector:
    matchNames:
    - actions-runner-system
  selector:
    matchLabels:
      app.kubernetes.io/part-of: gha-runner-scale-set
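
A quick way to verify that the PodMonitors were created and carry the label the podMonitorSelector expects:

# Both PodMonitors should show the prometheus=true label
kubectl get podmonitors -n actions-runner-system --show-labels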

After 1-2 minutes, you should see the new targets in the Prometheus Targets UI:

[Screenshot: Prometheus targets for the ARC controller and listener]

A sample Grafana dashboard showing the number of runners and completed jobs can be seen below:

[Screenshot: Grafana dashboard showing runner counts and completed jobs]
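
If you want to see which series are available for building such a dashboard, you can port-forward to the listener pod and inspect its /metrics output directly; the label selector below is the same one used by the PodMonitor, and the exact metric names may vary between ARC versions:

# Grab the name of the scale set listener pod
LISTENER_POD=$(kubectl get pods -n actions-runner-system \
  -l app.kubernetes.io/part-of=gha-runner-scale-set -o jsonpath='{.items[0].metadata.name}')

# Port-forward and list the exposed metric names with their descriptions
kubectl -n actions-runner-system port-forward "${LISTENER_POD}" 8080:8080 &
sleep 2
curl -s localhost:8080/metrics | grep '^# HELP'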