Step-by-step Grafana Installation on GKE with Terraform and Helm

Get Grafana up and running on Google Kubernetes Engine using Terraform and Helm in just a few simple steps!

Faceless Nomad
9 min read · Jul 19, 2024

In this guide, we will provision a Kubernetes cluster on Google Kubernetes Engine (GKE) using Terraform, then use Helm to install a sample application, Google's "Online Boutique", along with Grafana and Prometheus for monitoring.

You can check the project repo here.

Prerequisites:

  • Terraform installed on your machine
  • Helm installed on your machine
  • An active Google Cloud Platform (GCP) account with GKE enabled

Step 1: Provision the Kubernetes Cluster on GKE

We will be using a Terraform module for this part, so we don't have to build the cluster configuration from scratch and can focus on the monitoring part of the article.

Terraform modules are a way of reusing other people's Terraform configurations: we only have to supply the variables a module expects.
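As a minimal illustration (the module source, version, and inputs below are placeholders picked for the example, not the ones we use later):

# Hypothetical module call: point "source" at a registry module, pin a version,
# and pass only the inputs that module asks for.
module "example" {
  source  = "terraform-google-modules/network/google" # assumption: any registry module works the same way
  version = "~> 9.0"                                  # assumption: pin whichever version you have vetted

  project_id   = "my-gcp-project"     # assumption: your GCP project ID
  network_name = "my-example-network" # assumption: any network name
}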

We will begin creating the files we will need:

mkdir deployment && cd deployment
touch main.tf providers.tf variables.tf version.tf terraform.tfvars

providers.tf
Following the module's documentation, we are not going to declare an explicit GCP provider block this time; we only configure the kubernetes and helm providers so Terraform can talk to the new cluster.

provider "kubernetes" {
host = "https://${module.gke.endpoint}"
token = data.google_client_config.default.access_token
cluster_ca_certificate = base64decode(module.gke.ca_certificate)
}

provider "helm" {
kubernetes {
config_path = "~/.kube/config"
}
}
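If you would rather not depend on your local kubeconfig at all, the helm provider (2.x, which we pin below) can be pointed at the same cluster credentials the kubernetes provider uses. A sketch reusing the module outputs from above:

provider "helm" {
  kubernetes {
    # Same endpoint, token and CA as the kubernetes provider, so no ~/.kube/config is needed.
    host                   = "https://${module.gke.endpoint}"
    token                  = data.google_client_config.default.access_token
    cluster_ca_certificate = base64decode(module.gke.ca_certificate)
  }
}

Either approach works here; the kubeconfig variant simply relies on the gcloud get-credentials step we run later.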

version.tf

terraform {
  required_providers {
    helm = {
      source  = "hashicorp/helm"
      version = "2.14.0"
    }
  }
}
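If you want to be strict about versions, you could also pin the other providers this configuration uses. A sketch, where the extra version constraints are assumptions you should adjust to your setup:

terraform {
  required_providers {
    helm = {
      source  = "hashicorp/helm"
      version = "2.14.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.0" # assumption: any recent 2.x release
    }
    google = {
      source  = "hashicorp/google"
      version = ">= 5.0" # assumption: match whatever the GKE module requires
    }
  }
}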

variables.tf
You can obviously add more variables; for the purposes of this tutorial, we will keep it really simple.

variable "project_id" {
type = string
description = "GCP Project ID"
}

variable "cluster_name" {
type = string
description = "Cluster name"
}
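The matching terraform.tfvars could then look like this (both values are placeholders for your own project):

# terraform.tfvars
project_id   = "my-gcp-project"    # assumption: replace with your GCP project ID
cluster_name = "medium-monitoring" # assumption: any valid cluster name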

main.tf

data "google_client_config" "default" {}

resource "google_service_account" "service_account" {
account_id = "service-account-id"
display_name = "Monitoring Medium Service Accout"
project = var.project_id
}

resource "google_project_iam_member" "member-role" {
depends_on = [google_service_account.service_account]
for_each = toset([
"roles/iam.serviceAccountUser",
"roles/iam.serviceAccountAdmin",
"roles/container.developer",
"roles/container.clusterAdmin",
"roles/compute.viewer"
])
role = each.key
member = "serviceAccount:${google_service_account.service_account.email}"
project = var.project_id
}

resource "google_compute_network" "medium-monitoring-network" {
name = "medium-monitoring-network"
auto_create_subnetworks = false
project = var.project_id
}

resource "google_compute_subnetwork" "medium-monitoring-subnetwork" {
name = "medium-monitoring-subnetwork"
project = var.project_id
ip_cidr_range = "10.2.0.0/16"
region = "us-central1"
network = google_compute_network.medium-monitoring-network.id

secondary_ip_range {
range_name = "us-central1-01-gke-01-pods"
ip_cidr_range = "10.3.0.0/16"
}

secondary_ip_range {
range_name = "us-central1-01-gke-01-services"
ip_cidr_range = "10.4.0.0/20"
}
}

# Docs:
# https://registry.terraform.io/modules/terraform-google-modules/kubernetes-engine/google/latest/submodules/private-cluster
module "gke" {
depends_on = [google_project_iam_member.member-role]
deletion_protection = false
source = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
project_id = var.project_id
name = var.cluster_name
region = "us-central1"
zones = ["us-central1-a"]
network = google_compute_network.medium-monitoring-network.name
subnetwork = google_compute_subnetwork.medium-monitoring-subnetwork.name
ip_range_pods = "us-central1-01-gke-01-pods"
ip_range_services = "us-central1-01-gke-01-services"
http_load_balancing = false
network_policy = false
horizontal_pod_autoscaling = false
filestore_csi_driver = false
enable_private_endpoint = false
enable_private_nodes = false
master_ipv4_cidr_block = "10.0.0.0/28"
dns_cache = false

node_pools = [
{
name = "default-node-pool"
machine_type = "e2-small"
node_locations = "us-central1-b,us-central1-c"
min_count = 3
max_count = 10
local_ssd_count = 0
spot = false
disk_size_gb = 30
disk_type = "pd-standard"
image_type = "COS_CONTAINERD"
enable_gcfs = false
enable_gvnic = false
logging_variant = "DEFAULT"
auto_repair = false
auto_upgrade = true
service_account = google_service_account.service_account.email
preemptible = true
initial_node_count = 1
accelerator_count = 0
},
]

node_pools_oauth_scopes = {
all = [
# "https://www.googleapis.com/auth/logging.write",
# "https://www.googleapis.com/auth/monitoring",
"https://www.googleapis.com/auth/cloud-platform",
"https://www.googleapis.com/auth/userinfo.email"
]
}

node_pools_labels = {
all = {}

default-node-pool = {
default-node-pool = true
}
}

node_pools_metadata = {
all = {}

default-node-pool = {
node-pool-metadata-custom-value = "my-node-pool"
}
}

node_pools_taints = {
all = []

default-node-pool = [
{
key = "default-node-pool"
value = true
effect = "PREFER_NO_SCHEDULE"
},
]
}

node_pools_tags = {
all = []

default-node-pool = [
"default-node-pool",
]
}
}

What we basically did here is create the network and subnetwork the cluster needs, plus a service account with the proper permissions, so that we can later install applications into the cluster using Terraform.

If you read the documentation of the module (which I think is a must every time), you will see we are using the basic example with some minor changes, such as names and machine types. We don't need any accelerators, since we are not going to deploy any AI apps or machine learning pipelines.

Once you've written your variables in the file terraform.tfvars, you can run

terraform init
terraform plan
terraform apply -auto-approve

This will deploy your cluster.

Notice that we are going to save the tfstate locally.
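If you would rather not keep the state on your machine, a minimal sketch of a GCS backend (the bucket name and prefix are assumptions, and the bucket must exist before you run terraform init):

terraform {
  backend "gcs" {
    bucket = "my-terraform-state-bucket" # assumption: a bucket you created beforehand
    prefix = "gke-monitoring"            # assumption: any prefix to separate this state
  }
}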

Run gcloud container clusters get-credentials so your local machine knows which cluster you are using:

> gcloud container clusters get-credentials <your-cluster-name> --region <your-region>

Fetching cluster endpoint and auth data.
kubeconfig entry generated for <your-cluster-name>.

Step 2: Install the application using Helm

In this step we will be installing the app with Helm, which is quite simple.

Notice that in this step we will not be using Terraform, as I want to keep the tutorial as simple as possible.

The app we are going to install is a demo app created by Google named "Online Boutique" (it lives in the microservices-demo repository, where you can also check its documentation).

We are only going to need the files inside the directory helm-chart so go ahead and copy them.

Once that is done, you can run

helm install <release-name> <chart-directory>

Something like helm install onlineboutique ./microservices-demo/helm-chart should work.
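If you did want to manage this install from Terraform as well, a hypothetical helm_release pointing at the same local chart directory would look roughly like this:

# Sketch only: install the Online Boutique chart from the directory copied above.
resource "helm_release" "onlineboutique" {
  name  = "onlineboutique"
  chart = "./microservices-demo/helm-chart" # local chart path, relative to the Terraform working directory
}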

Once that is done, you should be able to see the pods being created in your cluster:

> kubectl get po

NAME                                     READY   STATUS    RESTARTS   AGE
adservice-77c7c455b7-chpn5               0/1     Pending   0          15s
cartservice-6dc9c7b4f8-ldh5c             0/1     Pending   0          15s
checkoutservice-5f9954bf9f-bwt5l         1/1     Running   0          15s
currencyservice-84cc8dbfcc-p4qc7         0/1     Pending   0          14s
emailservice-5cc954c8cc-cgqz2            0/1     Running   0          14s
frontend-747bdf8f6b-nv5dr                0/1     Pending   0          14s
loadgenerator-6497c9f95c-kkklq           0/1     Pending   0          15s
paymentservice-55646bb857-h22j5          0/1     Pending   0          14s
productcatalogservice-5dbf689769-7vbhl   1/1     Running   0          15s
recommendationservice-5846958db7-q5clh   0/1     Pending   0          14s
redis-cart-7d6cd8794-2bk85               1/1     Running   0          14s
shippingservice-67989cd745-n8dl6         1/1     Running   0          15s

And, if you run kubectl get svc frontend-external you can see the public IP

> kubectl get svc frontend-external

NAME                TYPE           CLUSTER-IP   EXTERNAL-IP    PORT(S)        AGE
frontend-external   LoadBalancer   10.4.13.76   35.224.166.8   80:31845/TCP   65s

Now we can access the app.

Step 3: Installing Grafana and Prometheus using Helm and Terraform

We are going to install Grafana and Prometheus with Helm and Terraform. For that, we will need to create a new file that we can call monitoring-apps.tf inside our deployment folder.

This file is quite straightforward:

resource "kubernetes_namespace" "monitoring" {
depends_on = [module.gke]
metadata {
name = "monitoring"
}
}

resource "helm_release" "grafana" {
name = "grafana"
repository = "https://grafana.github.io/helm-charts"
chart = "grafana"
namespace = kubernetes_namespace.monitoring.metadata[0].name

set {
name = "adminUser"
value = "admin"
}

set {
name = "adminPassword"
value = "admin"
}
}

resource "helm_release" "kube-prometheus" {
name = "kube-prometheus-stackr"
namespace = kubernetes_namespace.monitoring.metadata[0].name
repository = "https://prometheus-community.github.io/helm-charts"
version = "25.24.1"
chart = "prometheus"
}

As you can see, the only thing that we do apart from installing Grafana and Prometheus is creating a new namespace in which these apps are going to run.

Note that Grafana and Prometheus each need a PersistentVolume and a PersistentVolumeClaim to keep their data alive even if the pods crash. For the purposes of this tutorial, this is not configured for Grafana, but it comes out of the box with Prometheus (although that volume will also be deleted when you tear everything down with Terraform).
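If you do want Grafana to keep its data, the chart exposes persistence settings you can toggle from the same helm_release. A sketch, assuming the upstream grafana chart's persistence.enabled and persistence.size values:

resource "helm_release" "grafana" {
  name       = "grafana"
  repository = "https://grafana.github.io/helm-charts"
  chart      = "grafana"
  namespace  = kubernetes_namespace.monitoring.metadata[0].name

  # Ask the chart to create a PVC so dashboards and settings survive pod restarts.
  set {
    name  = "persistence.enabled"
    value = "true"
  }

  set {
    name  = "persistence.size"
    value = "10Gi" # assumption: pick a size that fits your needs
  }
}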

After this, we can run terraform apply -auto-approve and it will deploy our apps.

> terraform apply -auto-approve

...
Plan: 2 to add, 0 to change, 0 to destroy.
helm_release.kube-prometheus: Creating...
helm_release.grafana: Creating...

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

Now, if you check the pods running in the namespace monitoring you’ll see this:

> kubectl get po -n monitoring
NAME                                                             READY   STATUS    RESTARTS   AGE
grafana-7746bdcf4b-gjjrx                                         1/1     Running   0          118s
kube-prometheus-stackr-alertmanager-0                            1/1     Running   0          107s
kube-prometheus-stackr-kube-state-metrics-6d585bf4f6-zfmx4       1/1     Running   0          107s
kube-prometheus-stackr-prometheus-node-exporter-954c5            1/1     Running   0          108s
kube-prometheus-stackr-prometheus-node-exporter-d4gpv            1/1     Running   0          108s
kube-prometheus-stackr-prometheus-node-exporter-vt2bm            1/1     Running   0          108s
kube-prometheus-stackr-prometheus-node-exporter-vzr5w            1/1     Running   0          108s
kube-prometheus-stackr-prometheus-node-exporter-w8sgt            1/1     Running   0          108s
kube-prometheus-stackr-prometheus-pushgateway-5d56dbbc4d-cb58q   1/1     Running   0          107s
kube-prometheus-stackr-server-6b97c9984c-jdp6s                   2/2     Running   0          108s

Everything is running smoothly.

But we can't access our Grafana yet. In order to do so, and for the purposes of this tutorial only, we will run a port-forward to the Grafana service, which lets us reach the app from localhost through the tunnel that the kubectl command below creates between our machine and the cluster.

It is important to note that for a production environment, you should be using some kind of Ingress for accessing your app.
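As a rough sketch of that (not something we set up in this tutorial), the grafana chart ships ingress values you could enable on the same helm_release; the hostname is an assumption, and you would still need an ingress controller (or GKE's built-in ingress) plus DNS pointing at it:

resource "helm_release" "grafana" {
  name       = "grafana"
  repository = "https://grafana.github.io/helm-charts"
  chart      = "grafana"
  namespace  = kubernetes_namespace.monitoring.metadata[0].name

  # Expose Grafana through the chart's ingress template instead of a port-forward.
  set {
    name  = "ingress.enabled"
    value = "true"
  }

  set {
    name  = "ingress.hosts[0]"
    value = "grafana.example.com" # assumption: your own hostname
  }
}

For this tutorial, though, the port-forward below is enough: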

> kubectl port-forward svc/grafana 5000:80 -n monitoring

Forwarding from 127.0.0.1:5000 -> 3000
Forwarding from [::1]:5000 -> 3000

You can use whatever port you want. Now, if you open http://localhost:5000, you can log in to Grafana.

Use admin:admin as the username and password, then set a new password, and voilà.

You are in.

But now, we need to connect Prometheus and Grafana. In order to do so, we go to Connections > Add new connection.

We search for Prometheus as the new data source, and on its configuration page we look for the Connection section, where we need to provide the Prometheus server URL.

This URL is resolved by the cluster's internal DNS service (kube-dns), and it follows this format:

<service-name>.<namespace-name>.svc.cluster.local

In our case it should be something like this:
http://kube-prometheus-stackr-server.monitoring.svc.cluster.local

Once you've done that, click Save & test at the bottom and you should see a message confirming the data source is working.

Step 4: Importing a Dashboard

You can build your own dashboard if you want, but this tutorial is focused on installing Grafana with Prometheus so you can have a functioning monitoring setup.

Go to Dashboards > New > Import and you will be taken to the import page.

In our case, we will copy a prebuilt dashboard ID from the Grafana dashboards catalog and install a dashboard for monitoring the cluster.

Copy and paste the ID, then click Load. You'll be taken to another page where you are asked to select a data source; select prometheus.

Now, under Dashboards, you will see the newly created dashboard, called Kubernetes Overview, and if you did everything correctly, you can already see data coming in.

And that’s a wrap!

In this tutorial, we’ve navigated through the process of setting up Grafana and Prometheus on Google Kubernetes Engine (GKE) using Terraform and Helm. Here’s a quick recap of the key steps:

Provisioning the Kubernetes Cluster: We started by using Terraform to create a Kubernetes cluster on GKE. This included configuring the necessary network and subnetwork, setting up a service account, and deploying the cluster with a custom module.

Installing the Sample Application: Next, we used Helm to deploy a sample application, “Online Boutique,” onto our cluster. This step showcased how to install applications with Helm and verify their deployment using kubectl.

Deploying Grafana and Prometheus: We then used Terraform to create a namespace and install Grafana and Prometheus using Helm charts. This step illustrated how to integrate monitoring tools into your Kubernetes cluster.

Accessing Grafana and Connecting to Prometheus: We accessed Grafana locally using port-forwarding and connected it to Prometheus by configuring a data source. We also imported a prebuilt dashboard to visualize metrics from our cluster.

Remember you can check the project repo here.

By following these steps, you’ve learned how to provision a Kubernetes cluster on GKE, deploy applications, and set up a comprehensive monitoring solution with Grafana and Prometheus.

Feel free to experiment with different configurations and extend this setup to suit your needs. If you have any questions, suggestions, or feedback, drop a comment below. And if you found this guide helpful, don’t forget to share, follow, or give it a thumbs-up!
