Extending OpenShift Monitoring: Exporting Metrics and Building Custom Dashboards
OpenShift ships with a production-grade monitoring stack — Prometheus, Thanos, and a built-in Observe console — ready to go from day one. This post covers the why and the how for two ways to extend it: pulling metrics into a CSV, and deploying a custom Grafana instance you can actually edit.
Why This Matters
The built-in dashboards are managed by the Cluster Monitoring Operator (CMO), which keeps them stable and consistent across upgrades. That’s the right behavior for platform infrastructure — but it means they’re not yours to modify, and the data inside them isn’t easy to export.
That gap matters more than people realize. Capacity planning, chargeback reporting, compliance exports, custom application dashboards — these are everyday asks that fall outside what the platform monitoring stack is designed to handle. Most teams hit this and assume it’s a dead end.
It isn’t. OpenShift exposes the full Thanos Querier API to any authorized client. You can query it directly with a Python script and get a CSV, or deploy your own Grafana instance alongside the platform one and build whatever dashboards your team needs — all without touching a single platform component.
The Steps
Approach 1 — Query Thanos directly and export a CSV
- Confirm you’re logged into the cluster with `oc`
- Run `query_thanos.py` with your PromQL query, time range, and output file
- Open the CSV
Approach 2 — Deploy a custom Grafana instance
- Create a namespace and service account
- Create a long-lived token secret for that service account
- Install the Grafana Operator from OperatorHub
- Deploy a Grafana instance
- Create a datasource pointing at the Thanos Querier
- Retrieve your Grafana credentials and log in
- Build dashboards and export data from the UI
How To Do It
Approach 1: Query Thanos Directly → CSV
A Python script authenticates against the Thanos Querier route using your existing oc session, runs a PromQL range query, and writes the results to a CSV file. No UI, no intermediate steps.
Prerequisites:
- `oc` CLI installed and logged in (`oc whoami` should return your username)
- Python 3.10+
- The `requests` library (`pip install requests`)
Script: query_thanos.py
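The script isn’t reproduced in full here; below is a condensed sketch of its logic, assuming the defaults described in this post (the `thanos-querier` route in `openshift-monitoring`, an active `oc` session, and the flags listed under Options). The real script auto-calculates `--step` and auto-detects `--label`; this sketch just uses fixed defaults, and helper names are illustrative.

```python
#!/usr/bin/env python3
"""Condensed sketch of query_thanos.py (illustrative, not the full script).

Flow: resolve the thanos-querier route, reuse the active `oc` session token,
run a PromQL range query, and pivot the JSON matrix into a CSV.
"""
import argparse
import csv
import subprocess
import time


def oc(*args):
    # Shell out to the oc CLI and return trimmed stdout
    return subprocess.check_output(["oc", *args], text=True).strip()


def fetch(query, days, step):
    # One authenticated GET against the Thanos range-query API
    import requests  # lazy import; `pip install requests` (see prerequisites)
    host = oc("get", "route", "thanos-querier", "-n", "openshift-monitoring",
              "-o", "jsonpath={.spec.host}")
    token = oc("whoami", "-t")
    end = int(time.time())
    resp = requests.get(
        f"https://{host}/api/v1/query_range",
        headers={"Authorization": f"Bearer {token}"},
        params={"query": query, "start": end - days * 86400,
                "end": end, "step": step},
        verify=False,  # the route may present the cluster's internal CA
    )
    resp.raise_for_status()
    return resp.json()


def pivot(result, label):
    # Pivot the Prometheus matrix: one column per value of `label`,
    # one row per timestamp, matching the example CSV output shown later.
    series = result["data"]["result"]
    columns = [s["metric"].get(label, "value") for s in series]
    rows = {}  # timestamp -> {column: sample value}
    for s, col in zip(series, columns):
        for ts, val in s["values"]:
            rows.setdefault(ts, {})[col] = val
    table = [["DateTime"] + columns]
    for ts in sorted(rows):
        stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(ts))
        table.append([stamp] + [rows[ts].get(c, "") for c in columns])
    return table


def main():
    p = argparse.ArgumentParser()
    p.add_argument("--query", "-q", required=True)
    p.add_argument("--days", "-d", type=int, required=True)
    p.add_argument("--step", "-s", default="1h")
    p.add_argument("--label", "-l", default="namespace")
    p.add_argument("--output", "-o", default="output.csv")
    args = p.parse_args()
    table = pivot(fetch(args.query, args.days, args.step), args.label)
    with open(args.output, "w", newline="") as f:
        csv.writer(f).writerows(table)


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:  # act as a CLI only when flags are supplied
        main()
```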
Run it:
python query_thanos.py \
--query 'sum by (namespace) (container_memory_working_set_bytes{container!=""})' \
--days 7 \
--output memory_by_namespace.csv
What it does under the hood:
At its core, the script is making one authenticated HTTP request to the Thanos query API. You can see the same thing manually with two oc commands and a curl:
# Get the external Thanos route URL
export THANOS_URL=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
# Get your bearer token from the active oc session
export TOKEN=$(oc whoami -t)
# Set a time range (Unix epoch)
export END_TIME=$(date +%s)
export START_TIME=$(date -d "7 days ago" +%s)
# Hit the Thanos range query API directly
curl -k -H "Authorization: Bearer $TOKEN" \
"https://$THANOS_URL/api/v1/query_range" \
--data-urlencode 'query=sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate) by (pod)' \
--data-urlencode "start=$START_TIME" \
--data-urlencode "end=$END_TIME" \
--data-urlencode 'step=3600s' > my_metrics.json
The Python script does exactly this — gets the route, grabs the token, calls the same endpoint — then goes one step further: it parses the JSON response and pivots it into a readable CSV instead of leaving it as raw JSON.
Example output:
DateTime,default,kube-system,openshift-monitoring
2025-03-24T00:00:00.000Z,1073741824,536870912,2147483648
2025-03-24T00:15:00.000Z,1073741824,536870912,2147483648
Options:
| Flag | Description |
|---|---|
| `--query` / `-q` | PromQL query (required) |
| `--days` / `-d` | How many days of history to fetch (required) |
| `--step` / `-s` | Interval between data points, e.g. `15m`, `1h` (optional, auto-calculated) |
| `--label` / `-l` | Label to use as column headers (optional, auto-detected) |
| `--output` / `-o` | Output CSV file path (default: `output.csv`) |
Access note: Your `oc` user needs the `cluster-monitoring-view` cluster role to query Thanos across all namespaces. If you can already see metrics in the OpenShift web console, you likely already have it.
Approach 2: Deploy Your Own Grafana Instance
A full Grafana UI — editable dashboards, panel-level CSV export, ad-hoc PromQL — running in its own namespace and pointed at the same Thanos backend as the platform stack.
Step 1: Create a Namespace and Service Account
oc create namespace my-custom-metrics
oc project my-custom-metrics
oc create sa custom-grafana-sa
oc adm policy add-cluster-role-to-user cluster-monitoring-view -z custom-grafana-sa
The cluster-monitoring-view role allows the service account to read metrics across all namespaces via Thanos.
Step 2: Create a Long-Lived Token Secret
oc apply -f 2-SA-secret.yaml
apiVersion: v1
kind: Secret
metadata:
name: custom-grafana-token
annotations:
kubernetes.io/service-account.name: custom-grafana-sa
type: kubernetes.io/service-account-token
OCP 4.11+ deprecated `oc sa get-token`. A secret of type `kubernetes.io/service-account-token` annotated with the service account name is the supported replacement — OpenShift automatically populates the `token` key.
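The token controller fills the secret in asynchronously, so it's worth confirming the `token` key exists before wiring it into Grafana. A quick check, using the names from the manifest above:

```shell
# Confirm the token key has been populated (prints only the first characters)
oc get secret custom-grafana-token -n my-custom-metrics \
  -o jsonpath='{.data.token}' | base64 -d | cut -c1-12; echo
```

If the output is empty, give the controller a few seconds and re-check before moving on.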
Step 3: Install the Grafana Operator
oc apply -f 3-install-Grafana.yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
name: grafana-operator-group
namespace: my-custom-metrics
spec:
targetNamespaces:
- my-custom-metrics
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: grafana-operator
namespace: my-custom-metrics
spec:
channel: v5
installPlanApproval: Automatic
name: grafana-operator
source: community-operators
sourceNamespace: openshift-marketplace
This installs the community Grafana Operator from OperatorHub — the path Red Hat’s own documentation points to for custom Grafana on OCP 4. Wait for the operator pod before continuing:
oc get pods -n my-custom-metrics -w
Step 4: Deploy a Grafana Instance
📄 4-create-Grafana-instance.yaml
oc apply -f 4-create-Grafana-instance.yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
name: custom-grafana
namespace: my-custom-metrics
labels:
app: grafana
spec:
route:
spec: {}
config:
log:
mode: "console"
auth:
disable_login_form: "false"
`route: spec: {}` tells the operator to create an OpenShift Route automatically. The `app: grafana` label is how the datasource in the next step knows which instance to attach to.
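Once the instance is running, you can grab the Route hostname. The v5 operator typically names the route `<instance-name>-route` (which matches the URL shown in Step 6); treat the exact name as an assumption and fall back to listing routes in the namespace:

```shell
# Route created by the operator (name assumed to be <instance>-route)
oc get route custom-grafana-route -n my-custom-metrics -o jsonpath='{.spec.host}'
# If the name differs on your cluster, just list everything:
oc get routes -n my-custom-metrics
```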
Step 5: Connect Grafana to the Thanos Querier
📄 5-create-Grafana-datasource.yaml
oc apply -f 5-create-Grafana-datasource.yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDatasource
metadata:
name: openshift-monitoring-datasource
namespace: my-custom-metrics
spec:
instanceSelector:
matchLabels:
app: grafana
valuesFrom:
- targetPath: "secureJsonData.httpHeaderValue1"
valueFrom:
secretKeyRef:
name: "custom-grafana-token"
key: "token"
datasource:
name: OpenShift Thanos
type: prometheus
access: proxy
url: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
isDefault: true
jsonData:
tlsSkipVerify: true
httpHeaderName1: Authorization
secureJsonData:
httpHeaderValue1: "Bearer ${token}"
A few things worth understanding:
- `thanos-querier.openshift-monitoring.svc.cluster.local:9091` is the in-cluster Thanos service. Port 9091 gives access across all namespaces.
- `valuesFrom` pulls the bearer token from the secret in Step 2 and injects it into the `Authorization` header automatically — no manual token management.
- `tlsSkipVerify: true` is standard for in-cluster service-to-service communication.
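Before opening Grafana, you can sanity-check that the service account token actually works against Thanos by replaying the datasource's auth path by hand. This reuses the external route from Approach 1, since the in-cluster service URL only resolves from inside a pod:

```shell
# Same token the datasource uses, same Thanos API it talks to
TOKEN=$(oc get secret custom-grafana-token -n my-custom-metrics \
  -o jsonpath='{.data.token}' | base64 -d)
THANOS_URL=$(oc get route thanos-querier -n openshift-monitoring \
  -o jsonpath='{.spec.host}')
curl -sk -H "Authorization: Bearer $TOKEN" \
  "https://$THANOS_URL/api/v1/query" --data-urlencode 'query=up' | head -c 200
# A body beginning {"status":"success"... means the token is good
```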
Step 6: Get Your Credentials and Log In
bash 6-get-grafana-creds.sh
========================================
Grafana Login Details
========================================
URL: https://custom-grafana-route-my-custom-metrics.apps.your-cluster.example.com
Username: admin
Password: <generated>
========================================
The Grafana Operator stores generated credentials in a secret. This script retrieves and decodes them.
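The script itself isn't reproduced above; here is a minimal sketch of what it likely does, assuming the v5 operator's default naming conventions: a `<instance>-admin-credentials` secret with `GF_SECURITY_ADMIN_USER` / `GF_SECURITY_ADMIN_PASSWORD` keys, and a `<instance>-route` Route. Verify those names on your cluster before relying on this.

```shell
#!/usr/bin/env bash
# Sketch of 6-get-grafana-creds.sh; secret and route names are assumptions
# based on Grafana Operator v5 defaults, not copied from the real script.
NS=my-custom-metrics
HOST=$(oc get route custom-grafana-route -n "$NS" -o jsonpath='{.spec.host}')
USER=$(oc get secret custom-grafana-admin-credentials -n "$NS" \
  -o jsonpath='{.data.GF_SECURITY_ADMIN_USER}' | base64 -d)
PASS=$(oc get secret custom-grafana-admin-credentials -n "$NS" \
  -o jsonpath='{.data.GF_SECURITY_ADMIN_PASSWORD}' | base64 -d)
echo "========================================"
echo " Grafana Login Details"
echo "========================================"
echo "URL:      https://$HOST"
echo "Username: $USER"
echo "Password: $PASS"
```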
From here you have a fully editable Grafana instance. Use Explore for ad-hoc PromQL, or Dashboards to build and save views. To export any panel as CSV: panel menu → Inspect → Data → Download CSV.
Which Approach Should You Use?
| | Thanos → CSV (Python) | Custom Grafana |
|---|---|---|
| Best for | One-off data pulls, automation, CI pipelines | Ongoing dashboards, visual exploration |
| Setup | Minimal — just `oc` and Python | ~10 minutes of YAML |
| Output | CSV file | Interactive UI + CSV export |
| Maintenance | None | Operator updates, token rotation |