KubeHero docs

Chargeback

Attribute every dollar to a team, nodepool, or cost center. Grafana dashboards included.

KubeHero rolls up every pod-second of spend into team, namespace, nodepool, cloud, region, and (for GPUs) gpu_kind dimensions. The chart ships three Grafana dashboards that render those rollups out of the box.

The label convention

KubeHero reads existing Kubernetes labels — no new configuration layer. Tag your workloads with one label and you're done.

| Label (default) | What it means | Override via |
|---|---|---|
| kubehero.io/team | Owning team (the primary chargeback axis) | chargeback.teamLabel |
| kubehero.io/cost-center | Optional BU / cost center | chargeback.costCenterLabel |
| nodepool label (cloud-native) | cloud.google.com/gke-nodepool, agentpool, eks.amazonaws.com/nodegroup | chargeback.nodepoolLabel |
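
If your workloads already carry ownership labels under a different key, the override column suggests they can be remapped rather than relabeled. A minimal sketch, assuming the dotted names in the table are Helm values paths; the label keys example.com/owner and example.com/cost-center are hypothetical placeholders for whatever your organization already uses:

```yaml
# values.yaml (sketch). The chargeback.* keys come from the table above;
# the label keys on the right are hypothetical examples.
chargeback:
  teamLabel: example.com/owner
  costCenterLabel: example.com/cost-center
  nodepoolLabel: eks.amazonaws.com/nodegroup
```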

Pods without a team label roll up under the unattributed bucket, so unallocated spend stays visible instead of being silently dropped.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vectordb-ingress
  labels:
    kubehero.io/team: retrieval
    kubehero.io/cost-center: ml-platform
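
The snippet above labels the Deployment object itself. Because attribution is described per pod, it is probably safest to repeat the labels on the pod template so that every pod carries them; whether KubeHero also inherits labels from the owning Deployment is not stated here, so treat this as an assumption. A fuller sketch, with a placeholder image and a hypothetical selector label:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vectordb-ingress
  labels:
    kubehero.io/team: retrieval
    kubehero.io/cost-center: ml-platform
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vectordb-ingress        # hypothetical selector label
  template:
    metadata:
      labels:
        app: vectordb-ingress
        # Repeated on the pod template so the pods themselves are labeled.
        kubehero.io/team: retrieval
        kubehero.io/cost-center: ml-platform
    spec:
      containers:
        - name: ingress
          image: registry.example.com/vectordb-ingress:1.0.0   # placeholder image
```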

The metrics the collector exports

| Metric | Labels | Units |
|---|---|---|
| kubehero_pod_cost_usd_per_second | namespace, pod, team, cost_center, nodepool, cloud, region, gpu_kind | USD/sec |
| kubehero_pod_recoverable_usd_per_second | same | USD/sec (request minus actual use, priced out) |
| kubehero_pod_cpu_millicores | same | millicores |
| kubehero_pod_memory_bytes | same | bytes |
| kubehero_pod_gpu_util_ratio | same | 0..1, GPU pods only |
| kubehero_node_cost_usd_per_hour | node, nodepool, cloud, region, sku, lifecycle | USD/hr |

Recording rules shipped with the chart

deploy/helm/kubehero/templates/prometheusrule.yaml installs:

# Monthly projected spend per team
kubehero:team_cost_usd:rate30d

# Hourly burn per nodepool
kubehero:nodepool_cost_usd:rate1h

# GPU idle cost — $ teams spend on GPUs they aren't using
kubehero:team_gpu_idle_cost_usd:rate1h

# Recoverable via rightsizing, per team
kubehero:team_recoverable_usd:rate1h
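
The rule expressions themselves are not reproduced here. To add a rollup of your own on top of the exported metrics, a PrometheusRule along these lines should work. This is a sketch, not one of the shipped rules: the rule name custom:namespace_cost_usd:rate1h, the object name, and the release label (which kube-prometheus-stack commonly uses to select rules) are assumptions.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubehero-custom-rollups          # hypothetical name
  namespace: kubehero-system
  labels:
    release: kube-prometheus-stack       # assumption: matches your Prometheus ruleSelector
spec:
  groups:
    - name: kubehero-custom.rules
      rules:
        # Average USD/sec per namespace over the last hour, scaled to USD/hour.
        - record: custom:namespace_cost_usd:rate1h
          expr: |
            sum by (namespace) (
              avg_over_time(kubehero_pod_cost_usd_per_second[1h])
            ) * 3600
```

Multiplying by 3600 converts the per-second metric into an hourly figure, which keeps the result comparable with kubehero_node_cost_usd_per_hour.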

Grafana dashboards

The chart ships three dashboards as ConfigMaps labeled grafana_dashboard=1 — kube-prometheus-stack's sidecar auto-discovers them.

  • KubeHero — Chargeback by team — hourly rate, 30-day projection, nodepool breakdown, GPU idle cost, ranked workload table
  • KubeHero — Fleet — total spend, recoverable, per-cluster time series
  • KubeHero — GPU panel — utilization heatmap + per-GPU idle cost ranking

Disable any of them in values:

grafana:
  dashboards:
    chargeback: true
    fleet: true
    gpu: false

Budgets + alerts

Pair chargeback with a BudgetPolicy to get alerting on projected overspend. Two alerts ship by default:

  • KubeHeroTeamOverBudgetProjected: predict_linear over a 6h window projects the team's 30-day spend past its budget; fires once the condition has held for 15m.
  • KubeHeroGPUIdleExcessive: a team burns more than $500/hr on idle GPUs for 1h.

See CRD reference for the full policy spec.
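
To alert at a different threshold than the shipped defaults, a custom rule can reuse the recording rules. A sketch, assuming kubehero:team_gpu_idle_cost_usd:rate1h is expressed in USD per hour and carries a team label (neither is confirmed above); the alert name, object name, threshold, and release label are hypothetical:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubehero-custom-alerts           # hypothetical name
  namespace: kubehero-system
  labels:
    release: kube-prometheus-stack       # assumption: matches your Prometheus ruleSelector
spec:
  groups:
    - name: kubehero-custom.alerts
      rules:
        # Fires when a team's idle GPU cost stays above the threshold for a full hour.
        - alert: TeamGPUIdleOver100
          expr: kubehero:team_gpu_idle_cost_usd:rate1h > 100   # assumed unit: USD/hr
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Team {{ $labels.team }} is spending more than $100/hr on idle GPUs"
```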

Verifying in a fresh cluster

helm install kubehero kubehero/kubehero \
  --namespace kubehero-system --create-namespace \
  --set prometheus.release=kube-prometheus-stack

kubectl -n kubehero-system port-forward svc/kubehero-collector 8081:8081
curl -s localhost:8081/metrics | grep kubehero_pod_cost_usd_per_second | head -5

You should see live team=, nodepool=, cloud= labels flowing within a minute.