KubeHero docs

Metrics reference

Every metric KubeHero exports, the labels it carries, and the PromQL recording rules we ship.

This is the complete contract the collector's /metrics endpoint publishes. Our PrometheusRule, Grafana dashboards, and every Connect-RPC reader assume these exact names and label sets. A schema change here breaks everything downstream, which is why there's a test in services/collector/internal/metrics/ that pins it.

Labels — the chargeback axis

Every pod-level metric carries these labels. They are the primary grouping dimension for every dashboard and recording rule.

| Label | Populated by | Example | Notes |
| --- | --- | --- | --- |
| namespace | Pod's namespace | ml-inference | |
| pod | Pod's name | vectordb-ingress-7b9fc | |
| team | kubehero.io/team pod label | retrieval | Falls back to the chargeback.defaultTeam value |
| cost_center | kubehero.io/cost-center pod label | ml-platform | Optional BU-level rollup |
| nodepool | Provider-native nodepool label on the host node | aks-nc24ads · gke-g2 · eks-p5 | Chart auto-detects; override with chargeback.nodepoolLabel |
| cloud | Derived from the node's provider label | aws · gcp · azure | |
| region | Node's topology.kubernetes.io/region | us-east-1 · westeurope | |
| cluster | Cluster name from control-plane config | eks-use1-prod | |
| gpu_kind | Set only when the pod requests a GPU | A100 80GB · H100 80GB · L4 24GB | Via DCGM device metadata |

The team, cost_center, and nodepool labels form the primary chargeback axis. See Chargeback for how they're used.
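
The team fallback behaves like the following sketch. The function name and the "unassigned" default are illustrative; the actual default comes from the chargeback.defaultTeam chart value.

```python
def resolve_team(pod_labels: dict, default_team: str) -> str:
    """Return the chargeback team for a pod.

    Mirrors the documented behavior: use the kubehero.io/team pod label
    when present, otherwise fall back to the configured default.
    (Illustrative sketch, not the collector's actual code.)
    """
    return pod_labels.get("kubehero.io/team", default_team)

resolve_team({"kubehero.io/team": "retrieval"}, "unassigned")  # → "retrieval"
resolve_team({"app": "vectordb"}, "unassigned")                # → "unassigned"
```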

Series catalogue

Cost (the headline numbers)

kubehero_pod_cost_usd_per_second (gauge)

Attributed $/sec for one pod-second of compute. The product of:

  • the pod's share of its node's resources (blended 50/50 across CPU and memory)
  • the node's per-second list price from the Pricing engine
  • the active lifecycle discount (on-demand / spot / savings-plan / committed)

GPU pods additionally attribute their node's GPU cost proportional to their reserved count (or MIG-slice fraction).
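
Putting the factors together, here is a minimal sketch of the attribution arithmetic. Parameter names, the single discount applied to both terms, and the flat GPU treatment are illustrative assumptions, not the collector's internals.

```python
def pod_cost_usd_per_second(
    pod_cpu_millicores: float,
    pod_mem_bytes: float,
    node_cpu_millicores: float,
    node_mem_bytes: float,
    node_usd_per_second: float,
    discount: float = 0.0,          # lifecycle discount, e.g. 0.6 for a 60%-off spot node
    gpu_usd_per_second: float = 0.0,
    gpu_fraction: float = 0.0,      # reserved GPUs / node GPUs, or MIG-slice fraction
) -> float:
    """Sketch: blended 50/50 CPU/memory share x node list price x
    lifecycle discount, plus a GPU share proportional to reservation."""
    cpu_share = pod_cpu_millicores / node_cpu_millicores
    mem_share = pod_mem_bytes / node_mem_bytes
    share = 0.5 * cpu_share + 0.5 * mem_share
    compute = share * node_usd_per_second * (1.0 - discount)
    gpu = gpu_fraction * gpu_usd_per_second * (1.0 - discount)
    return compute + gpu

# A pod using 2 of 8 cores and 8 of 32 GiB on a $0.001/sec on-demand node:
pod_cost_usd_per_second(2000, 8 * 2**30, 8000, 32 * 2**30, 0.001)  # → 0.00025
```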

sum(avg_over_time(kubehero_pod_cost_usd_per_second[5m])) by (team) * 3600

→ "Each team's $/hour, averaged over the last 5 minutes." The metric is a gauge, so it is windowed with avg_over_time rather than rate.

kubehero_pod_recoverable_usd_per_second (gauge)

Portion of pod cost reclaimable via rightsizing: (requested − used) resources, priced out. Equals zero when the pod is already within its p95 + headroom target.

kubehero:team_recoverable_usd:rate1h

→ "Reclaimable $/hour per team." The recording rule is already aggregated by team, so no further sum is needed.
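
A sketch of the gating logic, simplified to CPU only. The 15% headroom default and the per-millicore pricing granularity are assumptions, not KubeHero's actual defaults.

```python
def recoverable_usd_per_second(
    requested_millicores: float,
    used_p95_millicores: float,
    usd_per_millicore_second: float,
    headroom: float = 0.15,  # assumed headroom fraction; the real target is configurable
) -> float:
    """CPU-only sketch: price out capacity requested above the
    p95-plus-headroom target; report zero when already within target."""
    target = used_p95_millicores * (1.0 + headroom)
    if requested_millicores <= target:
        return 0.0
    return (requested_millicores - target) * usd_per_millicore_second

# Requests 4 cores, p95 usage 2 cores: 1700 mc above the 2300 mc target.
recoverable_usd_per_second(4000, 2000, 1e-7)
# Requests 2.2 cores against the same usage: within target, nothing to reclaim.
recoverable_usd_per_second(2200, 2000, 1e-7)  # → 0.0
```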

kubehero_node_cost_usd_per_hour (gauge)

Per-hour cost of a node given its SKU + lifecycle + region. Populated by the Pricing engine, refreshed every 6 hours by default.

Labels: node, nodepool, cloud, region, sku, lifecycle.

Resource usage

kubehero_pod_cpu_millicores (gauge)

Pod CPU usage in millicores, eBPF-attributed at 1-second resolution. Pairs with the node's allocatable CPU for share calculation.

kubehero_pod_memory_bytes (gauge)

Resident set size of the pod's cgroup, in bytes.

GPU (only when pod uses GPUs)

kubehero_pod_gpu_util_ratio (gauge)

GPU utilization, 0.0–1.0. For MIG-partitioned devices, this is the utilization of the allocated slice, not the full device.

1 - avg(kubehero_pod_gpu_util_ratio) by (team)

→ "Idle fraction per team — the chargeback penalty for provisioned-but-unused GPU."
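
The same idle-fraction arithmetic as a plain Python sketch, with one utilization sample per GPU pod:

```python
def team_gpu_idle_fraction(utils: list[float]) -> float:
    """Mirror of the PromQL above: idle fraction is one minus the
    average utilization across a team's GPU pods (each sample 0.0-1.0)."""
    return 1.0 - sum(utils) / len(utils)

# Three GPU pods at 90%, 10%, and 20% utilization:
team_gpu_idle_fraction([0.9, 0.1, 0.2])  # ≈ 0.6 idle
```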

kubehero_pod_gpu_memory_bytes (gauge)

VRAM usage in bytes, DCGM-reported.

Liveness

kubehero_up (gauge)

Value is 1 when the emitting process is healthy. Labeled with service (collector, control-plane, pricing-engine, operator).

kubehero_up{service="collector"} == 0

→ classic liveness alerting signal, analogous to Prometheus's built-in up.

Recording rules (shipped in the PrometheusRule)

The chart's templates/prometheusrule.yaml installs these at install time. They exist so Grafana dashboards and alerts read precomputed series instead of recomputing these aggregations at query time.

Chargeback

kubehero:pod_cost_usd:rate1m
  = sum(avg_over_time(kubehero_pod_cost_usd_per_second[1m]))
    by (namespace, pod, team, cost_center, nodepool, cloud, region, gpu_kind)
    * 60

kubehero:team_cost_usd:rate1h
  = sum(avg_over_time(kubehero_pod_cost_usd_per_second[5m])) by (team) * 3600

kubehero:team_cost_usd:rate24h
  = sum(avg_over_time(kubehero_pod_cost_usd_per_second[5m])) by (team) * 86400

kubehero:team_cost_usd:rate30d
  = sum(avg_over_time(kubehero_pod_cost_usd_per_second[5m])) by (team) * 86400 * 30

kubehero:nodepool_cost_usd:rate1h
  = sum(avg_over_time(kubehero_pod_cost_usd_per_second[5m])) by (nodepool, cloud, region) * 3600

kubehero:cost_center_cost_usd:rate1h
  = sum(avg_over_time(kubehero_pod_cost_usd_per_second[5m])) by (cost_center) * 3600

kubehero:team_gpu_idle_cost_usd:rate1h
  = sum(
      avg_over_time(kubehero_pod_cost_usd_per_second{gpu_kind!=""}[5m])
      * on(pod, namespace) group_left
      (1 - max(kubehero_pod_gpu_util_ratio) by (pod, namespace))
    ) by (team) * 3600

kubehero:team_recoverable_usd:rate1h
  = sum(avg_over_time(kubehero_pod_recoverable_usd_per_second[5m])) by (team) * 3600
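
The team-level cost rules differ only in the constant multiplier applied to the same $/sec base, which makes them easy to sanity-check. The $/sec figure below is made up for illustration:

```python
# One team's summed pod cost, in $/sec (illustrative figure).
usd_per_sec = 0.0125

windows = {
    "rate1h": 3600,         # seconds per hour
    "rate24h": 86400,       # seconds per day
    "rate30d": 86400 * 30,  # seconds per 30 days
}
# Scale the same base by each window's multiplier, as the rules do.
projected = {name: usd_per_sec * factor for name, factor in windows.items()}
# roughly: rate1h 45, rate24h 1080, rate30d 32400
```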

Alerts

- alert: KubeHeroTeamOverBudgetProjected
  expr: |
    predict_linear(kubehero:team_cost_usd:rate24h[6h], 86400 * 30)
      > on(team) group_left
    kubehero_team_budget_usd
  for: 15m
  labels: { severity: warning, team_scope: "true" }
  annotations:
    summary: Team on track to exceed monthly budget

- alert: KubeHeroGPUIdleExcessive
  expr: kubehero:team_gpu_idle_cost_usd:rate1h > 500
  for: 1h
  labels: { severity: warning }
  annotations:
    summary: Team burning > $500/hr on idle GPUs

Cardinality notes

The label set is intentionally small. We do not emit:

  • Per-container series (pod-level is the contract).
  • Per-node kernel telemetry (cluster-level aggregates instead).
  • Request-level latency (that's APM territory; use your existing OTel setup).

Target cardinality for a 500-node cluster is roughly 50k active series. Most of that is kubehero_pod_cost_usd_per_second at one series per running pod.
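
The 50k target can be sanity-checked with a back-of-envelope estimate. The pod density and per-pod series counts below are assumptions for illustration, not measured numbers:

```python
# Back-of-envelope series budget for a 500-node cluster.
nodes = 500
pods_per_node = 25       # assumed average density
per_pod_series = 4       # cost, recoverable, cpu, memory (GPU pods add two more)
node_series = nodes      # kubehero_node_cost_usd_per_hour: one series per node

total_series = nodes * pods_per_node * per_pod_series + node_series
# 500 * 25 * 4 + 500 = 50500, in line with the ~50k target
```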

If you need finer labels, query ClickHouse directly — that's the intended escape hatch for custom analytics. See Architecture · ClickHouse.