FAQ
Common questions from operators running KubeHero. If yours isn't listed, open a discussion.
How accurate is your cost attribution compared to our billing record?
The agent reports at 1-second resolution with cgroup-accurate CPU attribution via eBPF. We reconcile against the cloud billing export (AWS Cost Explorer, GCP Billing Export, Azure Cost Management) on a nightly basis. Our internal accuracy threshold is ±1% of the billing record over a 30-day window after mid-month reservation replay. Most FinOps tools that run on cadvisor land at ±10–15%.
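As a rough illustration of what a ±1% threshold means in practice, the sketch below compares an attributed total against the billed total for one window. The function names and dollar figures are hypothetical and are not part of KubeHero; only the 1% band comes from the answer above.

```python
def attribution_error(attributed_usd: float, billed_usd: float) -> float:
    """Signed relative error of attributed cost versus the billing record."""
    if billed_usd <= 0:
        raise ValueError("billed_usd must be positive")
    return (attributed_usd - billed_usd) / billed_usd


def within_threshold(attributed_usd: float, billed_usd: float,
                     threshold: float = 0.01) -> bool:
    """True when the absolute relative error is at most the threshold."""
    return abs(attribution_error(attributed_usd, billed_usd)) <= threshold


# Hypothetical 30-day totals in USD (illustrative numbers only).
billed = 120_000.00
attributed = 119_100.00

error = attribution_error(attributed, billed)  # about -0.75%
ok = within_threshold(attributed, billed)      # inside the 1% band
```

An attributed total that is 5% above the bill (for example 105,000 against 100,000) would fail the same check.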
What's the agent's overhead?
Target: under 0.5% CPU and under 50 MiB RSS per node. We benchmark against this on every PR. Typical production measurement on a 4-vCPU / 16-GiB node: 0.12% CPU, 38 MiB RSS, 0.4 Mbps out.
Do you need a GPU driver modification?
No. We consume metrics from DCGM Exporter, NVIDIA's own exporter built on DCGM, which runs as a separate DaemonSet or in your collector sidecar. For MIG slices, we read the partition table via DCGM's standard API. For AMD GPUs, where DCGM is not available, we read rocm-smi. For Google TPUs, we read cloudtpu.googleapis.com metrics.
Do you require privileged containers?
The agent requests hostPID: true to attribute CPU to pod cgroups. It does not run as root, does not need privileged: true, and does not use any syscalls beyond those of the standard Kubernetes runtime. If your security team disallows hostPID, enable the cadvisor fallback: you lose 1-second resolution but keep everything else.
How does KubeHero handle mid-month Savings Plans?
When a Savings Plan / Reserved Instance / Committed-Use Discount is purchased mid-month, we reprocess the affected time range: the pricing engine emits a pricing.commitment.activated event, the control plane enqueues a ClickHouse replay, and every pod-second in scope gets re-priced against the new effective rate. Historical numbers in your dashboard update within minutes. A timeline event records what was restated and why. See Concepts · Retroactive cost for the full machinery.
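The re-pricing step can be pictured with a small sketch. This is not KubeHero's pricing engine: the `UsageRecord` type, the `reprice` function, and the rates are hypothetical, and it makes the simplifying assumption that a record is in scope when its start timestamp falls inside the affected range.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UsageRecord:
    pod: str
    start_ts: int          # epoch seconds, inclusive
    seconds: int           # pod-seconds covered by this record
    rate_per_hour: float   # USD per hour applied when the record was priced

    @property
    def cost(self) -> float:
        return self.seconds * self.rate_per_hour / 3600.0


def reprice(records, range_start, range_end, new_rate_per_hour):
    """Re-price records starting in [range_start, range_end).

    Returns the new record list and the restated delta in USD
    (negative when the new effective rate is cheaper).
    """
    out = []
    delta = 0.0
    for r in records:
        if range_start <= r.start_ts < range_end:
            updated = replace(r, rate_per_hour=new_rate_per_hour)
            delta += updated.cost - r.cost
            out.append(updated)
        else:
            out.append(r)  # outside the affected range: left untouched
    return out, delta


# Three hypothetical one-hour records at an on-demand rate of $0.10/hour.
records = [
    UsageRecord("api-0", 1000, 3600, 0.10),
    UsageRecord("api-0", 4600, 3600, 0.10),
    UsageRecord("api-0", 8200, 3600, 0.10),
]

# A commitment covering the last two hours lowers the effective rate to $0.07/hour.
repriced, delta = reprice(records, 4600, 11800, 0.07)
```

Here the first record keeps its original rate, the other two are restated, and `delta` is the amount a timeline event would report as restated (about -0.06 USD).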
Can we run KubeHero fully air-gapped?
Yes. The values.airgap.yaml overlay disables all outbound traffic, and every image can be mirrored to your internal registry. Pricing catalog snapshots can be imported via kubehero pricing import --from snapshot.json so you don't need to reach the public pricing APIs. See Production · Air-gap install.
What happens if the KubeHero control plane goes down?
The agent keeps collecting metrics locally (up to 60s in-memory buffer). The operator keeps reconciling local CRDs. Enforcement continues — BudgetPolicy and CeilingPolicy objects are evaluated in-cluster by the operator even when the central control plane is unreachable. Dashboard queries degrade to "cached data" mode with a clear staleness indicator. See Production · High availability.
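To make the 60-second buffer concrete, here is a minimal sketch of a time-bounded in-memory buffer. The `SampleBuffer` class is hypothetical, and dropping the oldest samples once they age past the limit is an assumption; the answer above only states the buffer size, not the eviction behavior.

```python
from collections import deque


class SampleBuffer:
    """Keeps only samples no older than max_age_s relative to the newest one."""

    def __init__(self, max_age_s: float = 60.0):
        self.max_age_s = max_age_s
        self._samples = deque()  # (timestamp, payload), oldest first

    def append(self, ts: float, payload: dict) -> None:
        self._samples.append((ts, payload))
        self._evict(now=ts)

    def _evict(self, now: float) -> None:
        # Assumption: samples older than max_age_s are dropped, oldest first.
        while self._samples and now - self._samples[0][0] > self.max_age_s:
            self._samples.popleft()

    def drain(self) -> list:
        """Hand back everything buffered, e.g. once the control plane is reachable."""
        items = list(self._samples)
        self._samples.clear()
        return items


buf = SampleBuffer(max_age_s=60.0)
for t in range(90):  # 90 one-second samples during a simulated outage
    buf.append(float(t), {"cpu_ms": t})

pending = buf.drain()  # only the most recent 60 seconds of samples remain
```

Under this assumption, an outage longer than the buffer window loses the oldest samples rather than growing memory without bound.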
Do you mutate our workloads?
Only under a RightsizingPolicy you apply, with mode: apply, or through a BudgetPolicy / CeilingPolicy that you have armed via kubehero cap --arm (or the dashboard toggle). Every action is reversible via kubehero undo <audit-id> within the cooldown window. The default is observe + recommend.
How is this open-source exactly?
KubeHero is licensed per component under Apache 2.0 or BSL 1.1 (see LICENSE-APACHE-2.0 and LICENSE-BSL-1.1):
- Apache 2.0: agent, CLI, collector, cost-model library, Protobuf schemas. Everything that runs in your cluster collecting telemetry.
- BSL 1.1: control plane, operator, pricing engine, dashboard.
Every line of what runs in your cluster is auditable. Source is at github.com/kubehero-io/platform.
What's the smallest install that makes sense?
A single-cluster install: just the agent plus a control plane in the same cluster. Helm install takes 90 seconds; first scan takes under 2 minutes. See Quickstart.
What about our existing Kubecost / OpenCost install?
Two paths:
- Coexist: our agent runs alongside; you see both tools' numbers and compare. Most teams do this for 2–4 weeks.
- Import their allocation rules: kubehero import opencost --from <url> pulls your existing label-based allocation rules so teams don't have to relearn a new chargeback model.
You can run both indefinitely. We aren't trying to kick out another tool you like — we're offering a different accuracy tier and a policy surface they don't have.
Do you support our identity provider?
If it speaks OIDC, yes. The chart ships Dex as the proxy — connectors for Okta, Azure AD, Google Workspace, GitHub, GitLab, LDAP, and generic OIDC are in the standard Dex distribution. See Integrations · Identity.
How do you handle multi-cluster?
One control plane, many clusters. See Production · Federation. You register each cluster with kubehero cluster add, get a per-cluster mTLS cert, and drop that into the edge cluster's agent Helm install. Policies written in the hub replicate to every matched cluster via label-based scope selectors.
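Label-based scoping can be sketched as a simple subset match. The selector shape, the function names, and the cluster labels below are hypothetical and only illustrate the idea; KubeHero's actual selector syntax may differ.

```python
def matches(selector: dict, cluster_labels: dict) -> bool:
    """True when every key/value pair in the selector is present on the cluster."""
    return all(cluster_labels.get(k) == v for k, v in selector.items())


def target_clusters(selector: dict, clusters: dict) -> list:
    """Sorted names of registered clusters whose labels satisfy the selector."""
    return sorted(
        name for name, labels in clusters.items() if matches(selector, labels)
    )


# Hypothetical registered clusters and their labels.
clusters = {
    "prod-us-east": {"env": "prod", "region": "us-east"},
    "prod-eu-west": {"env": "prod", "region": "eu-west"},
    "staging": {"env": "staging", "region": "us-east"},
}

prod = target_clusters({"env": "prod"}, clusters)
prod_eu = target_clusters({"env": "prod", "region": "eu-west"}, clusters)
```

In this sketch an empty selector matches every cluster, which mirrors how equality-based label selectors commonly behave.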
How much does it cost to run?
KubeHero is open source and self-hosted — you run it on your own infrastructure. There's no per-node fee, no seat tax, and no limits on users or clusters. Your only cost is the compute the services consume in your cluster. SSO/SCIM/RBAC, audit export, and federation are all included features.
Is there an SLA?
KubeHero is customer-operated software you run yourself — there's no hosted service and no uptime SLA. Support is community-based via GitHub Discussions.