FAQ
Answers to the questions design partners ask before committing.
Every question here came out of a real design-partner call. If yours isn't listed, open a discussion.
How accurate is your cost attribution compared to our billing record?
The agent reports at 1-second resolution with cgroup-accurate CPU attribution via eBPF. We reconcile nightly against the cloud billing export (AWS Cost Explorer, GCP Billing Export, Azure Cost Management). Our internal accuracy threshold is ±1% of the billing record over a 30-day window after mid-month reservation replay. Most FinOps tools that run on cAdvisor land at ±10–15%.
What's the agent's overhead?
Target: under 0.5% CPU and under 50 MiB RSS per node. We benchmark against this on every PR. Typical production measurement on a 4-vCPU / 16-GiB node: 0.12% CPU, 38 MiB RSS, 0.4 Mbps out.
Do you need a GPU driver modification?
No. We consume metrics from DCGM Exporter, NVIDIA's own telemetry component, which runs as a separate DaemonSet or as a sidecar in your collector. For MIG slices, we read the partition table via DCGM's standard API. AMD GPUs have no DCGM, so we read rocm-smi instead. For Google TPUs, we consume the cloudtpu.googleapis.com metrics.
Do you require privileged containers?
The agent requests hostPID: true to attribute CPU to pod cgroups. It does not run as root, does not need privileged: true, and requires no capabilities beyond what a standard Kubernetes runtime grants. If your security team disallows hostPID, enable the cAdvisor fallback — you lose the 1-second resolution but keep everything else.
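As a rough illustration of what that looks like in Helm values — field names here are assumptions for the sketch, not the chart's actual schema (check the chart's values.yaml):

```yaml
# Illustrative only; key names are hypothetical.
agent:
  hostPID: true          # needed for cgroup-accurate CPU attribution
  securityContext:
    privileged: false    # never requested
    runAsNonRoot: true
  # If your security policy disallows hostPID, a fallback collector
  # mode might look like:
  # collector:
  #   mode: cadvisor     # coarser resolution, same feature set otherwise
```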
How does KubeHero handle mid-month Savings Plans?
When a Savings Plan / Reserved Instance / Committed-Use Discount is purchased mid-month, we reprocess the affected time range: the pricing engine emits a pricing.commitment.activated event, the control plane enqueues a ClickHouse replay, and every pod-second in scope gets re-priced against the new effective rate. Historical numbers in your dashboard update within minutes. A timeline event records what was restated and why. See Concepts · Retroactive cost for the full machinery.
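The replay step above can be sketched as a simple re-pricing pass: every pod-second whose timestamp falls inside the commitment's effective window gets the committed rate, everything else keeps the on-demand rate. This is a minimal illustration with invented names, not the actual pricing engine.

```python
# Hedged sketch of retroactive re-pricing. All names are illustrative.
from dataclasses import dataclass

@dataclass
class PodSecond:
    ts: int           # unix seconds
    cpu_cores: float  # cores attributed during this second

def replay(pod_seconds, window_start, window_end,
           on_demand_rate, committed_rate):
    """Total cost after re-pricing seconds inside the commitment window.

    Rates are per core-hour; each pod-second contributes rate/3600.
    """
    total = 0.0
    for ps in pod_seconds:
        in_window = window_start <= ps.ts < window_end
        rate = committed_rate if in_window else on_demand_rate
        total += ps.cpu_cores * rate / 3600
    return total
```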
Can we run KubeHero fully air-gapped?
Yes. The values.airgap.yaml overlay disables all outbound traffic, and every image can be mirrored to your internal registry. Pricing catalog snapshots can be imported via kubehero pricing import --from snapshot.json so you don't need to reach the public pricing APIs. See Production · Air-gap install.
What happens if the KubeHero control plane goes down?
The agent keeps collecting metrics locally (up to 60s in-memory buffer). The operator keeps reconciling local CRDs. Enforcement continues — BudgetPolicy and CeilingPolicy objects are evaluated in-cluster by the operator even when the central control plane is unreachable. Dashboard queries degrade to "cached data" mode with a clear staleness indicator. See Production · High availability.
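The key property is that policy evaluation needs only local data. A toy sketch of that decision, under assumed semantics (thresholds and states are invented for illustration):

```python
# Hypothetical sketch: an in-cluster operator deciding a budget action
# from locally accumulated spend, with no control-plane round trip.
def evaluate_budget(spend_to_date: float, budget: float,
                    warn_at: float = 0.8) -> str:
    """Return 'ok', 'warn', or 'enforce' from purely local state."""
    if spend_to_date >= budget:
        return "enforce"
    if spend_to_date >= warn_at * budget:
        return "warn"
    return "ok"
```

The point is architectural: because the inputs are local, enforcement survives a control-plane outage.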
Do you mutate our workloads?
Only under a RightsizingPolicy you apply, with mode: apply, or through a BudgetPolicy / CeilingPolicy that you have armed via kubehero cap --arm (or the dashboard toggle). Every action is reversible via kubehero undo <audit-id> within the cooldown window. The default is observe + recommend.
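For orientation, a RightsizingPolicy might look roughly like the following — the apiVersion and field names are assumptions for this sketch, not the published CRD schema:

```yaml
# Hypothetical shape; consult the actual CRD reference before use.
apiVersion: kubehero.io/v1alpha1
kind: RightsizingPolicy
metadata:
  name: batch-workers
spec:
  mode: observe        # observe | recommend | apply; only apply mutates
  targetRef:
    kind: Deployment
    selector:
      matchLabels:
        team: batch
```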
How is this open-source exactly?
- Apache 2.0 — agent, CLI, collector, cost-model library, Protobuf schemas. Everything that runs in your cluster collecting telemetry.
- BSL 1.1 → Apache 2.0 after 3 years — control plane, operator, pricing engine, dashboard. The orchestration brain is commercial during the first 3 years, then auto-opens.
Customers with compliance requirements can audit every line of what runs in their cluster. The value layer has a commercial license during the ramp, then becomes OSS after three years.
What's the smallest install that makes sense?
A single-cluster Cloud install: just the agent, plus our hosted control plane. Under 25 nodes, it's free forever. Helm install takes 90 seconds; first scan takes under 2 minutes. See Quickstart.
What about our existing Kubecost / OpenCost install?
Two paths:
- Coexist — our agent runs alongside; you see both tools' numbers and compare. Most design partners do this for 2–4 weeks.
- Import their allocation rules — kubehero import opencost --from <url> pulls your existing label-based allocation rules so teams don't have to relearn a new chargeback model.
You can run both indefinitely. We aren't trying to kick out another tool you like — we're offering a different accuracy tier and a policy surface they don't have.
Do you support our identity provider?
If it speaks OIDC, yes. The chart ships Dex as the proxy — connectors for Okta, Azure AD, Google Workspace, GitHub, GitLab, LDAP, and generic OIDC are in the standard Dex distribution. For SaaS-hosted KubeHero Cloud, we use WorkOS, which covers SSO + SCIM for every major enterprise IdP. See Integrations · Identity.
How do you handle multi-cluster?
One control plane, many clusters. See Production · Federation. You register each cluster with kubehero cluster add, get a per-cluster mTLS cert, and drop that into the edge cluster's agent Helm install. Policies written in the hub replicate to every matched cluster via label-based scope selectors.
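The registration flow might look roughly like this — the exact subcommands and flags are assumptions for illustration (only kubehero cluster add appears above; see Production · Federation for the real steps):

```shell
# Illustrative flow; flags and the cert subcommand are hypothetical.
kubehero cluster add edge-eu-1                 # register, mint per-cluster mTLS cert
kubehero cluster cert edge-eu-1 > edge-eu-1.pem
helm install kubehero-agent kubehero/agent \
  --set-file controlPlane.clientCert=edge-eu-1.pem
```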
What's your pricing, concretely?
- Cloud — $10 per node per month, first 25 nodes free. No seat tax. No limits on users or clusters.
- Self-hosted · Free tier — Apache 2.0 components only, 3-cluster / 7-day retention limits. BSL components require the Enterprise license at scale.
- Self-hosted · Enterprise — BSL 1.1 commercial license; unlimited scale, SSO/SCIM/RBAC, audit export, federation. Custom pricing per footprint.
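The Cloud-tier arithmetic is simple enough to state exactly: nodes beyond the first 25 are billed at $10 each per month.

```python
def cloud_monthly_cost(node_count: int,
                       rate: float = 10.0,
                       free_nodes: int = 25) -> float:
    """Cloud tier: $10 per node per month, first 25 nodes free."""
    return max(0, node_count - free_nodes) * rate
```

So a 40-node footprint pays for 15 nodes, i.e. $150/month.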
Who's behind this?
A DevOps / FinOps / HPC engineer with 15+ years on Kubernetes and ML infra. Design partners today are operators running real multi-cloud AKS / GKE / EKS footprints with GPU fleets. We prefer small, hands-on engagements over enterprise-sales theatre.
Is there an SLA?
Not during pre-launch. At GA (planned Q4 2026), Cloud customers get 99.9% on the control-plane API and 99.95% on telemetry ingest. Self-hosted is customer-operated — the SLA there covers software support response, not your cluster's uptime.