Read-only by default.
The collector is a DaemonSet with zero write permissions on your cluster. It cannot mutate a single resource. Every enforcement action — eviction, HPA cap, node cordon — runs through a CRD that you author and apply yourself, gated by an annotation no automated process can set on your behalf.
You can ship our agent because its worst outcome is that it misses something. The worst outcome of a write-capable agent is that it does the wrong thing at scale.
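You don't have to take the read-only claim on faith. Here is a minimal sketch of the audit you could run against any agent's RBAC rules, assuming the rules have already been loaded into plain dictionaries. The rule set shown is illustrative, not the collector's actual ClusterRole, and `write_verbs` is a hypothetical helper name.

```python
# Verbs that cannot change cluster state. Anything else, including "*",
# is treated as write-capable.
READ_ONLY_VERBS = {"get", "list", "watch"}


def write_verbs(rules):
    """Return the sorted set of non-read-only verbs found in RBAC rules."""
    found = set()
    for rule in rules:
        for verb in rule.get("verbs", []):
            if verb not in READ_ONLY_VERBS:
                found.add(verb)
    return sorted(found)


# Hypothetical rule set for illustration only.
collector_rules = [
    {"apiGroups": [""], "resources": ["pods", "nodes"],
     "verbs": ["get", "list", "watch"]},
    {"apiGroups": ["apps"], "resources": ["deployments"],
     "verbs": ["list", "watch"]},
]

if write_verbs(collector_rules):
    raise SystemExit("agent can mutate the cluster")
```

An empty result means the role grants nothing beyond get, list, and watch.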
Reversible inside the cooldown.
Every enforcement action captures the prior state before it touches anything. kubehero undo <audit-id> replays the original spec back into the cluster, idempotently, in less time than it took to fire. The default cooldown is ten minutes. You can extend it; you cannot shorten it below the time it takes to scrape your dashboard and assemble a war room.
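The capture-then-replay idea can be sketched in a few lines. This is an illustration under stated assumptions, not kubehero's implementation: the cluster is modelled as an in-memory dictionary, `AuditLog`, `apply`, and `undo` are hypothetical names, and refusing an undo once the cooldown has elapsed is our reading of "reversible inside the cooldown".

```python
import copy
import time

COOLDOWN_SECONDS = 600  # ten minutes, the default stated above


class AuditLog:
    """Toy model: snapshot the prior spec before every change."""

    def __init__(self, cluster):
        self.cluster = cluster   # resource name -> spec (stand-in for the API server)
        self.entries = {}        # audit id -> (name, prior spec, fired_at)
        self._next = 1

    def apply(self, name, new_spec, now=None):
        audit_id = f"audit-{self._next}"
        self._next += 1
        # Capture the prior state before touching anything.
        prior = copy.deepcopy(self.cluster.get(name))
        fired_at = time.time() if now is None else now
        self.entries[audit_id] = (name, prior, fired_at)
        self.cluster[name] = copy.deepcopy(new_spec)
        return audit_id

    def undo(self, audit_id, now=None):
        name, prior, fired_at = self.entries[audit_id]
        now = time.time() if now is None else now
        if now - fired_at > COOLDOWN_SECONDS:
            raise RuntimeError("cooldown elapsed")  # assumed behaviour
        # Replaying a full snapshot is idempotent: running it twice
        # leaves the same state as running it once.
        if prior is None:
            self.cluster.pop(name, None)
        else:
            self.cluster[name] = copy.deepcopy(prior)
```

Because undo writes back the whole captured spec rather than applying a reverse diff, a retried or duplicated undo cannot drift the resource further.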
Public source, all the way down.
Apache 2.0 collector, CLI, proto, and cost model. BUSL 1.1 on the control plane, operator, pricing engine, and dashboard — source-available, self-hostable, free to run. Every line is public. Fork or replace anything that diverges from your needs; nothing is hidden behind a paywall.
Every CRD is a public schema. Every metric we emit is in Prometheus format. Every audit row is exportable to webhook, syslog, or S3. If we go away tomorrow, your data does not.
Hardened where it counts.
mTLS by default. RBAC enforced at the RPC layer, not bolted on. OIDC against your IdP, signature-verified via JWKS — not just iss/aud presence. Air-gap installable; container images mirror cleanly to private registries. The audit log is HMAC-signed so a downstream SIEM can detect tampering.
None of this is differentiation. It's the floor — what any team running production should expect from software in the hot path of their cluster.
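To make the tamper-detection claim concrete, here is a minimal sketch of how a downstream consumer could check an HMAC-signed audit row. The row fields, the canonical JSON encoding, and the choice of SHA-256 are assumptions for illustration; the actual audit format is not specified here.

```python
import hashlib
import hmac
import json


def sign_row(row, key):
    """Return a hex HMAC-SHA256 over a canonical JSON encoding of the row."""
    payload = json.dumps(row, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_row(row, signature, key):
    """Constant-time check that the row still matches its signature."""
    return hmac.compare_digest(sign_row(row, key), signature)


# Hypothetical audit row and key, for illustration only.
key = b"shared-secret-from-your-kms"
row = {"audit_id": "audit-1", "action": "hpa-cap", "target": "hpa/web"}
signature = sign_row(row, key)

tampered = dict(row, target="hpa/api")
print(verify_row(row, signature, key), verify_row(tampered, signature, key))
```

Sorting keys before signing keeps the signature stable no matter how the row's fields are ordered in transit.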
Cost is the lens, not the product.
FinOps tools that only show dollars become wallpaper. Cost is most useful when it's joined to something operators already act on: a CVE that's expensive to patch, a workload that's also leaking memory, a node that's both underutilised and out of compliance. We treat dollars as one signal among several, ranked by impact, not by decimals.
Boring beats clever.
We pick boring tools that have been load-tested at scale for years. Postgres for the durable state. ClickHouse for the time-series. Connect-RPC over HTTP/2 for the wire. Helm for distribution. cert-manager for certificates. Cosign + Syft for supply chain.
None of these are surprising choices. They are what your platform team already runs, which means our value is what we add on top — not what we ask you to adopt.
The bar is shipping.
We show up with software that works, not slides. Every release lists what changed and who'll feel it. We don't ship a feature until it survives a real cluster, and we don't promise what we can't deliver. The source is public: you can read every line and run it yourself today.
Built by DevOps & FinOps engineers, for the teams who run production every day.