Read-only by default.
The collector is a DaemonSet with zero write permissions on your cluster. It cannot mutate a single resource. Every enforcement action — eviction, HPA cap, node cordon — runs through a CRD that you author and apply yourself, gated by an annotation no automated process can set on your behalf.
You can ship our agent because its worst outcome is that it misses something. The worst outcome of a write-capable agent is that it does the wrong thing at scale.
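One way to check a read-only claim like this yourself is to audit the RBAC rules bound to the collector's service account. The sketch below is illustrative only: the rule dictionaries mirror the general shape of Kubernetes RBAC policy rules, and the names collector_rules, write_verbs, and is_read_only are hypothetical, not part of the product.

```python
# Hypothetical sketch: each rule dict mirrors the shape of a Kubernetes RBAC
# PolicyRule ("apiGroups", "resources", "verbs"). Not the product's own code.
READ_ONLY_VERBS = frozenset({"get", "list", "watch"})


def write_verbs(rules):
    """Return the sorted verbs in `rules` that are not read-only."""
    found = set()
    for rule in rules:
        for verb in rule.get("verbs", []):
            if verb not in READ_ONLY_VERBS:
                found.add(verb)  # this also catches the "*" wildcard
    return sorted(found)


def is_read_only(rules):
    """True when no rule grants anything beyond get, list, or watch."""
    return not write_verbs(rules)


# Assumed example rules for a read-only collector.
collector_rules = [
    {"apiGroups": [""], "resources": ["pods", "nodes"],
     "verbs": ["get", "list", "watch"]},
    {"apiGroups": ["apps"], "resources": ["deployments"],
     "verbs": ["list", "watch"]},
]

print(is_read_only(collector_rules))
print(write_verbs([{"verbs": ["get", "patch", "*"]}]))
```

Treating the wildcard verb as a write permission is deliberate: a rule that grants "*" can mutate resources even though it never names a write verb.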
Reversible inside the cooldown.
Every action this product takes captures the prior state before it touches anything. kubehero undo <audit-id> replays the original spec back into the cluster, idempotently, in less time than the action took to fire. The default cooldown is ten minutes. You can extend it; you cannot shorten it below the time it takes to scrape your dashboard and assemble a war room.
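The capture-then-replay pattern described above can be sketched in a few lines. This is a toy in-memory model under stated assumptions, not the product's implementation: AuditedCluster, its methods, the audit-id format, and the "hpa/web" resource are all hypothetical.

```python
import copy


class AuditedCluster:
    """Toy in-memory stand-in for a cluster; not the product's real API."""

    def __init__(self, specs):
        self.specs = specs      # resource name -> spec dict
        self.audit = {}         # audit id -> (resource name, prior spec)
        self._next_id = 1

    def apply(self, name, new_spec):
        audit_id = f"audit-{self._next_id}"
        self._next_id += 1
        # Capture the prior state before touching anything.
        self.audit[audit_id] = (name, copy.deepcopy(self.specs.get(name)))
        self.specs[name] = copy.deepcopy(new_spec)
        return audit_id

    def undo(self, audit_id):
        # Restoring a stored snapshot is idempotent: running it twice
        # leaves the same state as running it once.
        name, prior = self.audit[audit_id]
        if prior is None:
            self.specs.pop(name, None)  # resource did not exist before
        else:
            self.specs[name] = copy.deepcopy(prior)
        return self.specs.get(name)


cluster = AuditedCluster({"hpa/web": {"maxReplicas": 20}})
audit_id = cluster.apply("hpa/web", {"maxReplicas": 5})
cluster.undo(audit_id)
cluster.undo(audit_id)  # the second call changes nothing
print(cluster.specs)
```

Storing a full copy of the prior spec, rather than a diff, is what makes the replay safe to repeat: undo never depends on the current state of the resource.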
Open at the seams that matter.
Apache 2.0 collector, CLI, proto, cost model. BUSL 1.1 on the control plane, operator, pricing engine, dashboard. The former is what you fork or replace if our roadmap diverges from your needs. The latter is what we sell. We do not blur the line.
Every CRD is a public schema. Every metric we emit is in Prometheus format. Every audit row is exportable to webhook, syslog, or S3. If we go away tomorrow, your data does not.
Enterprise at the seams that don't.
mTLS by default on cloud. RBAC enforced at the RPC layer, not bolted on. OIDC against your IdP, signature-verified via JWKS — not just iss/aud presence. Air-gap installable; container images mirror cleanly to private registries. The audit log is HMAC-signed so a downstream SIEM can detect tampering.
None of this is differentiation. It's the floor. We build it because the buyers we want refuse to discuss the ceiling without it.
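To illustrate how a downstream SIEM could detect tampering in an HMAC-signed audit log, here is a minimal sketch using Python's standard hmac module. The row fields, the canonical JSON serialisation, the SHA-256 digest, and the function names are assumptions made for the example; the product's actual signing format is not specified here.

```python
import hashlib
import hmac
import json


def sign_row(key: bytes, row: dict) -> str:
    # Assumption: rows are serialised as canonical JSON (sorted keys,
    # no extra whitespace) before signing with HMAC-SHA256.
    payload = json.dumps(row, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_row(key: bytes, row: dict, signature: str) -> bool:
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(sign_row(key, row), signature)


key = b"example-shared-secret"  # hypothetical key, for illustration only
row = {"audit_id": "audit-1", "action": "cordon", "target": "node/a"}
signature = sign_row(key, row)

tampered = dict(row, action="none")
print(verify_row(key, row, signature))
print(verify_row(key, tampered, signature))
```

Any change to a signed row, including a single field value, produces a different digest, so the verifier needs only the shared key and the stored signature to flag the row.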
Cost is the lens, not the product.
FinOps tools that only show dollars become wallpaper. Cost is most useful when it's joined to something operators already act on: a CVE that's expensive to patch, a workload that's also leaking memory, a node that's both underutilised and out of compliance. We treat dollars as one signal among several, ranked by impact, not by decimals.
Boring beats clever.
We pick boring tools that have been load-tested at scale for years. Postgres for the durable state. ClickHouse for the time-series. Connect-RPC over HTTP/2 for the wire. Helm for distribution. cert-manager for certificates. Cosign + Syft for supply chain.
None of these are surprising choices. They are what your platform team already runs, which means our value is what we add on top, not what we ask you to adopt.
The bar is shipping.
Pre-launch is for getting alignment with people who run production at five thousand nodes. Public-facing, that looks like quiet weeks. Internally, every quiet week is a week where we did not promise something we couldn't deliver. We show up with software that works, not slides.
Built by DevOps & FinOps engineers, for the teams who run production every day.