Etcd Health Checks for a Control Plane
Last updated: April 17, 2026
This guide walks through the etcdctl commands used to investigate a control plane whose etcd is logging apply request took too long warnings even though the underlying disk looks healthy.
Background: what the warning means
A log line like:
{"level":"warn","caller":"etcdserver/util.go:170","msg":"apply request took too long","took":"105.790347ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/apiextensions.k8s.io/customresourcedefinitions/<example-crd-name>\" ","response":"range_response_count:1 size:7107"}means the etcd server took longer than its internal 100ms threshold to service a single request. Disk is the most common cause, but the slowness can also come from:
Large values — a single CRD or object that is hundreds of KB.
Fragmentation — dbSize much larger than dbSizeInUse.
A large keyspace — many keys under a single prefix (e.g. thousands of MRs of one kind).
Slow compaction or a CPU-starved leader.
A hot watch — a client opening/closing watches repeatedly.
The commands below help distinguish between those causes.
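Before working through the steps, it helps to know how often the warning fires and how slow the slow requests actually are. A minimal sketch, run from your workstation once MXP_NS is set (Step 0 below); it assumes the JSON log format shown above:
# How many slow-apply warnings in the last 15 minutes?
kubectl -n "$MXP_NS" logs vcluster-etcd-0 --since=15m \
  | grep -c 'apply request took too long'
# What do the most recent durations look like? Note that requests
# slower than 1s print in seconds, not milliseconds.
kubectl -n "$MXP_NS" logs vcluster-etcd-0 --since=15m \
  | grep 'apply request took too long' \
  | grep -oE '"took":"[^"]+"' | tail -20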
Prerequisites
kubectl access to the host cluster, including exec into pods in the mxp-<uid>-system namespace.
Each control plane's etcd runs in its own host-cluster namespace named mxp-<uid>-system as a StatefulSet called vcluster-etcd. It runs as a single replica by default and as three replicas when the control plane has HA enabled. Client certs and the CA are stored in the vcluster-certs secret in the same namespace.
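If you don't know the control plane's <uid>, the naming convention is enough to list the candidate namespaces; a quick sketch:
# List host-cluster namespaces matching the mxp-<uid>-system pattern.
kubectl get namespaces -o name | grep -E '^namespace/mxp-.+-system$'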
Step 0 — Set the host namespace for the control plane
Set MXP_NS to the control plane's host-cluster namespace, of the form mxp-<uid>-system:
MXP_NS=mxp-<uid>-system
Step 1 — Get etcd metrics
In one terminal, port-forward to the etcd pod:
kubectl port-forward -n "$MXP_NS" <etcd-pod> 2381:2381
In another, output all metrics to a file to share with Upbound:
curl -s http://localhost:2381/metrics > /tmp/etcd-metrics-step-1.txt
Step 2 — Verify the etcd pods are running
kubectl -n "$MXP_NS" get pods -l app=vcluster-etcd -o wide
kubectl -n "$MXP_NS" get statefulset vcluster-etcdEvery replica should be Ready. On an HA control plane, if one of the three is CrashLoopBackOff or Pending, fix that first — an etcd quorum with a missing member slows every write. A non-HA control plane has only one replica.
Check recent restarts:
kubectl -n "$MXP_NS" describe pod vcluster-etcd-0 | \
  sed -n '/Events:/,$p'
If a pod has restarted recently, the best signal is usually in the previous container's logs — they contain the panic, OOM, or slow-apply storm that preceded the restart:
kubectl -n "$MXP_NS" logs vcluster-etcd-0 --previous --tail=500Step 3 — Open an etcdctl shell inside a running etcd pod
The etcd image has etcdctl on PATH, and the certs are already mounted at /run/config/pki/. Exec into any one member:
kubectl -n "$MXP_NS" exec -it vcluster-etcd-0 -- shInside the pod, set these once so you don't retype the flags:
export ETCDCTL_API=3
export ETCDCTL_CACERT=/run/config/pki/etcd-ca.crt
export ETCDCTL_CERT=/run/config/pki/etcd-server.crt
export ETCDCTL_KEY=/run/config/pki/etcd-server.key
export ETCDCTL_ENDPOINTS=https://vcluster-etcd-0.vcluster-etcd:2379
Quick sanity check:
etcdctl endpoint health
Expected output on a healthy cluster:
127.0.0.1:2379 is healthy: successfully committed proposal: took = 4.35ms
A healthy cluster typically reports well under 20ms here. Anything over 50ms for this one-shot check is worth investigating.
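A single sample can mislead, so take a few back-to-back readings (still inside the pod's shell) and look at the spread:
# Five quick samples; outliers matter more than the best case.
for i in 1 2 3 4 5; do etcdctl endpoint health; done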
Step 4 — Cluster-wide health and status
--cluster discovers all members from the one endpoint you're connected to. A non-HA control plane has only one member, so the output is a single row — that's expected.
etcdctl endpoint health --cluster --write-out=table
etcdctl endpoint status --cluster --write-out=table
etcdctl member list --write-out=table
etcdctl alarm list
What to look for:
endpoint health — every member must report true and low latency. A healthy cluster is usually well under 20ms here.
endpoint status — shows DB SIZE, IS LEADER, RAFT INDEX, RAFT APPLIED INDEX, ERRORS.
RAFT INDEX far ahead of RAFT APPLIED INDEX on one member → that member is behind and the leader is stalling for it.
ERRORS column non-empty → read the value, commonly NOSPACE or CORRUPT.
alarm list — must be empty (no output is good). A NOSPACE alarm makes the cluster read-only until you defrag and alarm disarm.
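For reference, if alarm list ever does report NOSPACE, the recovery sequence is roughly: compact away old revisions, defrag each member, then disarm the alarm. A sketch only, still inside the etcdctl shell from Step 3; on an HA control plane, run the defrag on every member:
# Find the current revision in the JSON output (read it by eye; the
# pod image is minimal, as Step 6 notes).
etcdctl endpoint status --write-out=json
# Then, substituting the revision you just read:
etcdctl compaction <revision>
etcdctl defrag
etcdctl alarm disarm
etcdctl alarm list   # no output means the alarm is cleared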
Step 5 — Measure request latency and throughput
etcdctl check perf runs a synthetic write load and reports whether the cluster hits the reference throughput and latency. This is the quickest way to confirm or rule out disk or CPU contention as the cause.
Run only when etcd is not already in distress. Before running perf, confirm in Step 4 that alarm list is empty and DB SIZE is nowhere near the quota. If either check fails, skip this step — perf writes a batch of keys against an already-loaded cluster and can push it further. Use the smallest load (--load="s" = small) and never run it on a cluster actively firing apply request took too long at 500ms+.
# Takes ~60 seconds. Writes a batch of keys into a dedicated prefix
# that it deletes afterwards. See caveat above before running.
etcdctl check perf --load="s"Three PASS lines followed by a final PASS means the cluster is hitting the reference profile for the chosen load. Any FAIL — and especially a FAIL on "Slowest request took …" — points at disk or CPU contention. If this fails while AWS EBS metrics look fine, contention is more likely CPU, network, or noisy-neighbor on the node than the volume itself.
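If perf passes but the warnings continue, rule out fragmentation (the dbSize vs dbSizeInUse cause from the background section). The JSON status output carries both fields; the field names below are as of etcd 3.4+. Run this from your workstation so the grep happens locally:
# A dbSize far above dbSizeInUse means a fragmented keyspace that a
# defrag would reclaim.
kubectl -n "$MXP_NS" exec vcluster-etcd-0 -- etcdctl \
  --cacert=/run/config/pki/etcd-ca.crt \
  --cert=/run/config/pki/etcd-server.crt \
  --key=/run/config/pki/etcd-server.key \
  endpoint status --write-out=json \
  | grep -oE '"dbSize(InUse)?":[0-9]+'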
Step 6 — Inspect the key mentioned in the warning
The warning points at a specific key. Before doing anything else, pick the right key to look at: the one with the largest size: in the slow-request warnings, not necessarily the one that appears first. A 7KB CRD showing up once isn't the problem; a 475KB resource showing up dozens of times per minute is.
Read the warning's two relevant fields:
request:"key:\"…\"" ← which key was read
response:"range_response_count:1 size:475433" ← how big the response was (bytes)Scan the last hour of logs and find the top offenders by size:. The key format is always /registry/<api-group-or-core>/<resource>/<name> (or /registry/<resource>/<namespace>/<name> for namespaced core resources).
Example from a real incident:
/registry/applications.argocd.crossplane.io/applications/<name> size:475433 (~475 KB)
Because the etcd pod doesn't have wc, run these from your workstation and let kubectl exec pipe the bytes back over stdout.
Sensitive data: commands below without --keys-only return the raw value bytes, which for Secret, ConfigMap, and token keys is the plaintext payload. Do not paste this output into tickets, chat, or shared documents. If you need only a size or count, use the --keys-only variants.
First, set KEY and PREFIX to match your warning. These are examples; replace them with the actual values from your logs.
# The exact key from the warning (replace the whole right-hand side).
KEY=/registry/applications.argocd.crossplane.io/applications/<name>
# Everything up to and including the trailing slash before the name.
PREFIX=/registry/applications.argocd.crossplane.io/applications/
Then:
# Size (bytes) of the exact key from the warning.
kubectl -n "$MXP_NS" exec vcluster-etcd-0 -- etcdctl \
--cacert=/run/config/pki/etcd-ca.crt \
--cert=/run/config/pki/etcd-server.crt \
--key=/run/config/pki/etcd-server.key \
get --print-value-only "$KEY" \
| wc -c
# Number of siblings under the same prefix.
kubectl -n "$MXP_NS" exec vcluster-etcd-0 -- etcdctl \
--cacert=/run/config/pki/etcd-ca.crt \
--cert=/run/config/pki/etcd-server.crt \
--key=/run/config/pki/etcd-server.key \
get --prefix --keys-only "$PREFIX" \
| grep -c .
# Approximate total bytes across the whole prefix. This counts the
# bytes of etcdctl's output (keys + values + formatting), not the
# exact on-disk size — it is useful as a relative signal, not a
# precise measurement.
kubectl -n "$MXP_NS" exec vcluster-etcd-0 -- etcdctl \
--cacert=/run/config/pki/etcd-ca.crt \
--cert=/run/config/pki/etcd-server.crt \
--key=/run/config/pki/etcd-server.key \
get --prefix "$PREFIX" \
| wc -c
# Dump the list of keys under the prefix to a file on your
# workstation. You can then grep/sort/wc against this file locally
# without hitting the cluster again.
kubectl -n "$MXP_NS" exec vcluster-etcd-0 -- etcdctl \
--cacert=/run/config/pki/etcd-ca.crt \
--cert=/run/config/pki/etcd-server.crt \
--key=/run/config/pki/etcd-server.key \
get --prefix --keys-only "$PREFIX" \
> /tmp/etcd-keys.txt
grep -c . /tmp/etcd-keys.txt # how many keys (wc -l would double-count the blank lines etcdctl prints)
less /tmp/etcd-keys.txt # scan them for obviously-large/duplicated names
Thresholds:
> ~100KB per object — worth investigating.
> ~1MB per object — real problem. Every LIST and every controller reconcile reads the whole thing.
Prefix total in tens of MB — same effect even when no single object is huge.
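To relate these thresholds to the numbers gathered above, divide the prefix total by the key count. The values below are hypothetical placeholders; substitute the outputs of the wc -c and grep -c pipes:
TOTAL_BYTES=52428800   # hypothetical: output of the "wc -c" pipe
KEY_COUNT=1800         # hypothetical: output of the "grep -c ." pipe
echo $(( TOTAL_BYTES / KEY_COUNT ))   # ~29127 bytes per object: no
# single giant, but a ~50MB prefix total is itself a problem per the
# last threshold above.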
If the warnings point at a small, hot key instead
Not every offender is a big object. In the same incident, these small keys also showed up in slow reads:
/registry/serviceaccounts/crossplane-system/crossplane (~1.1KB)
/registry/secrets/crossplane-system/repo-secret-for-image-pull (~3.7KB)
/registry/secrets/crossplane-system/upbound-system-pull-secret (~3KB)
Their size: is tiny but they appeared in dozens of concurrent 800ms–1s reads. That's a different failure mode: many clients hammering the same key while etcd is already busy serving a large one. Confirm by counting how often the same key shows up in the warn log over a short window:
kubectl -n "$MXP_NS" logs vcluster-etcd-0 --since=15m \
| grep 'apply request took too long' \
  | grep -oE 'key:\\"[^\\]+\\"' | sort | uniq -c | sort -rn | head -20
Step 7 — Get etcd metrics again
In one terminal, port-forward to the etcd pod:
kubectl port-forward -n "$MXP_NS" <etcd-pod> 2381:2381
In another, output all metrics to a file to share with Upbound:
curl -s http://localhost:2381/metrics > /tmp/etcd-metrics-step-7.txt
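Before attaching the dumps, you can compare the disk-latency histograms between the two snapshots yourself. The metric names below are real etcd metrics as of 3.4+, though the exact set varies by version:
# Large jumps in WAL fsync or backend commit latency between the
# step 1 and step 7 snapshots implicate the disk after all.
grep -E 'etcd_disk_(wal_fsync|backend_commit)_duration_seconds' \
  /tmp/etcd-metrics-step-1.txt
grep -E 'etcd_disk_(wal_fsync|backend_commit)_duration_seconds' \
  /tmp/etcd-metrics-step-7.txt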