TL;DR
- Sidecars eat 20-30% of cluster resources. eBPF does the same job at under 1%.
- VictoriaMetrics uses 10x less RAM than InfluxDB and 7x less than Prometheus with Thanos.
- Foundation models like MOMENT and TimesFM can spot anomalies zero-shot, with no training on your own data.
- But honestly? Statistical baselines still beat ML most of the time. Layer them.
- Cardinality creeps up quietly. By the time you notice, your storage costs are already out of control.
The Hidden Cost of Sidecars
Istio and Linkerd changed how we build microservice architectures. No argument there. What nobody warned us about was the resource bill of running a proxy next to every single pod.
What Is a Sidecar, Anyway?
In Kubernetes, a sidecar is an extra container that runs alongside your main application inside the same pod. Every request your app sends or receives passes through it first. Service meshes like Istio use Envoy sidecars to handle:
- Traffic management: Routing requests between services
- Security: Encrypting communication, verifying identities
- Observability: Collecting metrics about requests, latency, errors
Great idea—until you pay the bill.
The Math Gets Ugly Fast
Each Envoy sidecar idles at 100–200 MB of RAM. Under load, it burns 0.1–0.5 CPU cores and adds 1–5 ms of latency per hop.
- 500 services × 3 replicas = 1,500 pods → 1,500 sidecars
- 150–300 GB RAM doing proxy work, not business logic
- 150–750 CPU cores spent on plumbing
- A 5-hop request can pay ~25 ms just in sidecar tax
Teams running meshes at scale routinely report 20–30% of total cluster resources going to sidecars.
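The back-of-envelope math is worth writing down. A minimal sketch in Python, using the per-sidecar ranges above and the hypothetical 500-service cluster from this section (not measurements from any specific environment):

```python
# Back-of-envelope sidecar overhead for the hypothetical cluster above.
pods = 500 * 3                            # 500 services x 3 replicas -> 1,500 sidecars

ram_gb = (pods * 0.1, pods * 0.2)         # 100-200 MB idle RAM per sidecar
cpu_cores = (pods * 0.1, pods * 0.5)      # 0.1-0.5 cores per sidecar under load
latency_ms_5_hops = 5 * 5                 # up to ~5 ms added per hop

print(f"Sidecars:        {pods}")
print(f"RAM on proxies:  {ram_gb[0]:.0f}-{ram_gb[1]:.0f} GB")
print(f"CPU on proxies:  {cpu_cores[0]:.0f}-{cpu_cores[1]:.0f} cores")
print(f"5-hop latency:   ~{latency_ms_5_hops} ms of sidecar tax")
```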
The Problems You Don’t See on the Dashboard
- Fate sharing: if the sidecar dies or hangs, your app is down even if your code is fine.
- Upgrades: Envoy bumps require restarting every pod—thousands of rolling restarts for a patch.
- Visibility blind spots: sidecars only see network traffic. System calls, file I/O, in-process behavior? Invisible.
- Cold starts: new pods wait for sidecars to initialize, fetch config, and connect—adding 5–15 seconds to scale-up.
eBPF: Moving Observability Into the Kernel
Sidecars bounce packets between kernel and user space. eBPF pushes tiny programs into the kernel so data is captured where events happen—no extra context switches.
What’s Actually Happening

- Kprobes: attach to kernel functions.
- Uprobes: attach to user-space functions.
- Tracepoints: predefined kernel event hooks.
- XDP/TC/socket filters: tap packets before or as the kernel processes them.
Programs write to eBPF maps (kernel key-value stores) and stream data up efficiently—no app changes.
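To make the kprobe-plus-map pattern concrete, here is a minimal sketch using the bcc Python bindings (assumptions: bcc is installed and the script runs as root on a recent kernel). It counts execve() calls per process entirely in the kernel and reads the map from user space, with no application changes:

```python
# Minimal bcc example: count execve() calls per PID via a kprobe and an eBPF map.
import time
from bcc import BPF

prog = r"""
BPF_HASH(counts, u32, u64);               // eBPF map: pid -> call count

int trace_execve(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    counts.increment(pid);                // updated in kernel space, no context switch
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="trace_execve")

print("Tracing execve() for 10s...")
time.sleep(10)

# Read the aggregated counts from user space.
for pid, count in b["counts"].items():
    print(f"pid {pid.value}: {count.value} execve calls")
```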
The Performance Gap Is Massive
| Metric | Sidecars (per pod) | eBPF (per node) |
|---|---|---|
| CPU overhead | 5–15% | <1% |
| Memory | 100–200 MB | 50–100 MB |
| Latency hit | 1–5 ms per hop | <0.1 ms |
| Deployment model | One per pod | One per node |
| Visibility | L7 only | L3/L4/L7 + syscalls + file I/O |
Seeing Things Sidecars Can’t
- System calls: read(), write(), connect(), accept().
- File I/O: which files are read/written and how long it takes.
- Network: XDP packets, TCP states, retransmits, drops, protocols.
- CPU scheduling: when processes run and get preempted.
- Memory patterns: allocations, leaks, pressure.
The Tradeoffs (Being Honest)
- Needs modern kernels (realistically 5.x).
- TLS payload visibility requires uprobes into crypto libs—finicky and version-sensitive.
- Linux-only; Windows nodes need another approach.
- Writing eBPF safely is hard—stick to battle-tested tools (Cilium, Pixie, Falco, Tetragon).
- Kernel-level bugs can crash nodes—test in non-prod first.
Where Do You Store All This Data?
Telemetry volume and cardinality explode fast. You need low-latency operational queries and affordable long-term analytics.
The Options That Actually Work at Scale
VictoriaMetrics: Efficiency That Keeps Surprising Me
- 10x less RAM than InfluxDB; 7x less than Prometheus+Thanos/Cortex.
- High-cardinality queries up to 20x faster; compresses to 0.4–0.8 bytes/point.
- Prometheus-compatible; simple single binary or split (vmselect/vminsert/vmstorage).
Use it when you want Prometheus compatibility without the overhead, or cardinality is already biting.
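Because it speaks the Prometheus querying API, existing tooling and scripts keep working unchanged. A minimal sketch against a single-node instance on the default port 8428 (the hostname and metric name are placeholders):

```python
# Query VictoriaMetrics through its Prometheus-compatible HTTP API.
import requests

VM_URL = "http://victoriametrics:8428"    # placeholder host; 8428 is the single-node default

resp = requests.get(
    f"{VM_URL}/api/v1/query",
    params={"query": "sum(rate(http_requests_total[5m])) by (service)"},
    timeout=5,
)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    _, value = series["value"]            # [timestamp, value]
    print(series["metric"].get("service", "unknown"), value)
```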
ClickHouse: When You Need Real Analytics Power
- OLAP-first: JOINs, subqueries, window functions, mixed data (metrics + events + logs).
- 10–20x compression is normal; full SQL interface your team already knows.
Use it when PromQL can’t express the questions you’re asking or you need to correlate metrics with business data.
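As a sketch of the kind of question PromQL can't express, here is plain SQL through the clickhouse-driver package. The otel_metrics table and its columns are assumptions for illustration, not a standard schema:

```python
# Correlate p99 latency with deployment events over 30 days (hypothetical schema).
from clickhouse_driver import Client

client = Client(host="clickhouse", database="observability")

rows = client.execute(
    """
    SELECT
        toStartOfHour(timestamp)          AS hour,
        service,
        quantile(0.99)(latency_ms)        AS p99_ms,
        countIf(event_type = 'deploy')    AS deploys
    FROM otel_metrics
    WHERE timestamp >= now() - INTERVAL 30 DAY
    GROUP BY hour, service
    ORDER BY hour
    """
)

for hour, service, p99_ms, deploys in rows:
    print(hour, service, round(p99_ms, 1), deploys)
```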
Using Both: The Tiered Approach

- Hot/warm: VictoriaMetrics for dashboards, alerts, recent debugging.
- Cold/analytic: ClickHouse for deep history and complex correlations.
Foundation Models for Time Series: Separating Hype from Reality
Large pre-trained time-series models promise zero-shot anomaly detection/forecasting on your metrics.
MOMENT (CMU, ICML 2024)
- Transformer (40M–385M params) trained on the “Time-Series Pile.”
- Zero-shot anomaly detection via masked reconstruction.
- Reality: KNN baselines often beat it on common infrastructure metrics.
TimesFM (Google, ICML 2024)
- Decoder-only transformer (200M params) trained on 100B+ time points.
- Zero-shot forecasting; open on HuggingFace (google/timesfm-2.5-200m-pytorch).
- Needs GPU for speed; semantics of metrics still matter; must guard against nonsense outputs.
What I’ve Actually Learned Using These
- Great for complex seasonality, brand-new metrics, heterogeneous data, and when labeling is impossible.
- Old-school stats still win for stable metrics, low latency, and resource-constrained environments.
- Use foundation models as an enhancement layer—keep statistical baselines as the workhorse.
A Three-Layer Detection Architecture That Actually Works
Layer 1: Static Thresholds (The Safety Net)
```yaml
rules:
  - metric: container_cpu_usage_percent
    operator: gt
    threshold: 95
    duration: 5m
    severity: critical
```
Fast, reliable, and noisy by design—your last line of defense.
Layer 2: Statistical Baselines (The Workhorse)
```python
# Z-score of the latest point against a 7-day rolling baseline.
# `metrics` is a timestamp-indexed pandas Series; alert() is your alerting hook.
baseline_mean = metrics.rolling(window='7d').mean()
baseline_stddev = metrics.rolling(window='7d').std()
current_value = metrics.iloc[-1]
z_score = (current_value - baseline_mean.iloc[-1]) / baseline_stddev.iloc[-1]

if abs(z_score) > 3:
    alert("Anomaly detected", confidence=0.997)
```
Adapts per service, handles weekday/weekend cycles, explains alerts in one sentence.
Layer 3: ML/Foundation Models (The Enhancement)
```python
# Escalate to ML only where the statistical baseline is uncertain.
# timesfm / moment here are thin wrappers around the models, not their literal APIs.
if statistical_baseline.uncertain:
    tsfm_prediction = timesfm.predict(historical_window)
    reconstruction_error = moment.detect(current_window)

    if reconstruction_error > threshold:
        alert("Complex anomaly detected",
              confidence=reconstruction_error,
              explanation=tsfm_prediction)
```
ML augments but never gates. If ML is slow or down, layers 1 and 2 keep firing.
Cardinality: The Problem That Bankrupts You Quietly
Every unique combination of metric name and label values is its own time series. One innocent label can explode into millions of series, wrecking performance and cost.
How One Innocent Metric Becomes Millions
```
# Looks harmless
http_requests_total{
  method="GET",            # 5 values
  path="/api/v1/users",    # 1,000 if IDs are embedded
  status="200",            # 50 values
  pod="web-abc123",        # 500 pods
  customer_id="12345"      # 100,000 customers — the killer
}
```
Potential: 12.5 trillion series. You won’t hit that, but millions are enough to melt dashboards and storage.
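The multiplication is worth making explicit; a tiny sketch of the worst case for the label set above:

```python
# Worst-case series count = product of per-label cardinalities.
label_cardinalities = {
    "method": 5,
    "path": 1_000,
    "status": 50,
    "pod": 500,
    "customer_id": 100_000,
}

worst_case = 1
for n in label_cardinalities.values():
    worst_case *= n

print(f"{worst_case:,} potential series")   # 12,500,000,000,000
```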
A Governance Framework That Actually Works
- Set limits by criticality (e.g., Critical: 100k; Standard: 25k; Batch: 10k; Dev: 5k).
- Enforce at multiple layers:
  - Agent/collector: strip high-cardinality labels, aggregate/sample (a sketch follows this list).
  - Storage: hard per-tenant limits, expire inactive series, rate-limit ingestion.
  - Policy (Kubernetes): CRD for allowed labels; admission webhook blocks bad configs.
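One way to apply the collector-level rule is to drop known high-cardinality labels before anything is exported. A minimal sketch of the idea; the denylist and helper are illustrative, not part of any particular agent:

```python
# Drop denylisted high-cardinality labels before metrics leave the collector.
HIGH_CARDINALITY_LABELS = {"customer_id", "request_id", "session_id"}   # example denylist

def strip_high_cardinality(labels: dict[str, str]) -> dict[str, str]:
    """Return a copy of the label set without denylisted labels."""
    return {k: v for k, v in labels.items() if k not in HIGH_CARDINALITY_LABELS}

labels = {"method": "GET", "status": "200", "pod": "web-abc123", "customer_id": "12345"}
print(strip_high_cardinality(labels))
# {'method': 'GET', 'status': '200', 'pod': 'web-abc123'}
```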
Finding Problems Before Your Bill Does
Prometheus/VictoriaMetrics:
```promql
topk(10, count by (__name__)({__name__=~".+"}))
```
Alert early:
```yaml
- alert: HighCardinalityMetric
  expr: count by (__name__)({__name__=~".+"}) > 10000
  for: 5m
  annotations:
    summary: "Metric {{ $labels.__name__ }} has over 10k series"
    description: "Investigate immediately. Check for high-cardinality labels."
```
OpenTelemetry: The Standard That Actually Won
Why OpenTelemetry Matters
- Vendor-neutral; collect once, send anywhere.
- Complete: metrics, logs, traces with shared concepts.
- Massive momentum and universal vendor support.
The Collector: Your Telemetry Hub
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
  prometheus:
    config:
      scrape_configs:
        - job_name: "kubernetes-pods"

processors:
  batch:
    timeout: 5s
    send_batch_size: 1000
  servicedna/enrich:    # custom enrichment processor, not a stock Collector component
    profile_endpoint: "http://dna-brain:8080"
    enrich_attributes:
      - servicedna.risk_score
      - servicedna.anomaly.detected
      - servicedna.baseline.deviation

exporters:
  prometheusremotewrite:
    endpoint: "http://victoriametrics:8428/api/v1/write"
  clickhouse:
    endpoint: "tcp://clickhouse:9000"
    database: observability

service:
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      processors: [batch, servicedna/enrich]
      exporters: [prometheusremotewrite, clickhouse]
```
OTel Collector becomes the hub: receive, enrich, and route to hot and cold storage without duplicating instrumentation.
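On the application side, the OTLP receiver configured above is what the standard SDKs talk to. A minimal sketch with the OpenTelemetry Python SDK (packages opentelemetry-sdk and opentelemetry-exporter-otlp; the service and counter names are placeholders):

```python
# Send a counter to the Collector's OTLP gRPC receiver on port 4317.
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

exporter = OTLPMetricExporter(endpoint="otel-collector:4317", insecure=True)
reader = PeriodicExportingMetricReader(exporter, export_interval_millis=10_000)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("checkout-service")                 # placeholder service name
request_counter = meter.create_counter(
    "http_requests_total", description="Total HTTP requests handled"
)

# Attributes become labels downstream; keep them low-cardinality.
request_counter.add(1, {"method": "GET", "status": "200"})
```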
What To Take Away From This
- eBPF slashes overhead and sees what sidecars can’t.
- Storage choice compounds—VictoriaMetrics for efficiency, ClickHouse for deep analytics, both together for the win.
- Foundation models are useful, not magical—layer them on top of statistical baselines.
- Cardinality governance is non-negotiable—set limits and enforce them before the bill bites.
- OpenTelemetry has won—build on it and stay portable.