Kafka · Redis · Python · Postgres · GenAI · Distributed Systems · Observability
I design backend systems that stay reliable at scale, adapt fast to product needs, and fail predictably.
8+ years on infra-heavy teams, building telemetry pipelines, orchestrators, and LLM-backed systems under concurrency, latency, and audit constraints.
- Distributed Cloud Applications → Microservices with predictable scale & recoverability
- Stream Processing Pipelines → Kafka + Postgres + Redis handling 10M+ events/month
- Telemetry + Observability Systems → Tracing, metrics, SLA diagnostics (Prometheus, OTel)
- LLM Agent Infrastructure → Memory-backed, tool-using multi-agent execution engines
- Control Plane & Coordination → Consensus-safe orchestration, retries, failover resilience
- Distributed Systems: queues, state machines, eventual consistency
- Infra Design: ingestion, orchestration, API contracts, failure budgets
- Stream Processing: Kafka, Redis, Celery, Prefect
- Observability: OpenTelemetry, Prometheus, Grafana, Sentry
- GenAI Integration: agent memory, structured planning, tool use
- Cloud & Ops: Docker, Kubernetes, AWS (ECS, CloudWatch), Terraform (basic)
- Built streaming ingestion pipelines handling 10M+ events/month
- Cut P95 latency by 45% and ETL time by 30% in clinical telemetry
- Reduced cross-region failures by 35% through retry-safe orchestration
- Logged full agent memory + tool usage telemetry for enterprise GenAI workflows
- Built a Redis-based observability platform later acquired by Redis Inc. (folded into RedisInsight)
- 🧠 memoria: Long-term memory infra for agents — Redis + Neo4j + vector search with temporal + semantic context
- 🧪 infrasim: Chaos simulation + fault injection platform with trace replay, SLO dashboards, and distributed failure visualizer
- 🛰️ synapse: Modular agent framework with controller-worker pattern, task routing, tool policies, and memory-integrated planning
- 🧾 cognify: Rule+LLM hybrid engine with YAML DSL, audit trails, retry-safe pipelines, and deterministic + generative reasoning fusion
- 🧩 spectra: Observability-as-code for microservices and agents — OpenTelemetry auto-instrumentation with latency maps and SLA views
- 📦 Designing a Chrome DevTools-style UI for real-time Kafka + Redis pipeline debugging
- 🔁 Building a trace-aware feedback loop for agent retries and subgoal recovery
- 📊 Benchmarking multi-agent planning across QA, RAG, and vision-grounded reasoning
- 🧬 Early research on “Project Episteme”: decentralized agents discovering novel scientific hypotheses
- 🔍 Prototyping a GPT + tool + memory chain visualizer for auditing AI reasoning in real time
- 🔗 GitHub
- 💬 Twitter / X
- 🧠 Stack Overflow
Currently exploring Senior/Staff roles on distributed systems, observability, or cloud-native infra teams (e.g., telemetry, ingestion, real-time processing).
DMs open — let’s build resilient systems.