I'm Sanjai, an AI systems engineer and researcher building local-first AI architectures, multi-agent orchestration, RAG pipelines, AI automation, and efficient inference systems.
I'm an AI systems engineer with a passion for building efficient, scalable AI systems that work locally and privately. My expertise spans from transformer internals and attention mechanisms to deployment pipelines and inference optimization.
I specialize in local-first AI architectures, multi-agent orchestration, agentic AI, and RAG systems. I don't just use AI tools; I build the infrastructure that makes them work efficiently on consumer hardware.
Currently exploring sparse expert routing, KV cache optimization, and edge deployment strategies for large language models.
Each project reflects deep systems-level thinking, spanning agent orchestration, inference optimization, retrieval systems, and computer vision.
"AI-powered retrieval assistant for intelligent bike rental support and real-time customer interaction."
A Retrieval-Augmented Generation (RAG) based AI assistant developed for a bike rental platform to handle real-time customer queries and provide accurate responses regarding pricing, rental policies, bike availability, booking procedures, and platform features. The system leverages semantic search and vector embeddings to retrieve relevant information from a knowledge base, enabling context-aware and reliable conversations.
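The retrieval step described above can be sketched roughly as follows. This is a minimal illustration, not the deployed assistant: word counts stand in for learned vector embeddings, and `KNOWLEDGE_BASE`, `embed`, `retrieve`, and `build_prompt` are hypothetical names and sample entries invented for the example.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge-base entries, invented for illustration only.
KNOWLEDGE_BASE = [
    "Rental pricing: city bikes cost a flat hourly rate.",
    "Booking: reserve a bike through the app and confirm payment.",
    "Policy: helmets are required and late returns incur a fee.",
]


def embed(text: str) -> Counter:
    """Toy stand-in for a learned embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Assemble the retrieved context and the question into one prompt."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

With these sample entries, `retrieve("How do I book a bike?", KNOWLEDGE_BASE, k=1)` returns the booking entry. In a real pipeline the `embed` function would call an embedding model and the ranking would be served by a vector store.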
"Parameter-efficient specialization experiments on lightweight local LLMs."
Explored LoRA-based fine-tuning workflows using MLX and lightweight language models to understand efficient specialization techniques and local deployment optimization. Documented rank exploration, quantization strategies, and quality trade-offs.
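To make the rank trade-off concrete, here is a small pure-Python sketch of the standard LoRA update, where a frozen weight `W` is adjusted by a low-rank product scaled by `alpha / rank`. It does not use MLX and is not the code from these experiments; the function names and the tiny matrices are assumptions chosen for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in the adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def lora_effective_weight(W, A, B, alpha: float):
    """Return W + (alpha / rank) * (B @ A), leaving W itself untouched."""
    rank = len(A)
    scale = alpha / rank
    delta = matmul(B, A)  # shape (d_out x d_in), same as W
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]
```

For a 4096 x 4096 layer, a rank-8 adapter trains `lora_param_count(4096, 4096, 8)` = 65,536 parameters, against 16,777,216 in the full matrix. That ratio is why rank selection dominates the quality versus cost trade-off.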
"Computer vision system for wildlife detection in difficult lighting environments."
Developed a computer vision pipeline capable of detecting wildlife in low-visibility scenarios using image enhancement and object detection techniques. Explored preprocessing optimization for real-time performance under challenging environmental conditions.
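One common enhancement step for dark images is gamma correction, sketched below as an example of the kind of preprocessing involved. The project text does not say which enhancement was used, so treat this as an assumed technique; `gamma_correct` is a hypothetical helper operating on 8-bit grayscale pixels stored as nested lists.

```python
def gamma_correct(pixels, gamma: float = 0.5):
    """Apply gamma correction to 8-bit grayscale pixels.

    A gamma below 1 lifts dark regions; a lookup table keeps the
    per-pixel cost to a single index operation.
    """
    lut = [round(255 * (v / 255) ** gamma) for v in range(256)]
    return [[lut[p] for p in row] for row in pixels]
```

At `gamma=0.5`, a dark pixel value of 64 is lifted to 128 while 0 and 255 stay fixed. Precomputing the lookup table is one way to keep such a step cheap enough for real-time use.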
"AI-powered skill extraction and semantic matching for hiring workflows."
A semantic Applicant Tracking System (ATS) built on agentic AI workflows automated with n8n. The platform analyzes resumes, interprets candidate skills semantically, matches applicants to job requirements, and automates hiring workflows in real time. By combining AI-driven decision making with contextual understanding, the system streamlines recruitment operations and improves candidate-job matching accuracy.
Exploring experimental AI system architectures, inference optimization techniques, and efficient model specialization beyond standard application development.
Studying activation clustering and sparse routing ideas within transformer layers. Investigating specialization patterns that emerge naturally during training and how they can be exploited for more efficient computation.
Exploring KV cache optimization strategies, selective computation methods, and various compute reduction techniques to enable faster, more memory-efficient inference on consumer hardware.
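As one example of a memory-reduction technique for cached keys and values, the sketch below shows symmetric INT8 quantization of a list of floats. It is a generic illustration under simple assumptions (one scale per tensor, values held in plain Python lists), not the method used in this research; `quantize_int8` and `dequantize_int8` are hypothetical names.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization.

    Returns (codes, scale) where each code is an integer in [-127, 127]
    and code * scale approximates the original value.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from INT8 codes."""
    return [c * scale for c in codes]
```

Each stored value takes one byte instead of two for FP16, and the reconstruction error per element is bounded by half the scale. Finer-grained scales (per head or per channel) typically reduce that error at the cost of extra bookkeeping.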
Designing modular activation systems with adaptive routing logic. Exploring hierarchical execution concepts where different subnetworks handle different complexity levels of tasks.
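A simple form of routing logic is top-k gating, as used in mixture-of-experts layers: score every subnetwork, run only the k best, and blend their outputs. The sketch below is an assumed baseline for illustration, not the adaptive or hierarchical design under study; `route_top_k` and `sparse_forward` are hypothetical names, and the experts are plain Python callables.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k: int):
    """Pick the k highest-scoring experts.

    Returns [(expert_index, weight)] with weights renormalized to sum to 1.
    """
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def sparse_forward(x, experts, gate_scores, k: int = 2):
    """Evaluate only the selected experts and blend their outputs."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_scores, k))
```

Only k experts execute per input, so compute scales with k rather than with the total number of experts. A hierarchical variant could use the gate scores to choose between a small and a large subnetwork depending on input difficulty.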
Investigating long-term memory mechanisms for autonomous agents. Researching retrieval-based context management and multi-agent synchronization protocols for persistent collaborative systems.
Traditional ATS platforms rely on keyword matching, an approach that is fundamentally broken for identifying real skill relationships. A backend engineer who knows "distributed systems" should surface for "microservices architecture" roles, but current systems miss these connections entirely.
Building an AI-powered platform using vector database semantic matching, embedding-based resume analysis, and contextual skill relationship mapping inspired by somatic-weight principles. The system doesn't just match keywords; it understands the relationships between skills and maps candidate capabilities to role requirements at a semantic level.
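The difference between keyword matching and semantic matching can be shown with a toy example. The three-dimensional vectors below are hand-made placeholders, not real embeddings, and `SKILL_VECTORS`, `keyword_match`, `semantic_match`, and the 0.8 threshold are all assumptions made for illustration.

```python
import math

# Hand-made placeholder vectors; a real system would use a learned embedding model.
SKILL_VECTORS = {
    "distributed systems": [0.9, 0.8, 0.1],
    "microservices architecture": [0.8, 0.9, 0.2],
    "graphic design": [0.1, 0.0, 0.9],
}


def cosine(u, v) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def keyword_match(candidate_skills, required_skill) -> bool:
    """Exact string match, as in a keyword-based ATS."""
    return required_skill in candidate_skills


def semantic_match(candidate_skills, required_skill, threshold: float = 0.8) -> bool:
    """Match if any candidate skill is close to the required skill in vector space."""
    target = SKILL_VECTORS[required_skill]
    best = max(
        (cosine(SKILL_VECTORS[s], target) for s in candidate_skills),
        default=0.0,
    )
    return best >= threshold
```

For a candidate listing only "distributed systems" and a role requiring "microservices architecture", `keyword_match` returns `False` while `semantic_match` returns `True`, which is the connection keyword systems miss.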
The evolution from exploring Python to engineering production-grade AI infrastructure, told as it unfolded.
Everything started with curiosity. I picked up Python not because I had a plan, but because everyone said it was the language to learn. I started with the basics (variables, loops, functions), then quickly fell into the rabbit hole of backend systems and APIs.
The first "hello world" API I built with Flask felt like magic. I had no idea what I was doing, but it worked.
Early experimentation with AI tooling began here: simple scripts, basic ML models, and a growing obsession with understanding how things actually work under the hood.
This is when things shifted from hobby to serious engineering. I built my first production-ready RAG pipeline, deployed workflow automation with n8n, and integrated various AI APIs into real applications.
First experiments with LangChain, ChromaDB, and local LLM inference via Ollama. I remember the excitement of running a 7B model locally for the first time; it felt like unlocking a superpower.
Key realization: The gap between "AI demos" and "AI systems" is enormous. Most tutorials stop at the demo. I wanted to build what comes after.
I went deep. LoRA fine-tuning: understanding rank selection, placement strategies, and when parameter-efficient methods break down. Multi-agent orchestration: building systems where multiple LLM agents coordinate on complex tasks.
Optimized local inference pipelines, experimented with MLX on Apple Silicon, and began researching efficient attention mechanisms. Started documenting everything: not just the wins, but the failures and dead ends that taught me the most.
Built a multi-agent coding system from scratch. It was buggy, slow, and frustrating. I learned more from that project than from any tutorial.
I once tried to build a startup around an AI-powered hiring intelligence platform. The vision was compelling: semantic skill matching, contextual relationship scoring, and an ATS that actually understood candidates beyond keywords.
We prototyped the core engine: vector-based skill graphs, embedding-driven resume analysis, and early experiments with graph neural networks for skill relationship mapping. The tech worked. The GTM didn't.
Sometimes the best engineering decisions are the ones you don't ship. From this project I learned more about product-market fit, team dynamics, and when to walk away than from any that succeeded.
It's shelved for now, not abandoned. The research lives on in this portfolio, as a startup concept I revisit when the timing is right.
Current focus: pushing AI systems to their limits. Sparse expert routing for 47% FLOPs reduction. KV cache quantization maintaining 99.1% accuracy at INT8. MoE architectures designed for edge deployment.
I'm building infrastructure that bridges the gap between research papers and production systems, taking experimental architectures and making them actually work at scale.
The goal isn't just to use AI. It's to understand it deeply enough to build what doesn't exist yet.
I'm exploring the convergence of efficient inference, agent memory architectures, and AI-driven developer tools. The future isn't just bigger models; it's smarter systems that use intelligence efficiently.
Always open to research collaborations, ambitious projects, and conversations about where AI infrastructure is heading.
If you're working on something interesting in AI systems, I'd love to hear about it.
A curated stack focused on building, optimizing, and deploying AI systems from the ground up.
Verified certifications across AI, ML, and cloud platforms.
I'm always interested in connecting with people who share my passion for building intelligent systems, whether it's AI engineering internships, ambitious AI projects, research collaborations, or experimental infrastructure work.