Sanjai Murugan — AI Systems Engineer & Researcher

Engineering Projects

Systems I've Built

Each project reflects deep systems-level thinking — spanning agent orchestration, inference optimization, retrieval systems, and computer vision.

🔍 Research Project

Bike Rental RAG Assistant

"AI-powered retrieval assistant for intelligent bike rental support and real-time customer interaction."

A Retrieval-Augmented Generation (RAG) based AI assistant developed for a bike rental platform to handle real-time customer queries and provide accurate responses regarding pricing, rental policies, bike availability, booking procedures, and platform features. The system leverages semantic search and vector embeddings to retrieve relevant information from a knowledge base, enabling context-aware and reliable conversations.

Real-time query handling for bike rental platform users
Semantic document retrieval using vector databases and embeddings
Context-aware multi-turn conversational support
Efficient knowledge retrieval with optimized chunking techniques
Accurate response generation with retrieval-based context injection
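
The retrieve-then-inject flow listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: the bag-of-words `embed` function stands in for a real embedding model, and `chunk_text`, `retrieve`, `build_prompt`, and the default sizes are hypothetical names and values chosen for the example.

```python
import math
from collections import Counter

def chunk_text(text, size=12, overlap=4):
    """Split text into overlapping word windows (one simple chunking strategy)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text):
    """Toy embedding: lowercase word counts. A real system would call an embedding model."""
    return Counter(w.strip(".,?!").lower() for w in text.split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query, chunks):
    """Inject the retrieved chunks into the prompt as grounding context."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

In a production setup the chunks and their embeddings would live in a vector database rather than being re-embedded on every query; the sketch keeps everything in memory to show the control flow only.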

LangChain · ChromaDB · OpenAI Embeddings · FastAPI

🐙 GitHub 💼 LinkedIn

🧪 Experiment

TinyOllama LoRA Fine-Tuning Research

"Parameter-efficient specialization experiments on lightweight local LLMs."

Explored LoRA-based fine-tuning workflows using MLX and lightweight language models to understand efficient specialization techniques and local deployment optimization. Documented rank exploration, quantization strategies, and quality trade-offs.

Low-parameter adaptation with LoRA rank exploration (r=4 to r=32)
Inference efficiency benchmarking before and after fine-tuning
Specialization behavior analysis across model sizes
Local deployment feasibility studies on consumer hardware

~0.1% of total parameters trained

4× reduced compute vs full FT

Optimized experimentation pipeline
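
The small trainable fraction quoted above follows from how LoRA works: the update to a d × k weight matrix is replaced by two low-rank factors, which adds r·(d + k) trainable parameters per adapted matrix. The sketch below works through that arithmetic. The layer shapes, layer count, and total parameter count are illustrative assumptions, not the models used in these experiments.

```python
def lora_params(d, k, r):
    """Trainable parameters LoRA adds to one d x k matrix: A is r x k, B is d x r."""
    return r * (d + k)

def trainable_fraction(total_params, adapted_shapes, r):
    """Share of all parameters (base plus adapters) that is actually trained."""
    added = sum(lora_params(d, k, r) for d, k in adapted_shapes)
    return added / (total_params + added)

# Hypothetical model: 22 layers, LoRA on two 2048 x 2048 attention projections
# per layer, about 1.1 billion base parameters (assumed figures).
shapes = [(2048, 2048)] * (22 * 2)
for r in (4, 8, 16, 32):
    print(f"r={r:>2}: {trainable_fraction(1_100_000_000, shapes, r):.3%} trainable")
```

Because the adapter size grows linearly with r, doubling the rank doubles the trainable parameter count, which is why rank sweeps such as r=4 to r=32 are cheap to run.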

🐙 GitHub 💼 LinkedIn

Key Findings

LoRA rank r=8 achieved 85% of full fine-tuning quality with 97% fewer trainable parameters.
MLX on Apple Silicon proved competitive with CUDA for small-scale LoRA experiments.
QLoRA (LoRA adapters trained on top of a frozen 4-bit quantized base model) enabled effective training on 6GB VRAM hardware.

📷 Applied AI / Computer Vision

Low-Light Animal Detection Pipeline

"Computer vision system for wildlife detection in difficult lighting environments."

Developed a computer vision pipeline capable of detecting wildlife in low-visibility scenarios using image enhancement and object detection techniques. Explored preprocessing optimization for real-time performance under challenging environmental conditions.

Low-light image enhancement as preprocessing stage
Detection reliability scoring under varying conditions
Preprocessing optimization for real-time performance
Evaluation against real-world environmental conditions
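
As an illustration of what a low-light preprocessing stage can look like, the sketch below applies gamma correction to dark frames only. Gamma correction is one common enhancement technique and is used here as an assumption; the source does not say which method the pipeline uses. The function names and the brightness threshold are hypothetical.

```python
def gamma_correct(image, gamma=0.5):
    """Brighten a grayscale image (rows of 0-255 ints). gamma < 1 lifts dark regions."""
    # Precompute a lookup table so each pixel costs one index operation.
    lut = [round(255 * (v / 255) ** gamma) for v in range(256)]
    return [[lut[p] for p in row] for row in image]

def mean_brightness(image):
    """Average pixel intensity across the whole frame."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)

def enhance_if_dark(image, threshold=60, gamma=0.5):
    """Enhance only dark frames so well-lit frames skip the extra work."""
    return gamma_correct(image, gamma) if mean_brightness(image) < threshold else image
```

Gating the enhancement on a cheap brightness check is one way to keep preprocessing cost low enough for real-time use; the enhanced frame would then be passed to the object detector.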

🐙 GitHub 💼 LinkedIn

💡 Side Project

Semantic ATS Skill Matching Engine

"AI-powered skill extraction and semantic matching for hiring workflows."

A semantic Applicant Tracking System (ATS) built on agentic AI workflows orchestrated with n8n. The platform analyzes resumes, interprets candidate skills semantically, matches applicants to job requirements, and automates hiring workflows in real time. AI-driven decision making and contextual understanding streamline recruitment operations and improve candidate-job matching accuracy.

Semantic resume parsing and intelligent candidate-job matching
Agentic AI workflow orchestration using n8n automation
Automated recruitment pipeline and candidate screening
Context-aware skill evaluation using vector embeddings
AI-assisted hiring insights and workflow optimization

n8n · Agentic AI

🐙 GitHub

Research & Experiments

Pushing Systems Beyond the Tutorial

Exploring experimental AI system architectures, inference optimization techniques, and efficient model specialization beyond standard application development.

Neuron Specialization Analysis

Activation Patterns · Sparse Routing · Transformer Internals

Studying activation clustering and sparse routing ideas within transformer layers. Investigating specialization patterns that emerge naturally during training and how they can be exploited for more efficient computation.

Efficient Inference Exploration

KV Cache · Selective Attention · Compute Reduction

Exploring KV cache optimization strategies, selective computation methods, and various compute reduction techniques to enable faster, more memory-efficient inference on consumer hardware.
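
One of the simplest KV cache optimizations is storing cached keys and values as 8-bit integers instead of floats. The sketch below shows symmetric per-vector INT8 quantization. It is a generic illustration of the technique, not the scheme used in these experiments, and the function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric quantization: map floats to integers in [-127, 127] with one scale."""
    # Fall back to a scale of 1.0 when the vector is all zeros.
    scale = max(abs(v) for v in values) / 127 or 1.0
    return [round(v / scale) for v in values], scale

def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [x * scale for x in q]

def max_abs_error(values):
    """Worst-case reconstruction error after a quantize/dequantize round trip."""
    q, scale = quantize_int8(values)
    return max(abs(v - r) for v, r in zip(values, dequantize(q, scale)))
```

Each stored element shrinks from 16 or 32 bits to 8 bits plus one shared scale per vector, and the round-trip error per element is bounded by half the scale.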

Dynamic Expert Routing

MoE · Adaptive Systems · Hierarchical Execution

Designing modular activation systems with adaptive routing logic. Exploring hierarchical execution concepts where different subnetworks handle different complexity levels of tasks.
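
The core idea behind adaptive routing can be sketched as top-k gating: a gate scores every expert, only the k best are executed, and their outputs are mixed by renormalized gate weight. This is a minimal, generic mixture-of-experts sketch; the function names are hypothetical, the experts are plain callables, and the gate logits are assumed to come from a learned router that is not shown.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

def moe_forward(x, experts, gate_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    weights = route_top_k(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())
```

The compute saving comes from the experts that are never called: with k of n experts active per input, roughly k/n of the expert compute is spent.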

Agent Memory Architectures

Long-term Memory · RAG · Multi-Agent Sync

Investigating long-term memory mechanisms for autonomous agents. Researching retrieval-based context management and multi-agent synchronization protocols for persistent collaborative systems.

⬡ Recent Experiment Logs

2026-05-10: Sparse MoE routing on Llama-3-8B, 47% FLOPs reduction with <3% quality drop

2026-04-22: KV cache quantization experiments, INT8 maintains 99.1% retrieval accuracy

2026-03-18: LoRA rank ablation study, r=8 optimal for domain adaptation on limited data

Concept Research

AI-Powered Skill Intelligence Platform

📄 Resumes & Profiles → 🧠 Embedding Engine → 🔗 Vector Skill Graph → ⚖️ Somatic-Weight Scoring → 🎯 Intelligent Matching

The Problem

Traditional ATS systems rely on keyword matching — fundamentally broken for identifying real skill relationships. A backend engineer who knows "distributed systems" should surface for "microservices architecture" roles, but current systems miss these connections entirely.

The Approach

Building an AI-powered platform using vector database semantic matching, embedding-based resume analysis, and contextual skill relationship mapping inspired by somatic-weight principles. The system doesn't just match keywords — it understands the relationships between skills and maps candidate capabilities to role requirements at a semantic level.
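
The difference between keyword matching and relationship-aware matching can be shown with a toy example. Everything here is a hypothetical sketch: the `RELATED` edge table and its strengths are hand-written for illustration, whereas a real system would derive them from embeddings or a learned skill graph, and the scoring rule is an assumption rather than the platform's actual method.

```python
# Hypothetical relationship edges with strengths in [0, 1] (illustrative values).
RELATED = {
    ("distributed systems", "microservices architecture"): 0.8,
    ("python", "fastapi"): 0.7,
}

def relatedness(a, b):
    """1.0 for identical skills, the edge strength if related, otherwise 0.0."""
    if a == b:
        return 1.0
    return RELATED.get((a, b), RELATED.get((b, a), 0.0))

def match_score(candidate_skills, role_skills):
    """Average, over required skills, of the best-related skill the candidate has."""
    if not role_skills:
        return 0.0
    best = [max((relatedness(c, r) for c in candidate_skills), default=0.0)
            for r in role_skills]
    return sum(best) / len(best)

def keyword_score(candidate_skills, role_skills):
    """Baseline: exact keyword overlap only."""
    if not role_skills:
        return 0.0
    return len(set(candidate_skills) & set(role_skills)) / len(role_skills)
```

For a candidate listing "distributed systems" and a role requiring "microservices architecture", `keyword_score` gives no credit for that requirement, while `match_score` gives partial credit through the relationship edge.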

🔬 Vector Skill Graph: Multi-dimensional skill representation with relationship edges

🧬 Somatic-Weight Scoring: Contextual skill importance derived from relationship patterns

⚡ Real-time ATS Optimization: Intelligent candidate ranking with explainable scoring

🤖 Bidirectional Intelligence: Recruiter insights + candidate experience optimization

Vector Databases · Embedding Models · Graph Neural Networks · LangChain · FastAPI · PostgreSQL + pgvector

My Journey

From Curiosity to Systems Engineering

The evolution from exploring Python to engineering production-grade AI infrastructure — told as it unfolded.

2023

The Spark

Everything started with curiosity. I picked up Python not because I had a plan, but because everyone said it was the language to learn. I started with basics — variables, loops, functions — then quickly fell into the rabbit hole of backend systems and APIs.

The first "hello world" API I built with Flask felt like magic. I had no idea what I was doing, but it worked.

Early experimentation with AI tooling began here — simple scripts, basic ML models, and a growing obsession with understanding how things actually work under the hood.

Python · Flask · APIs · Basic ML

2024

Building Production Systems

This is when things shifted from hobby to serious engineering. I built my first production-ready RAG pipeline, deployed workflow automation with n8n, and integrated various AI APIs into real applications.

First experiments with LangChain, ChromaDB, and local LLM inference via Ollama. I remember the excitement of running a 7B model locally for the first time — it felt like unlocking a superpower.

Key realization: The gap between "AI demos" and "AI systems" is enormous. Most tutorials stop at the demo. I wanted to build what comes after.

LangChain · ChromaDB · Ollama · n8n · FastAPI

2025

Deep Dive into AI Systems

I went deep. LoRA fine-tuning — understanding rank selection, placement strategies, and when parameter-efficient methods break down. Multi-agent orchestration — building systems where multiple LLM agents coordinate on complex tasks.

Optimized local inference pipelines, experimented with MLX on Apple Silicon, and began researching efficient attention mechanisms. Started documenting everything — not just the wins, but the failures and dead ends that taught me the most.

Built a multi-agent coding system from scratch. It was buggy, slow, and frustrating. I learned more from that project than from any tutorial.

LoRA/QLoRA · MLX · Agent Systems · Inference Opt. · RAG

2024-25

The Pivot That Didn't Happen

I once tried to build a startup around an AI-powered hiring intelligence platform. The vision was compelling — semantic skill matching, contextual relationship scoring, and an ATS that actually understood candidates beyond keywords.

We prototyped the core engine: vector-based skill graphs, embedding-driven resume analysis, and early experiments with graph neural networks for skill relationship mapping. The tech worked. The go-to-market (GTM) didn't.

Sometimes the best engineering decisions are the ones you don't ship. This project taught me more about product-market fit, team dynamics, and when to walk away than any project that succeeded.

It's shelved for now, not abandoned. The research lives on in this portfolio — as a startup concept I revisit when the timing is right.

Vector DBs · Embeddings · Graph ML · ATS Systems · n8n

2026

Research & Infrastructure at the Edge

Current focus: pushing AI systems to their limits. Sparse expert routing for 47% FLOPs reduction. KV cache quantization maintaining 99.1% accuracy at INT8. MoE architectures designed for edge deployment.

I'm building infrastructure that bridges the gap between research papers and production systems — taking experimental architectures and making them actually work at scale.

The goal isn't just to use AI. It's to understand it deeply enough to build what doesn't exist yet.

MoE · Sparse Routing · KV Cache · Edge AI · Infra

Beyond

What's Next

I'm exploring the convergence of efficient inference, agent memory architectures, and AI-driven developer tools. The future isn't just bigger models — it's smarter systems that use intelligence efficiently.

Always open to research collaborations, ambitious projects, and conversations about where AI infrastructure is heading.

If you're working on something interesting in AI systems, I'd love to hear about it.

Open to Ideas