Computer Vision Engineer

Building
Intelligent
Vision Systems

Started in a difficult place with a laptop and one question — how do machines actually process images and video? That question changed everything.

Self-Taught · Jun 2023
Deep Learning Computer Vision
7+ Hackathons · 2025-26
10+ Projects Published
GitHub LinkedIn Devpost

The Real Story

Foundation First.
Production After.

I didn't rush to publish half-finished projects. 2023-2024 was pure learning — mastering fundamentals through practice. Then in 2025-2026, I built production-grade projects from scratch with proper architecture and deployment pipelines. This is the professional approach.

The Beginning · Jun 2023

A Hard Place. A Decision.

College wasn't financially possible. I was in a difficult situation after 12th — in a mess, figuring out what to do next. Self-study wasn't a statement. It was simply what I could do.

Then in the middle of all that, something completely hooked me — the idea that a computer could look at a video frame and actually understand what it's seeing.

"How does a machine actually process images and video? How does it see?"

That question wouldn't leave me. I opened YouTube, found courses, and started digging.

"I wasn't escaping anything. I was genuinely fascinated. The idea that pixels could become understanding — I needed to know how that worked."

"Before touching any model I wanted to understand what was actually happening underneath — so I started with math. Linear algebra, probability, calculus. I wanted to understand gradient descent, not just call a function that does it."

Foundations · 2023

Math First. Then ML.

Before any model: linear algebra, probability, calculus. Why does backprop work? What is gradient descent actually doing? Real understanding, not just working code.

Then classical ML: linear regression, logistic regression, SVM. Each concept from math to implementation. Built practice projects to solidify understanding — not for publishing, for learning.

Linear Algebra · Probability · Calculus · Linear Reg. · Logistic Reg. · SVM
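To make "what is gradient descent actually doing?" concrete, here is a minimal sketch in plain Python. It is illustrative only and not taken from the practice projects; the function name `fit_line`, the learning rate, the step count, and the toy data are all assumptions chosen for the example.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current line against the data.
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(err^2) w.r.t. w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errs, xs))
        grad_b = 2.0 / n * sum(errs)
        # Step against the gradient to reduce the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical, for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

Each step only needs the two partial derivatives, which is the same idea backprop scales up to many layers.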

Deep Learning Foundations · 2023-2024

Neural Networks & CNNs

Classical ML was great — but the original pull was always visual. I moved into deep learning: neural networks, backprop by hand, activation functions. Then CNNs. Then transfer learning.

Practiced with cats vs dogs classification (77.6% accuracy, and I was thrilled at the time). Experimented with industrial defects. But these were learning exercises, not portfolio pieces.

PyTorch · CNN · ResNet · Transfer Learning

"I could've rushed to publish those early experiments. But I knew the difference between 'it works on my laptop' and 'it's production-ready.' I chose to keep learning."

"Classification tells you what. Detection tells you where — in real-time. That shift felt completely different. Like I was building something that could actually watch."

Object Detection · 2024

YOLO & Real-Time Systems

Learned YOLO deeply: anchor boxes, non-maximum suppression (NMS), IoU, custom dataset training. Practiced with activity detection, PPE compliance, fall detection concepts. Understood real-time constraints and edge deployment challenges.

These concepts would later become the foundation for production projects.

YOLO Architecture · Custom Training · Real-Time Concepts
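Two of the ideas named above, IoU and NMS, fit in a few lines. The sketch below is a plain-Python illustration rather than code from any YOLO implementation; the names `iou` and `nms`, the `(x1, y1, x2, y2)` corner format, and the 0.5 threshold are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (hypothetical data).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box is dropped because its IoU with the top-scoring box is 81/119, roughly 0.68, which is above the threshold.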

Two-Stage Detection · 2024

Faster R-CNN & Architecture Evolution

After YOLO, I went backwards intentionally: R-CNN → Fast R-CNN → Faster R-CNN. Understanding the evolution (why each architecture exists, what it fixed) made everything click at a deeper level.

Region proposal networks, RoI pooling, two-stage vs one-stage tradeoffs. Practiced on mask detection concepts.

Faster R-CNN · ResNet-50-FPN · COCO Pretrain · Pascal VOC

"I didn't want to just use architectures. I wanted to understand why each one exists — what it fixed that the previous one couldn't. That thinking changed how I approach everything."

"MediaPipe gave me 33 3D points describing a person's whole body in real-time. I kept thinking: this is 99 numbers per frame. What can I build with 99 numbers?"

Pose & Face · 2024-2025

MediaPipe & Face Recognition

Discovered MediaPipe and face_recognition — understanding the body through 3D landmarks, tracking identity, real-time pose estimation. Learned how to work with sequential landmark data and spatial relationships.

Practiced yoga pose classification concepts and face filter logic — building understanding for future production systems.

MediaPipe · Face Recognition · 33 Landmarks · Spatial Logic
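As a rough illustration of working with landmark data, the sketch below flattens 33 3D points into 99 numbers and computes one spatial feature, a joint angle. It does not call MediaPipe; it assumes the landmarks have already been extracted into a list of `(x, y, z)` tuples, and the names `flatten_landmarks` and `joint_angle` are hypothetical.

```python
import math


def flatten_landmarks(landmarks):
    """Turn 33 (x, y, z) tuples into one flat list of 99 floats."""
    if len(landmarks) != 33:
        raise ValueError("expected 33 landmarks, got %d" % len(landmarks))
    return [coord for point in landmarks for coord in point]


def joint_angle(a, b, c):
    """Angle in degrees at point b, between segments b->a and b->c."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(p * q for p, q in zip(ba, bc))
    norm = math.sqrt(sum(p * p for p in ba)) * math.sqrt(sum(q * q for q in bc))
    if norm == 0:
        return 0.0  # Degenerate case: two of the points coincide.
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


# Placeholder frame: 33 identical points, just to show the shape.
frame = [(0.5, 0.5, 0.0)] * 33
features = flatten_landmarks(frame)
print(len(features))  # 99

# A right angle, e.g. shoulder-elbow-wrist in a bent arm (made-up points).
elbow_angle = joint_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(round(elbow_angle, 1))  # 90.0
```

Angles like this are one common way to turn raw coordinates into features that do not depend on where the person stands in the frame.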

Tracking & OCR · 2025

DeepSORT, CRNN, ONNX & Multi-Model Pipelines

Objects that disappear behind things and reappear — DeepSORT for maintaining unique IDs across frames and occlusions. Then OCR: LSTM, Transformers, CTC loss.

Learned to design and train a custom CRNN from scratch, then export models to ONNX for framework-agnostic inference and deployment-ready pipelines. Built an end-to-end system: detect → track → verify → read → collect evidence.

DeepSORT · CRNN Architecture · Bi-LSTM · CTC Loss · ONNX
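A CRNN trained with CTC loss emits one class distribution per time step, so the output has to be decoded into text. The sketch below shows greedy (best-path) CTC decoding in plain Python. It is an illustration under assumptions, not the decoder from the projects above: the blank is assumed to sit at class index 0, and the name `ctc_greedy_decode` and the toy alphabet are hypothetical.

```python
def ctc_greedy_decode(frame_probs, alphabet):
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_probs: list of per-frame probability lists; index 0 is the blank,
    index i (i >= 1) maps to alphabet[i - 1].
    """
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = None
    for idx in best_path:
        # Emit a character only when the class changes and is not blank.
        if idx != prev and idx != 0:
            chars.append(alphabet[idx - 1])
        prev = idx
    return "".join(chars)


# Hypothetical best path: A A blank A B B blank, written as one-hot frames.
path = [1, 1, 0, 1, 2, 2, 0]
frame_probs = [[1.0 if k == i else 0.0 for k in range(3)] for i in path]
decoded = ctc_greedy_decode(frame_probs, "AB")
print(decoded)  # AAB
```

The blank between the two runs of `A` is what lets the decoder output a genuine double letter instead of merging them.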

Understood how multiple models communicate in production: YOLO for detection, DeepSORT for identity consistency, CRNN for OCR, and ONNX Runtime to decouple training from inference.

"Hackathons forced me to learn deployment the hard way. No tutorials, no time — just 'the demo is in 6 hours and it needs to work.' That's where I learned Docker, GCP, and RAG."

Deployment Under Pressure · Sep 2025 – Feb 2026

7+ Hackathons. Real Deadlines.

From September 2025 to now: 7+ hackathons across healthcare, safety, conservation, education, social impact. No wins yet — but each one forced me to build and deploy in 24-48 hours.

Learned RAG, GCP, Docker, Datadog, ElevenLabs. Not from tutorials, but because the demo was due and I had no choice. This is where theory became deployment experience.

RAG · Gemini 2.5 Flash · GCP · Docker · ElevenLabs · Datadog
MedBridge — AI Emergency Healthcare Platform
GitHub Built Dec 2025

Production Portfolio · 2025-2026

Building for Real. Publishing with Standards.

Now I had the foundation (2023-2024) and deployment experience (hackathons). Time to build production-grade projects from scratch with proper architecture, documentation, and deployment pipelines.

Each project built end-to-end: problem → architecture → implementation → ONNX export → documentation → GitHub publication.

Production-Ready · ONNX Deployment · Proper Docs · GitHub Published
Industrial Defect Detection
GitHub Built Feb 2026
Safe Fall Detection — Elderly Care System
GitHub Built Oct 2025
Face Mask Detection — Faster R-CNN
GitHub Built Jun 2025
Helmet Detection + Number Plate OCR (ONNX)
GitHub Built Jan 2026
Custom Face Blur & Overlay Filter
GitHub Built Oct 2025
AI Yoga Pose Classifier — 95.4% Accuracy
GitHub Built May 2025

"The difference between my 2024 practice projects and my 2026 published work? Architecture. Documentation. ONNX deployment. Proper Git history. That's what makes it production-ready."

"Detection draws a box around something. Segmentation traces every single pixel that belongs to it. It's a different kind of precision — and in medical imaging, that precision is the difference between useful and useless."

Semantic Segmentation · Feb 2026

Pixel-Level Understanding — U-Net

After hackathons pushed me into deployment and production projects solidified my pipeline skills, the next frontier was clear: not just detecting objects — understanding them at the pixel level.

Studied encoder-decoder architectures, skip connections, and DiceBCE loss. The jump from bounding boxes to masks is a completely different class of problem — and I built a production-grade medical imaging system to prove it.

U-Net · ResNet34 Encoder · DiceBCE Loss · Albumentations · Medical Imaging
Lung Segmentation — U-Net ResNet34 · 96.34% Dice
GitHub Built Feb 2026
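For readers unfamiliar with DiceBCE, the sketch below shows one common formulation: binary cross-entropy plus (1 - Dice), computed over a flattened mask. It is a plain-Python illustration, not the loss code from the lung segmentation project; the equal weighting of the two terms, the `smooth` value of 1.0, and the name `dice_bce_loss` are assumptions.

```python
import math


def dice_bce_loss(probs, targets, smooth=1.0):
    """BCE + (1 - Dice) over flattened per-pixel probabilities and 0/1 targets."""
    eps = 1e-7  # Keeps log() away from zero.
    n = len(probs)
    bce = -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
        for p, t in zip(probs, targets)
    ) / n
    # Soft Dice: overlap between predicted mask and target mask.
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)
    return bce + (1.0 - dice)


# A 4-pixel toy mask (hypothetical values).
target = [1, 0, 1, 0]
good = dice_bce_loss([0.9, 0.1, 0.9, 0.1], target)
bad = dice_bce_loss([0.1, 0.9, 0.1, 0.9], target)
print(good < bad)  # True
```

The BCE term scores every pixel independently, while the Dice term rewards overlap with the target region, which helps when the foreground is a small part of the image.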

"The original question was 'how do machines see?' Three years in, I realise it doesn't have a final answer. Every architecture is just a better, deeper answer. That's what keeps me going."

Technical Arsenal

Skills Built

Deep Learning

  • PyTorch
  • YOLO
  • Faster R-CNN
  • ResNet / Transfer Learning
  • CRNN + CTC Loss
  • Siamese Networks
  • GANs / SGANs

Computer Vision

  • Object Detection
  • Multi-Object Tracking (DeepSORT)
  • Pose Estimation (MediaPipe)
  • Face Recognition
  • OCR & Number Plate
  • People Counting
  • Segmentation

Deployment & Production

  • ONNX Export & Optimization
  • Docker Containerization
  • Google Cloud Platform
  • Django REST APIs
  • Datadog LLM Observability
  • PostgreSQL / MySQL
  • Git & GitHub

AI & LLMs

  • Gemini 2.5 Flash (Vertex AI)
  • RAG Systems
  • ElevenLabs Voice AI
  • Blockchain Integration
  • Prompt Engineering

Competition Record

Pressure Is
Where I Build Best

7+

Hackathons participated in · Sep 2025 – Feb 2026

0

Wins yet — but learned deployment under pressure

24-48h

Build, deploy, demo — every single time

10+

Production CV projects published to GitHub

Full Credentials

Resume

Let's Connect

Open to
Opportunities

I build production systems — from training scripts to deployed products with ONNX optimization. If you're working on something where visual intelligence matters, I'd love to talk.