
Driver Drowsiness Detection
YOLOv5-based detection of yawning and closed eyes with a deployed video-processing web demo.
Quick facts
• Role: ML Engineer (Computer Vision) + Backend/Deployment (demo web app)
• Timeframe: Not specified
• Platform: Web demo (video upload → processed output) + CV model inference
• Status: Completed (academic project + demo)
• Team: Team project
Summary
Built a driver drowsiness detection demo that localizes eye and mouth cues in video, then triggers a warning on yawning and an alert after ~1 second of sustained eye closure. We iterated on dataset quality (re-annotation + augmentation), compared Faster R-CNN (Detectron2) vs YOLOv5, and shipped the best-performing pipeline behind a simple web UI.
Key highlights:
• Re-annotated face images to make eye detection work in full-face frames
• Chose YOLOv5 after weighing recall/precision tradeoffs against Faster R-CNN
• Deployed a Dockerized Flask app behind NGINX on GCP
Problem
• The original Kaggle data was not annotated for object detection, and models trained on its “eyes-only” crops failed to transfer to full-face frames.
• The system needed high recall to avoid missing drowsiness events, while keeping false alerts manageable.
• The demo had to process full videos, overlay detections, and return a clear output artifact.
Solution
We rebuilt the dataset around full-face frames, annotated eye and yawning states, and used augmentation to simulate real driving conditions (brightness, blur, noise). After benchmarking Detectron2 Faster R-CNN and tuning thresholds, we moved to YOLOv5 for better practical accuracy. The final demo is a Flask web app that accepts a video upload, runs PyTorch inference frame by frame, overlays detections, and returns a processed video. On top of the detections, event logic raises a “yawn warning” immediately and an “eyes closed > 1s” alert; a minimal sketch of that logic follows.
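A minimal sketch of that event logic, in plain Python. The class names (yawn, eyes_closed) and the DrowsinessEvents helper are assumptions for illustration, not the project's actual code:

    import time

    YAWN = "yawn"                  # assumed class name
    EYES_CLOSED = "eyes_closed"    # assumed class name
    ALERT_AFTER_SEC = 1.0          # alert once eyes stay closed for ~1 second

    class DrowsinessEvents:
        def __init__(self):
            self.closed_since = None   # timestamp when eyes were first seen closed

        def update(self, labels, now=None):
            """labels: class names detected in one frame -> None, 'warning', or 'alert'."""
            now = time.monotonic() if now is None else now
            event = "warning" if YAWN in labels else None       # yawn -> immediate warning
            if EYES_CLOSED in labels:
                if self.closed_since is None:
                    self.closed_since = now                     # start the closed-eyes timer
                elif now - self.closed_since >= ALERT_AFTER_SEC:
                    event = "alert"                             # sustained closure -> alert
            else:
                self.closed_since = None                        # open eyes reset the timer
            return event

For uploaded videos, now is best passed as frame_index / fps rather than wall-clock time, so the one-second rule holds regardless of processing speed.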
Architecture
• Data: Kaggle base dataset → CVAT re-annotation (full-face) → Roboflow split + augmentation
• Model training: transfer learning with Detectron2 (Faster R-CNN) and YOLOv5
• Inference: PyTorch YOLOv5 on video frames → bounding boxes + class labels + confidence (see the sketch after this list)
• Event engine: timers/thresholds to trigger yawning warnings and closed-eye alerts
• Demo app: Flask (templates + Jinja2) for upload/result flow
• Deployment: Docker Compose on GCP VM + NGINX reverse proxy
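The inference step maps onto the standard YOLOv5 hub API. A minimal sketch, assuming custom weights at a placeholder path best.pt and an illustrative (not the tuned) confidence threshold:

    import cv2
    import torch

    # Load custom-trained weights through the official YOLOv5 hub entry point.
    model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")
    model.conf = 0.4   # placeholder confidence threshold

    # One video frame as an RGB ndarray (YOLOv5's AutoShape accepts numpy images).
    frame = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)

    results = model(frame)
    detections = results.pandas().xyxy[0]   # xmin, ymin, xmax, ymax, confidence, class, name
    labels = set(detections["name"])        # class names feed the event engine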
Hard problems solved
• Fixed a core dataset mismatch: “eyes-only” training data failed on full-face inference, so we re-annotated for the real input distribution
• Designed augmentations that reflect driving conditions (lighting shifts, motion blur, image noise) without breaking bounding-box labels (a sketch follows this list)
• Tuned for safety-oriented recall: explored confidence thresholds and accepted precision tradeoffs to reduce missed detections
• Switched model families when Faster R-CNN continued to miss classes even at higher confidence cutoffs
• Implemented temporal logic (closed-eyes duration) instead of relying on single-frame predictions
• Built a robust video pipeline (decode → infer → annotate → re-encode) that reliably returns an output artifact in a web workflow; sketched after this list
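The augmentation itself was done in Roboflow, but the label-preserving idea is easy to show. A sketch assuming the albumentations library and placeholder data:

    import numpy as np
    import albumentations as A

    # Photometric and blur transforms approximate lighting shifts, motion blur, and
    # sensor noise; bbox_params rewrites YOLO-format boxes through every transform
    # so the labels stay valid.
    transform = A.Compose(
        [
            A.RandomBrightnessContrast(p=0.5),
            A.MotionBlur(blur_limit=5, p=0.3),
            A.GaussNoise(p=0.3),
        ],
        bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
    )

    image = np.zeros((480, 640, 3), dtype=np.uint8)   # placeholder frame
    bboxes = [(0.5, 0.5, 0.2, 0.1)]                   # one YOLO-format box (cx, cy, w, h)
    out = transform(image=image, bboxes=bboxes, class_labels=["eyes_closed"])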
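The video pipeline is a loop over decoded frames. An OpenCV sketch that reuses model from the inference sketch and DrowsinessEvents from the event-logic sketch; paths and codec are placeholders:

    import cv2

    cap = cv2.VideoCapture("upload.mp4")                                   # decode
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    out = cv2.VideoWriter("processed.mp4",
                          cv2.VideoWriter_fourcc(*"mp4v"), fps, size)

    events, frame_idx = DrowsinessEvents(), 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))            # infer
        det = results.pandas().xyxy[0]
        for _, d in det.iterrows():                                        # annotate
            cv2.rectangle(frame, (int(d.xmin), int(d.ymin)),
                          (int(d.xmax), int(d.ymax)), (0, 0, 255), 2)
        event = events.update(set(det["name"]), now=frame_idx / fps)
        if event:
            cv2.putText(frame, event.upper(), (20, 40),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)
        out.write(frame)                                                   # re-encode
        frame_idx += 1

    cap.release()
    out.release()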
Impact / Results
• Delivered a working demo that flags yawns and sustained eye-closure events on uploaded driving videos
• Produced a repeatable dataset + training pipeline with clear learnings on model selection and threshold tradeoffs
• Deployed an end-to-end system (model + web app + infra) suitable for showcasing to non-technical stakeholders
Tech stack
• Architecture: Object detection + temporal event rules (yawn / eyes-closed)
• Backend/Infra: Flask, Docker Compose, NGINX, Google Cloud VM (a minimal Flask upload-flow sketch follows this list)
• Tooling: PyTorch, YOLOv5, Detectron2, Roboflow, CVAT
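For completeness, a minimal sketch of the Flask upload → result flow. Route names, the video form field, and the process_video helper are hypothetical stand-ins for the demo's actual code:

    import os
    from flask import Flask, render_template, request, send_from_directory
    from werkzeug.utils import secure_filename

    app = Flask(__name__)
    UPLOADS, OUTPUTS = "uploads", "outputs"   # hypothetical directory layout

    def process_video(in_path, out_dir):
        # Hypothetical wrapper: run the decode -> infer -> annotate -> re-encode
        # pipeline sketched earlier and return the processed file's name.
        ...

    @app.route("/", methods=["GET", "POST"])
    def index():
        if request.method == "POST":
            f = request.files["video"]         # form field name is illustrative
            in_path = os.path.join(UPLOADS, secure_filename(f.filename))
            f.save(in_path)
            out_name = process_video(in_path, OUTPUTS)
            return render_template("result.html", video=out_name)
        return render_template("upload.html")

    @app.route("/outputs/<path:name>")
    def serve_output(name):
        return send_from_directory(OUTPUTS, name)   # serve the processed video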
