Driver Drowsiness Detection
Localization of the driver's face and eyes, with yawning and closed-eye detection
Problem Identification
Countless accidents occur due to drowsy drivers. According to the Insurance Information Institute, drowsy driving accounts for about 100,000 crashes on the roadway, 71,000 injuries, and 1,550 fatalities per year (“Facts + Statistics: Drowsy driving | III”).
Precedent Solutions
The first possible solution is driver activity tracking, e.g., alerting the driver if they are not holding the steering wheel or have not used the pedals for some time. This approach is already used by car manufacturers; for example, Tesla cars prompt the driver to apply slight pressure on the steering wheel while Autopilot is engaged, to make sure the driver still controls the vehicle.
The second possible solution is drowsiness detection using a live video camera, alerting the driver when they appear fatigued or asleep.
Proposed Solution
Driver drowsiness detection is a car safety technology that helps prevent accidents caused by the driver becoming drowsy. The system detects closed eyes and yawning, which are signs of drowsiness or sleep, and alerts the driver who is dozing off. This is the stronger solution because yawning is usually the first sign of fatigue, and closed eyes indicate that the driver has already fallen asleep; activity tracking would only catch the resulting inactivity later. With a video camera, the driver can therefore be alerted by sound in time.
Dataset
The original dataset (“driver drowsiness using keras”), taken from Kaggle, contains 1448 images of drivers split equally between yawning and not yawning, and 1448 images of closed and open eyes. The original dataset is not annotated.
However, during model training and evaluation it became clear that images containing only eyes did not help the model detect eyes in images showing the whole face. As a result, the images showing the whole driver’s face were annotated with eye bounding boxes as well. Overall, 348 images were used for training.
Detectron2 with initial dataset
First, we tried Detectron2, a state-of-the-art framework, with transfer learning based on Faster R-CNN, using the model “faster_rcnn_X_101_32x8d_FPN_3x” with an ImageNet-pretrained backbone and a fully connected head; we only changed the number of classes in the head to match our dataset. During inference we observed that the model was not able to detect eyes in the images, so we re-annotated the dataset and also applied some augmentation.
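A minimal sketch of this setup using the Detectron2 config API follows; the registered dataset name and solver settings are illustrative assumptions, and the model zoo checkpoint is COCO-trained (its backbone was itself pretrained on ImageNet):

    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultTrainer

    cfg = get_cfg()
    # Load the Faster R-CNN X101-FPN config and its pretrained weights
    cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml"))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml")

    # Our four classes: closed eyes, open eyes, no yawning, yawning
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 4
    cfg.DATASETS.TRAIN = ("drowsiness_train",)  # assumed registered dataset name
    cfg.DATASETS.TEST = ()
    cfg.SOLVER.MAX_ITER = 1500  # the report trains for 1500 steps

    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)
    trainer.train()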
Reannotated and Augmented dataset
After facing this issue during inference, we decided to re-annotate our dataset in CVAT. We then split the dataset in Roboflow, resulting in 546 images for training, 36 for validation, and 22 for testing. We also applied augmentation in Roboflow, producing three outputs per training example: “Brightness”, to cover driving with and without daylight; “Blur”, for blurry footage caused by a dirty camera or a rough road; and “Noise”, for smoke or dust inside the car.
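Roboflow applies these augmentations in its web UI; a rough offline equivalent using the albumentations library (only an illustrative sketch, with assumed parameter values rather than Roboflow's exact settings) could be:

    import albumentations as A

    # Approximation of the Roboflow augmentations: brightness, blur, and noise
    augment = A.Compose(
        [
            A.RandomBrightnessContrast(brightness_limit=0.3, contrast_limit=0.0, p=1.0),
            A.Blur(blur_limit=5, p=0.5),
            A.GaussNoise(p=0.5),
        ],
        bbox_params=A.BboxParams(format="pascal_voc", label_fields=["class_labels"]),
    )

    # augmented = augment(image=image, bboxes=bboxes, class_labels=labels)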
Detectron2 with final dataset
The same Detectron2 model was used as before. The training data contained 462 annotated instances of closed eyes, 627 of open eyes, 198 of no yawning, and 348 of yawning, 1635 annotations in total, and the model was trained for 1500 steps. The validation data contains 40 instances of closed eyes, 32 of open eyes, 8 of no yawning, and 28 of yawning. The evaluation results are as follows:
Average precision (AP) is reported at different IoU thresholds, e.g., AP50 (average precision at an IoU threshold of 0.50). AP is highest at the 0.50 threshold, and the per-class AP values are shown in the figure above. The model would benefit from more training, but due to limited computational power we were not able to retrain it, so we went ahead with inference.
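For reference, AP is the area under the precision-recall curve p(r), and the subscript in AP50 is the IoU overlap a prediction must reach with the ground-truth box to count as a true positive:

    \mathrm{AP} = \int_0^1 p(r)\,dr,
    \qquad
    \mathrm{IoU}(B_{pred}, B_{gt}) = \frac{|B_{pred} \cap B_{gt}|}{|B_{pred} \cup B_{gt}|} \ge 0.50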
As expected, the model did not perform well at a confidence threshold of 0.70: it failed to detect some classes at all, which was a bad sign because we want high recall so that no detection is missed. Lowering the threshold to 0.45, we observed that all classes were now detected, but precision dropped on some images. Because of this trade-off between recall and precision, we moved to the YOLOv5 model.
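Lowering the threshold is a one-line change in Detectron2; a sketch, continuing from the config built above (the weights path is the default output of DefaultTrainer, assumed here):

    from detectron2.engine import DefaultPredictor

    # Keep detections with confidence >= 0.45 instead of the 0.70 we tried first
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.45
    cfg.MODEL.WEIGHTS = "output/model_final.pth"  # weights produced by training
    predictor = DefaultPredictor(cfg)
    # outputs = predictor(image)  # image: BGR numpy array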
YOLOv5
The YOLOv5 model was obtained from https://github.com/ultralytics/yolov5. YOLOv5 comes with pretrained weights, and transfer learning was used to adapt them to our use case. The best model was obtained using the second dataset described above under Dataset, in which the yawning and no-yawning images also include annotations for closed and open eyes. During testing, YOLOv5 proved to be the most accurate of all the object detection techniques used in this project. To get even better results from YOLOv5, a larger dataset is recommended.
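Fine-tuning follows the repository's standard recipe; a sketch of the training call, run from inside the cloned repo (image size, batch size, epoch count, and the drowsiness.yaml dataset config name are assumptions, not the project's exact values):

    # Run from inside the cloned ultralytics/yolov5 repository
    import train

    train.run(
        data="drowsiness.yaml",  # dataset config exported from Roboflow (assumed name)
        weights="yolov5s.pt",    # start from pretrained weights (transfer learning)
        imgsz=640,
        epochs=100,
        batch_size=16,
    )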
Development and deployment
Demo solution GitHub repository
https://github.com/daniyarka/DT
The main goal of our demo application is to showcase two behaviors:
- The driver is warned after yawning, so they can pull over to rest or be more cautious overall.
- The driver is alerted when falling asleep; specifically, the alert appears one second after the driver's eyes close (a sketch of this rule follows the list).
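A minimal sketch of the one-second rule, assuming per-frame class predictions and a known frame rate (the function name and label strings are hypothetical, not the repository's exact code):

    def should_alert(eye_states, fps):
        """Return True once the eyes have been closed for a full second.

        eye_states: list of per-frame labels, e.g. "closed" / "open"
        fps: frames per second of the input video
        """
        closed_streak = 0
        for state in eye_states:
            closed_streak = closed_streak + 1 if state == "closed" else 0
            if closed_streak >= fps:  # closed continuously for >= 1 second
                return True
        return False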
The project demo is a web application that accepts an uploaded video file; the file is processed and the output file is then returned to the user. The output video shows the detection of yawning and open/closed eyes, together with the warning and the alert.
Flask was used as the backend framework and also served the frontend via Flask templates and Jinja2. The Flask application was wrapped in Docker, using docker-compose for deployment. After a compute instance was created on Google Cloud Platform, a bare Git repository was set up on the server and the app was pushed directly to it. A Docker container was then built and run on the server, which made the web app accessible via IP address and port. To avoid exposing the port in the URL, NGINX was used as a reverse proxy to forward the default HTTP port 80 to port 5000, where the application itself runs.
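A minimal sketch of the upload-process-return flow in Flask (route names, the template, and the process_video helper are hypothetical stand-ins, not the repository's exact code):

    import os
    from flask import Flask, render_template, request, send_file

    app = Flask(__name__)
    os.makedirs("uploads", exist_ok=True)

    def process_video(in_path):
        # Hypothetical stub: the real app runs YOLOv5 on every frame,
        # draws the detections plus warning/alert, and writes a new video
        return in_path + ".processed.mp4"

    @app.route("/", methods=["GET", "POST"])
    def index():
        if request.method == "POST":
            video = request.files["video"]  # uploaded video file
            in_path = os.path.join("uploads", video.filename)
            video.save(in_path)
            return send_file(process_video(in_path), as_attachment=True)
        return render_template("index.html")  # Jinja2 upload form

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)  # NGINX proxies port 80 here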
Under the hood, the application uses the best model, YOLOv5, with inference running via PyTorch.
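The YOLOv5 repository supports loading fine-tuned weights through PyTorch Hub; a sketch (the weights path is an assumed name):

    import torch

    # Load our fine-tuned YOLOv5 weights via PyTorch Hub
    model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")
    model.conf = 0.45  # detection confidence threshold

    # results = model(frame)  # frame: RGB numpy array or an image path
    # results.xyxy[0]         # tensor rows: [x1, y1, x2, y2, confidence, class]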
Demo
Conclusion
We learned various object detection techniques and models.
All models were trained and evaluated on the two datasets to improve recall and precision; YOLOv5 performed better than Faster R-CNN.