Innerve Data Science Hackathon

Hi, we are ThandePapas





Aditya

(NSUT)

Harshit

(NSUT)





Project:

Raahi: Navigation for the visually impaired


Domain:

Health

Click here for Git link (first)
Click here for Git link (second)
Click here for Dataset (Custom recorded and labelled by us)

Overview

Problem Statement:


For a visually impaired person, how can we build something to help navigate the road better?

Our solution:

Our task consists of two problems:
1. Direction/lane prediction
2. Object detection for incoming obstructions, vehicles, and people
We tackle both of them with our data science models.

Dataset screenshots

First Task: Lane Detection

Technologies used:

OpenCV library for Python

Click here for GitHub link

What we did

Our objective was to find the lane that is valid to walk on.
We apply a Hough transform to each frame to extract the near-vertical lines, after the picture has been passed through a Canny edge filter and segmented to isolate the area that is best to look at. Lastly, we visualise the predictions overlaid on the original frames.
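Below is a minimal sketch of this pipeline in Python with OpenCV; the triangular region of interest and the Canny/Hough thresholds are illustrative assumptions, not the exact values we tuned.

import cv2
import numpy as np

def detect_lane(frame):
    # Noise reduction, then Canny edge detection
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)

    # Segmentation: keep a triangular region of interest ahead of the walker
    # (the triangle's shape is an assumption for illustration)
    h, w = edges.shape
    mask = np.zeros_like(edges)
    roi = np.array([[(0, h), (w // 2, h // 2), (w, h)]], dtype=np.int32)
    cv2.fillPoly(mask, roi, 255)
    edges = cv2.bitwise_and(edges, mask)

    # Probabilistic Hough transform to extract line segments
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=50,
                            minLineLength=40, maxLineGap=20)

    # Overlay the roughly vertical segments on the original frame
    out = frame.copy()
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            if abs(x2 - x1) < abs(y2 - y1):
                cv2.line(out, (x1, y1), (x2, y2), (0, 255, 0), 3)
    return out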

Final results:
1. Noise reduction and applying Canny with OpenCV
2. Applying segmentation
3. Applying the Hough transform

Merits of this model





The model does find the nearest lane, as seen in the videos.
It also detects any big obstructions ahead of your POV, such as a speed bump, a landfill, and much more.






Is it enough?





This model detects lanes quite nicely, with pretty good accuracy, when they lie in a straight line.
Still, we can improve on the model.
There are a few demerits of the system:
1. Curved roads are not detected.
2. It sometimes gets confused by gravel roads, which can be mitigated to an extent by increasing the blur.

What next?

Since it's inherently an object detection task, we can indeed just use a CNN (Convolutional Neural Network).

Lane Detection by CNN







The main problem with this approach is deciding what the output layer of the CNN should be.
We decided to go with 12 points, each having x and y coordinates:
6 points on the left and 6 points on the right.

The CNN learns on the labelled dataset and then, according to the learned weights, predicts on new, unseen data. Hence the output layer is a flat 24-unit layer.
The labelling is done with a custom OpenCV program, which can be found in the project directory; a sketch of the idea follows.
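As a hedged illustration of how such a labelling tool can work (our actual script is in the project directory; the window and file names here are made up), one can collect 12 mouse clicks per frame with an OpenCV callback:

import cv2

points = []  # 6 left-lane points + 6 right-lane points per frame

def on_click(event, x, y, flags, param):
    # Record a point on each left mouse click, up to 12 per frame
    if event == cv2.EVENT_LBUTTONDOWN and len(points) < 12:
        points.append((x, y))

img = cv2.imread("frame_0001.jpg")  # illustrative filename
cv2.namedWindow("label")
cv2.setMouseCallback("label", on_click)
while len(points) < 12:
    vis = img.copy()
    for p in points:
        cv2.circle(vis, p, 4, (0, 0, 255), -1)
    cv2.imshow("label", vis)
    if cv2.waitKey(20) == 27:  # Esc aborts this frame
        break
cv2.destroyAllWindows()
print(points)  # flattened to 24 values, this becomes the training label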

Model Summary



We used a fairly standard CNN model with increasing filter counts and decreasing convolution sizes.
We train it for 40 epochs to avoid overfitting.
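A minimal sketch of this kind of architecture, assuming Keras; the input resolution and exact filter counts are illustrative, not our trained configuration.

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    # Increasing filters, decreasing convolution size with depth
    layers.Conv2D(16, 5, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    # 24 units: (x, y) for 6 left-lane and 6 right-lane points
    layers.Dense(24),
])
model.compile(optimizer="adam", loss="mse")
# model.fit(x_train, y_train, epochs=40)  # 40 epochs, as in our run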

Output of the CNN

The CNN learns the lanes and the directions we should take.

Merits and demerits

The model does pretty well, provided we remember that the dataset is quite limited and we take into account the labelling biases.
Overall, it does detect the direction and where it should not go.
The loss also goes down with increasing epochs, so the model is able to learn the features.

The predictions are not state of the art.
The dataset we had was quite limited, with few people and vehicles on the roads, making it easier for the model to get good accuracy.
The model could have overfit because of this.

Getting better predictions is overkill at this point without using very deep neural nets,
but the results are also not too bad for such a model.
Alternatively, we could take the weights of an already deep neural model and use transfer learning, as sketched below.
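A hedged sketch of that transfer-learning alternative: reuse a pretrained backbone (MobileNetV2 is our illustrative choice here, not something we actually trained) and fit only a small regression head for the 24 lane-point outputs.

from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

# Pretrained ImageNet backbone, frozen so only the head trains
base = MobileNetV2(input_shape=(128, 128, 3), include_top=False,
                   weights="imagenet")
base.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(24),  # same 24-unit output as the scratch-trained CNN
])
model.compile(optimizer="adam", loss="mse")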

Second Task: Object Detection

We implement the YOLO algorithm, which gives great accuracy while being the easiest to implement.

The YOLO algorithm uses a Darknet-53 model that is trained on ImageNet data.
Since retraining on our labels would take both time and computation, we took weights from the
open-source community and built the model to get the predictions.
We implemented non-max suppression to get better bounding boxes, because YOLO
gives various anchor boxes. We apply a probability threshold to keep the best results
and, moreover, exclude overlapping detections.
We further convert the output, which is given as the midpoint of the anchor box plus its
height and width, to x, y coordinates, and then plot it with OpenCV, overlaid on the test set. A sketch of this post-processing follows.
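A minimal sketch of this post-processing step, using OpenCV's built-in cv2.dnn.NMSBoxes for the non-max suppression; the thresholds are illustrative defaults, not our tuned values.

import cv2
import numpy as np

def postprocess(boxes, scores, frame, score_thresh=0.5, nms_thresh=0.4):
    # boxes: list of (cx, cy, w, h) in pixels, as YOLO outputs them;
    # scores: one confidence value per box
    rects = [[int(cx - w / 2), int(cy - h / 2), int(w), int(h)]
             for cx, cy, w, h in boxes]
    # Non-max suppression drops low-confidence and overlapping boxes
    keep = cv2.dnn.NMSBoxes(rects, scores, score_thresh, nms_thresh)
    for i in np.array(keep).flatten():
        x, y, w, h = rects[i]
        # Draw the surviving corner-format box on the frame
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    return frame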

GitHub link

Outputs

Analysis





The model gives state-of-the-art results, correctly predicting the objects in front, including people, vehicles, and dogs.
This could partly be due to the pretrained weights, which were trained on extensive data for many epochs.
What we essentially built was a deep enough neural network to harness that accuracy.
Overall in the project, we built the two things necessary for path prediction:
detecting valid paths and detecting any incoming obstructions.


Where to go from here?





The next step is to deploy this via a device small enough to capture the information
and computationally capable of running the predictions.
With TensorFlow now available for native mobile applications, this is possible.
We could alternatively build a dedicated device just for this purpose.

We could essentially build an app that takes all this data and:
1. Detects paths
2. Takes care of incoming cars/people
3. Gives audio output (text-to-speech) of where to turn, how much to turn, and how far to walk, as sketched below
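A hedged sketch of that audio-output step, assuming the pyttsx3 library for offline text-to-speech; the instruction wording and values are invented for illustration.

import pyttsx3

engine = pyttsx3.init()  # offline TTS engine

def announce(direction, degrees, metres):
    # Speak one navigation instruction, e.g. from the lane/obstacle models
    engine.say(f"Turn {direction} {degrees} degrees, then walk {metres} metres")
    engine.runAndWait()

announce("left", 30, 10)  # "Turn left 30 degrees, then walk 10 metres"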

We could also collect even more data from the native device and retrain
the models for even more epochs and even better accuracy.
This could prove to be a great idea for the visually impaired
and would be a real help to them as populations keep growing.

We accept that the accuracy is not state of the art, due to the limited dataset
and the little time we had for training the CNN. Given the time, we could tune more hyperparameters
and see what works best for each model.
This can be developed further into a tool that has the potential to help the visually impaired.