Updated: Jun 6, 2021
Humans don’t start their thinking from scratch every second. As you read this essay, you understand each word based on your understanding of previous words. You don’t throw everything away and start thinking from scratch again. Your thoughts have persistence.
Traditional neural networks can’t do this, and it seems like a major shortcoming. For example, imagine you want to classify what kind of event is happening at every point in a movie. It’s unclear how a traditional neural network could use its reasoning about previous events in the film to inform later ones.
Recurrent neural networks address this issue. They are networks with loops in them, allowing information to persist.
Now the question arises: why do we need a Recurrent Neural Network? The reason is that with techniques such as Bag of Words (BOW), TF-IDF, and Word2Vec, the sequence information in a sentence is lost.
For example, take the sentence "My name is Roy and my age is 12".
With the techniques above, we are left with only a collection of words such as "Roy age 12", with no record of the order in which they appeared. This is known as loss of sequential information.
To preserve sequential information, I would use a Recurrent Neural Network (RNN).
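A minimal sketch of this loss of order, assuming a plain whitespace tokenizer and raw word counts as the bag-of-words representation (the helper name bag_of_words is illustrative, not taken from any library):

```python
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Count lowercase word occurrences, ignoring word order entirely."""
    return Counter(sentence.lower().split())


original = "My name is Roy and my age is 12"
shuffled = "12 is my age and Roy is my name"

# Both sentences contain the same words the same number of times,
# so their bag-of-words representations are identical.
print(bag_of_words(original) == bag_of_words(shuffled))
```

The two sentences read very differently, yet a model that only sees the counts cannot tell them apart.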
Recurrent Neural Networks (RNNs) have been used extensively in voice assistants such as Amazon Alexa, Siri, and Google Assistant. One of the best examples is Google Translate. They are also used extensively in time series forecasting.
Recurrent Neural Network (RNN) Architecture -
1. Xi is the input; it can be of any dimension.
2. f is the hidden layer, which can have any number of neurons; it produces an output at each time step.
3. Y^ is the final output (the loss is computed from the difference between Y^ and the true label y).
In the above diagram, a chunk of neural network, A, looks at some input xt and outputs a value ht. A loop allows information to be passed from one step of the network to the next.
These loops make recurrent neural networks seem kind of mysterious. However, if you think a bit more, it turns out that they aren’t all that different than a normal neural network. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor. Consider what happens if we unroll the loop.
Let me explain this in a simpler way.
Let's take an NLP use case, sentiment analysis, with the sentence "The Food Is Bad".
The four words map, in order, to X11, X12, X13, X14.
Xi = sentence 1
yi = output
Now let's look at the concept in depth. The RNN processes the sentence one word per time step: at t=1 it processes the first word (X11), at t=2 the second word (X12), and so on until the last word of the sentence. When the first word (X11) is passed in, it is processed by the hidden layer, which can have any number of neurons, and the hidden layer produces an output. When the next word arrives, the output for the first word is sent to the hidden layer along with it. This is how sequential information is captured and maintained.
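The step-by-step process can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the hidden layer has a single neuron, each word is encoded as one made-up number instead of a real word vector, and the names rnn_forward, w_in, and w_rec are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def rnn_forward(inputs, w_in, w_rec):
    """Run a one-neuron RNN over a sequence and return the output at every step.

    inputs: one number per word (a stand-in for a word vector)
    w_in:   weight applied to the current input
    w_rec:  weight applied to the previous step's output
    """
    outputs = []
    previous = 0.0  # no previous output exists before the first word
    for x in inputs:
        # The current word and the previous output enter the hidden layer together.
        previous = sigmoid(w_in * x + w_rec * previous)
        outputs.append(previous)
    return outputs


# Made-up encodings for "The", "Food", "Is", "Bad" (X11..X14).
sentence = [0.1, 0.7, 0.2, -0.9]
outputs = rnn_forward(sentence, w_in=0.5, w_rec=0.8)
for t, o in enumerate(outputs, start=1):
    print(f"t={t}: O{t} = {o:.4f}")
```

Because each output depends on the one before it, feeding the same words in a different order produces a different final output, which is exactly the sequential information that Bag of Words discards.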
Recurrent Neural Network Forward Propagation -
Forward propagation means we are moving in only one direction, from input to the output, in a neural network.
w = weight
O1 ,O2 = output 1 , output 2
t = Time
f = Sigmoid function
X11 , X12 ,X13, X14 = words in vector form
This is how we get the output in forward propagation: the computation continues until the last word of the sentence, and at each step the previous output is added to the new input.
NOTE - In forward propagation the weights stay constant (no change) until the final output is produced.
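Written out for the four-word example, the forward pass might look like the following. The text above names only a single weight w; the separate symbols w' (for the connection from the previous output) and w'' (for the output layer) are assumptions introduced here to keep the three roles apart, and the output activation g is left unspecified.

```latex
\begin{aligned}
O_1 &= f(X_{11}\, w) \\
O_2 &= f(X_{12}\, w + O_1\, w') \\
O_3 &= f(X_{13}\, w + O_2\, w') \\
O_4 &= f(X_{14}\, w + O_3\, w') \\
\hat{y} &= g(O_4\, w'')
\end{aligned}
```

The same w, w', and w'' appear at every time step, which is what "weight will be constant" means during the forward pass.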
Recurrent Neural Network Backward Propagation -
In backward propagation, each weight is updated using the derivative of the loss with respect to that weight, which is found with the chain rule.
This is how each weight gets updated using its derivative.
The derivative for every other weight (w) is found in the same way.
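As a hedged sketch of what these derivatives look like for a four-word sentence, assume an input weight w, a recurrent weight w' applied to the previous output, an output weight w'', a loss L, and a learning rate \eta (the symbols w', w'', L, and \eta are assumptions, not notation from the text above). Here \partial^{+} denotes the direct derivative that treats the previous output O_{t-1} as a constant.

```latex
\begin{aligned}
\frac{\partial L}{\partial w''} &= \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w''} \\[4pt]
\frac{\partial L}{\partial w'} &= \sum_{t=2}^{4} \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial O_4}
  \left( \prod_{k=t+1}^{4} \frac{\partial O_k}{\partial O_{k-1}} \right) \frac{\partial^{+} O_t}{\partial w'} \\[4pt]
w_{\text{new}} &= w_{\text{old}} - \eta \, \frac{\partial L}{\partial w}
\end{aligned}
```

The product of \partial O_k / \partial O_{k-1} terms grows longer for earlier time steps, since the gradient has to travel back through every step in between.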
Drawbacks of Simple Recurrent Neural Network -
1. Vanishing Gradient
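This problem occurs when the updates to the weight (w) become so small that they are almost negligible. Training then stalls and the network does not reach the global minimum of the loss. The sigmoid function makes this likely because its derivative is never larger than 0.25, and backward propagation multiplies many such small factors together.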
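A small numeric illustration of how quickly the gradient can shrink, assuming the sigmoid activation and, purely for simplicity, a recurrent weight of 1 so that each time step contributes just one sigmoid derivative to the product:

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z: float) -> float:
    s = sigmoid(z)
    return s * (1.0 - s)


# The sigmoid derivative peaks at z = 0, where it equals 0.25.
best_case = sigmoid_derivative(0.0)

# Backpropagating through many time steps multiplies one such factor per step,
# so even the best case shrinks very quickly.
for steps in (1, 5, 10, 20):
    print(f"{steps:>2} steps: gradient factor <= {best_case ** steps:.2e}")
```

Even in this best case the factor after 20 steps is below one in a trillion, so the words at the start of a long sentence have almost no influence on the weight update.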
2. The Problem of Long-Term Dependencies
Sometimes, we only need to look at recent information to perform the present task. For example, consider a language model trying to predict the next word based on the previous ones. If we are trying to predict the last word in “the clouds are in the sky,” we don’t need any further context – it’s pretty obvious the next word is going to be sky. In such cases, where the gap between the relevant information and the place that it’s needed is small, RNNs can learn to use the past information.
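But there are also cases where we need more context, where the relevant information appeared many steps before the point where it is needed. As that gap grows, it becomes harder for an RNN to connect the two.
In theory, RNNs are absolutely capable of handling such “long-term dependencies.” A human could carefully pick parameters for them to solve toy problems of this form. Sadly, in practice, RNNs don’t seem to be able to learn them.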
To overcome these drawbacks, we have LSTM (Long Short-Term Memory).