
Training of Feed-Forward Neural Networks

Learning in feed-forward networks belongs to the realm of supervised learning, in which pairs of input and output values are fed into the network for many cycles, so that the network 'learns' the relationship between the input and output.

Let's consider a simple neural network, as shown below.

Figure 1.a : A simple feed-forward neural network

Where,

Node: The basic unit of computation (represented by a single circle)

w: The weight of a connection

i: Input node (the neural network input)

h: Hidden node (a weighted sum of the input nodes or of the previous hidden layer's nodes)

a_h: Hidden node activated (the value of the hidden node passed to a predefined function)

o: Output node (a weighted sum of the last hidden layer's activated values)

a_o: Output node activated (the neural network output, the value of an output node passed to a predefined function)

E: The error contributed by one output, i.e. the difference between that network output and its target value

E_total: The total error, measured by the L2 (squared-error) loss

The following are the steps executed during the training phase of the above neural network:

Step 1: Initialization

The first step after designing a neural network is initialization. Initialize all weights w1 through w8 with random values. Also, assume all bias values are zero for simplicity.
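As an illustrative sketch only (the original walkthrough is carried out in the source excel, not in code), this initialization step could look like the following in Python/NumPy; the dictionary layout, the uniform range and the fixed seed are assumptions, not part of the original.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the example is reproducible

# Randomly initialize the eight weights w1..w8 of the 2-2-2 network in figure 1.a
w = {f"w{k}": rng.uniform(-0.5, 0.5) for k in range(1, 9)}

# All bias values are assumed to be zero for simplicity, as stated above
b1 = b2 = 0.0
```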

Step 2: Feed-Forward

In this step, calculate all the values for the hidden and output layers, moving forward through the network.

• Set the values of the input nodes and the targets

Use input values i1 = 0.05, i2 = 0.1 and target values t1 = 0.01, t2 = 0.99 throughout this training.

• Calculate hidden node values (with the bias terms taken as zero):

$$h_1 = w_1 i_1 + w_2 i_2, \qquad h_2 = w_3 i_1 + w_4 i_2$$
• Select an activation function. For example, the sigmoid function:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
• Calculate hidden node activation values:

$$a_{h1} = \sigma(h_1), \qquad a_{h2} = \sigma(h_2)$$

• Calculate output node values:

$$o_1 = w_5 a_{h1} + w_6 a_{h2}, \qquad o_2 = w_7 a_{h1} + w_8 a_{h2}$$
• Calculate output node activation values:

$$a_{o1} = \sigma(o_1), \qquad a_{o2} = \sigma(o_2)$$

• Calculate the total error, which is the sum of the error E1 contributed by output o1 and the error E2 contributed by output o2 (a short code sketch of the whole forward pass follows this list):

$$E_1 = \tfrac{1}{2}(t_1 - a_{o1})^2, \qquad E_2 = \tfrac{1}{2}(t_2 - a_{o2})^2, \qquad E_{total} = E_1 + E_2$$

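The sketch below works through this forward pass in Python/NumPy. It assumes the weight layout described above (w1..w4 feeding the hidden layer, w5..w8 feeding the output layer), sigmoid activations and zero biases; the function and variable names are illustrative, not from the source.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(w, i1=0.05, i2=0.10, t1=0.01, t2=0.99):
    """One forward pass through the 2-2-2 network of figure 1.a (biases taken as zero)."""
    # Hidden node values and activations
    h1 = w["w1"] * i1 + w["w2"] * i2
    h2 = w["w3"] * i1 + w["w4"] * i2
    a_h1, a_h2 = sigmoid(h1), sigmoid(h2)

    # Output node values and activations
    o1 = w["w5"] * a_h1 + w["w6"] * a_h2
    o2 = w["w7"] * a_h1 + w["w8"] * a_h2
    a_o1, a_o2 = sigmoid(o1), sigmoid(o2)

    # L2 loss: E_total = E1 + E2
    E1 = 0.5 * (t1 - a_o1) ** 2
    E2 = 0.5 * (t2 - a_o2) ** 2
    return {"a_h1": a_h1, "a_h2": a_h2, "a_o1": a_o1, "a_o2": a_o2,
            "E1": E1, "E2": E2, "E_total": E1 + E2}

# Example: one forward pass with arbitrary weights (0.5 everywhere, for illustration only)
vals = forward({f"w{k}": 0.5 for k in range(1, 9)})
print(vals["E_total"])
```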
After the first pass, the error will be substantial; backpropagation then adjusts the weights to reduce the error between the network output and the target values.

Step 3: Backpropagation

The goal of this step is to incrementally adjust the weights so that the network produces values as close as possible to the target values. Backpropagation adjusts the network weights using the stochastic gradient descent optimization method:

$$w_{k+1} = w_k - \eta \frac{\partial E_{total}}{\partial w}$$
Where,

k : iteration number

η : learning rate

∂E_total/∂w : derivative of the total error with respect to the weight being adjusted
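As a minimal sketch of this update rule (the function and parameter names are illustrative, not from the source), a single gradient-descent step on one weight would be:

```python
def sgd_update(w, grad, lr=0.5):
    """One stochastic-gradient-descent step: w_(k+1) = w_k - eta * dE_total/dw.
    `grad` is the derivative of the total error with respect to this weight,
    computed by backpropagation as derived below; `lr` is the learning rate eta."""
    return w - lr * grad
```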

The example below shows the derivation of the update formula (gradient) for the weights in the network.

The derivative of the error E_total with respect to the weight w5 (between a_h1 and o1) can be written using the calculus chain rule as follows:

$$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial (E_1 + E_2)}{\partial w_5} = \frac{\partial E_1}{\partial w_5} = \frac{\partial E_1}{\partial a_{o1}} \cdot \frac{\partial a_{o1}}{\partial o_1} \cdot \frac{\partial o_1}{\partial w_5} \qquad (1)$$

We leave out the derivative of E2 with respect to w5 because E2 does not depend on the weight w5, as is clearly seen in figure 1.a.

• Start from the very first activated output node and take derivatives backward for each node:

$$\frac{\partial E_{total}}{\partial a_{o1}} = \frac{\partial}{\partial a_{o1}}\left[\tfrac{1}{2}(t_1 - a_{o1})^2\right] = -(t_1 - a_{o1}) = a_{o1} - t_1 \qquad (2)$$
• From the activated output, bounce to the output node (derivative of the sigmoid):

$$\frac{\partial a_{o1}}{\partial o_1} = a_{o1}(1 - a_{o1}) \qquad (3)$$
• From the output node, bounce to the weight of the connection to the hidden layer:

$$\frac{\partial o_1}{\partial w_5} = a_{h1} \qquad (4)$$
• From equations 1, 2, 3 and 4:

$$\frac{\partial E_{total}}{\partial w_5} = (a_{o1} - t_1) \cdot a_{o1}(1 - a_{o1}) \cdot a_{h1}$$
• For the other similar weights:

$$\frac{\partial E_{total}}{\partial w_6} = (a_{o1} - t_1) \cdot a_{o1}(1 - a_{o1}) \cdot a_{h2}$$

$$\frac{\partial E_{total}}{\partial w_7} = (a_{o2} - t_2) \cdot a_{o2}(1 - a_{o2}) \cdot a_{h1}$$

$$\frac{\partial E_{total}}{\partial w_8} = (a_{o2} - t_2) \cdot a_{o2}(1 - a_{o2}) \cdot a_{h2}$$
• Similarly, the derivative of the error E_total with respect to the weight w1 between the input and hidden layer can be written using the calculus chain rule as follows:

$$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial a_{h1}} \cdot \frac{\partial a_{h1}}{\partial h_1} \cdot \frac{\partial h_1}{\partial w_1} \qquad (5)$$
• As in the previous case, start with the very first activated output node in the network and take derivatives backward all the way to the desired weight, leaving out any nodes that do not affect that specific weight.

For simplicity, equation 5 can be written as,

$$\frac{\partial E_{total}}{\partial a_{h1}} = \frac{\partial E_1}{\partial a_{h1}} + \frac{\partial E_2}{\partial a_{h1}} \qquad (6)$$

$$\frac{\partial E_{total}}{\partial w_1} = \left(\frac{\partial E_1}{\partial a_{h1}} + \frac{\partial E_2}{\partial a_{h1}}\right) \cdot \frac{\partial a_{h1}}{\partial h_1} \cdot \frac{\partial h_1}{\partial w_1} \qquad (7)$$
From equations 2 and 3, and using ∂o1/∂a_h1 = w5 and ∂o2/∂a_h1 = w7,

$$\frac{\partial E_1}{\partial a_{h1}} = \frac{\partial E_1}{\partial a_{o1}} \cdot \frac{\partial a_{o1}}{\partial o_1} \cdot \frac{\partial o_1}{\partial a_{h1}} = (a_{o1} - t_1) \cdot a_{o1}(1 - a_{o1}) \cdot w_5 \qquad (8)$$

$$\frac{\partial E_2}{\partial a_{h1}} = (a_{o2} - t_2) \cdot a_{o2}(1 - a_{o2}) \cdot w_7 \qquad (9)$$
Also,

$$\frac{\partial a_{h1}}{\partial h_1} = a_{h1}(1 - a_{h1}) \qquad (10)$$

$$\frac{\partial h_1}{\partial w_1} = i_1 \qquad (11)$$
Finally, the total derivative for the first weight w1 in our network is the sum, over each path, of the product of the individual node derivatives.

From equations 6, 7, 8, 9, 10 and 11:

$$\frac{\partial E_{total}}{\partial w_1} = \left[(a_{o1} - t_1)\, a_{o1}(1 - a_{o1})\, w_5 + (a_{o2} - t_2)\, a_{o2}(1 - a_{o2})\, w_7\right] \cdot a_{h1}(1 - a_{h1}) \cdot i_1$$
We follow the same procedure for all the weights one-by-one in the network:

$$\frac{\partial E_{total}}{\partial w_2} = \left[(a_{o1} - t_1)\, a_{o1}(1 - a_{o1})\, w_5 + (a_{o2} - t_2)\, a_{o2}(1 - a_{o2})\, w_7\right] \cdot a_{h1}(1 - a_{h1}) \cdot i_2$$

$$\frac{\partial E_{total}}{\partial w_3} = \left[(a_{o1} - t_1)\, a_{o1}(1 - a_{o1})\, w_6 + (a_{o2} - t_2)\, a_{o2}(1 - a_{o2})\, w_8\right] \cdot a_{h2}(1 - a_{h2}) \cdot i_1$$

$$\frac{\partial E_{total}}{\partial w_4} = \left[(a_{o1} - t_1)\, a_{o1}(1 - a_{o1})\, w_6 + (a_{o2} - t_2)\, a_{o2}(1 - a_{o2})\, w_8\right] \cdot a_{h2}(1 - a_{h2}) \cdot i_2$$
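Continuing the illustrative forward() sketch shown earlier (same assumed weight layout and zero biases), the eight gradients derived in this section can be computed as follows; `vals` is the dictionary that forward() returns. This is a sketch of the formulas above, not the source's own code.

```python
def backward(w, vals, i1=0.05, i2=0.10, t1=0.01, t2=0.99):
    """Gradients of E_total with respect to w1..w8, using the formulas derived above."""
    a_h1, a_h2 = vals["a_h1"], vals["a_h2"]
    a_o1, a_o2 = vals["a_o1"], vals["a_o2"]

    # Common factor for each output node: dE/da_o * da_o/do (equations 2 and 3)
    d_o1 = (a_o1 - t1) * a_o1 * (1.0 - a_o1)
    d_o2 = (a_o2 - t2) * a_o2 * (1.0 - a_o2)

    # dE_total/da_h for each hidden node: sum over the two output paths (equations 8 and 9)
    d_ah1 = d_o1 * w["w5"] + d_o2 * w["w7"]
    d_ah2 = d_o1 * w["w6"] + d_o2 * w["w8"]

    # Back through the hidden sigmoid (equation 10)
    d_h1 = d_ah1 * a_h1 * (1.0 - a_h1)
    d_h2 = d_ah2 * a_h2 * (1.0 - a_h2)

    return {
        # Output-layer weights: multiply by the hidden activation (equation 4)
        "w5": d_o1 * a_h1, "w6": d_o1 * a_h2,
        "w7": d_o2 * a_h1, "w8": d_o2 * a_h2,
        # Hidden-layer weights: multiply by the input value (equation 11)
        "w1": d_h1 * i1, "w2": d_h1 * i2,
        "w3": d_h2 * i1, "w4": d_h2 * i2,
    }
```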
Once we have calculated the derivatives for all weights in the network, we can simultaneously update all the weights with the gradient descent formula, as shown below:

$$w_i \leftarrow w_i - \eta \frac{\partial E_{total}}{\partial w_i}, \qquad i = 1, \ldots, 8$$
Figure 1.b (refer to the source excel here) shows the complete training process explained above. We can observe the total error E_total reducing with each iteration and the output values a_o1 and a_o2 coming closer to the target values.

Figure 1.b : Training of a feed-forward neural network
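A self-contained sketch of the whole training loop, which should reproduce the qualitative behaviour of figure 1.b (total error shrinking each iteration, outputs approaching the targets). It uses the same assumptions as the earlier sketches, with the weights collected into two 2×2 matrices instead of the scalar names w1..w8.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-x))

def train(lr=0.5, epochs=2000, seed=0):
    """Gradient descent on the single (input, target) pair used throughout this example."""
    rng = np.random.default_rng(seed)
    i = np.array([0.05, 0.10])           # inputs  i1, i2
    t = np.array([0.01, 0.99])           # targets t1, t2
    W1 = rng.uniform(-0.5, 0.5, (2, 2))  # hidden-layer weights: rows [w1, w2], [w3, w4]
    W2 = rng.uniform(-0.5, 0.5, (2, 2))  # output-layer weights: rows [w5, w6], [w7, w8]

    for _ in range(epochs):
        # Forward pass (biases assumed zero)
        a_h = sigmoid(W1 @ i)
        a_o = sigmoid(W2 @ a_h)

        # Backward pass: the chain-rule formulas derived above
        d_o = (a_o - t) * a_o * (1 - a_o)       # dE_total/do for each output node
        grad_W2 = np.outer(d_o, a_h)            # dE_total/dw5 .. dw8
        d_h = (W2.T @ d_o) * a_h * (1 - a_h)    # dE_total/dh for each hidden node
        grad_W1 = np.outer(d_h, i)              # dE_total/dw1 .. dw4

        # Simultaneous gradient-descent update of all eight weights
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2

    # Final forward pass to report the remaining error and the outputs
    a_o = sigmoid(W2 @ sigmoid(W1 @ i))
    return 0.5 * np.sum((t - a_o) ** 2), a_o

E_total, outputs = train()
print(f"total error after training: {E_total:.6f}, outputs a_o1, a_o2: {outputs}")
```

Collecting the weights into matrices keeps the simultaneous update a single subtraction per layer; unrolled scalar code with w1..w8 would behave identically.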

Effect of learning rate on training

The learning rate controls how quickly the model adapts to the problem. Smaller learning rates require more training epochs, given the smaller changes made to the weights on each update, whereas larger learning rates result in rapid changes and require fewer training epochs. A learning rate that is too large can cause the model to converge too quickly to a suboptimal solution, whereas a learning rate that is too small can cause the process to get stuck. The following figure shows the effect on the error graph as the learning rate is changed over [0.1, 0.2, 0.5, 0.8, 1.0, 2.0].

Figure 1.c : Effect of different learning rates on error
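To get a feel for the comparison in figure 1.c, the same toy training loop can be swept over the listed learning rates. The compact, self-contained sketch below (same assumptions as the earlier sketches) only prints the final error for each rate; plotting the full error curves, as in the figure, is left out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def error_curve(lr, epochs=200, seed=0):
    """Total error per epoch for one learning rate, on the same toy network as above."""
    rng = np.random.default_rng(seed)
    i, t = np.array([0.05, 0.10]), np.array([0.01, 0.99])
    W1 = rng.uniform(-0.5, 0.5, (2, 2))  # hidden-layer weights
    W2 = rng.uniform(-0.5, 0.5, (2, 2))  # output-layer weights
    errors = []
    for _ in range(epochs):
        a_h = sigmoid(W1 @ i)
        a_o = sigmoid(W2 @ a_h)
        errors.append(0.5 * np.sum((t - a_o) ** 2))  # record E_total before updating
        d_o = (a_o - t) * a_o * (1 - a_o)
        d_h = (W2.T @ d_o) * a_h * (1 - a_h)
        W1 -= lr * np.outer(d_h, i)
        W2 -= lr * np.outer(d_o, a_h)
    return errors

# Compare how quickly the total error falls for each learning rate in figure 1.c
for lr in [0.1, 0.2, 0.5, 0.8, 1.0, 2.0]:
    curve = error_curve(lr)
    print(f"lr = {lr}: total error at epoch {len(curve)} = {curve[-1]:.6f}")
```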