In artificial neural networks, the activation function plays an important role in determining the output of a neuron. As an analogy, it can be compared to a biological neuron, which fires a signal to other connected neurons.
Basically, the activation function maps the neuron's output onto a bounded scale, typically between 0 and 1. Why? Because the raw value a neuron computes is not confined to any scale, which makes it unclear when the neuron should fire. Mapping this value onto a fixed scale makes it possible to decide when to fire, as in the case of biological neurons.
Consider the artificial neuron as a simple linear (affine) function:
y = wx + b
where w is the weight of the neuron, x is an input and b is the bias value. As mentioned above, y is not bound to any scale.
This is where the activation function comes in: it maps the neuron's value onto a scale between 0 and 1, so that we know whether the neuron is “activated” or “not activated”.
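As a minimal sketch of this idea, the snippet below computes the affine value y = wx + b for a single-input neuron and then squashes it with the logistic sigmoid, one of the functions covered later in this article. The weight and bias values are arbitrary and chosen only for illustration.

```python
import math

def affine(w, x, b):
    """Unbounded pre-activation value y = w*x + b of a single-input neuron."""
    return w * x + b

def sigmoid(a):
    """Logistic sigmoid, used here as one possible squashing function."""
    return 1.0 / (1.0 + math.exp(-a))

# Hypothetical weight and bias, for illustration only.
w, b = 2.0, -1.0

for x in (-10.0, 0.0, 10.0):
    y = affine(w, x, b)          # can be any real number
    activated = sigmoid(y)       # always strictly between 0 and 1
    print(f"x={x:6.1f}  y={y:6.1f}  activation={activated:.6f}")
```

The raw value y grows without limit as x grows, while the activation stays inside the interval from 0 to 1.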
Further, the activation function can be of 2 types:
- Linear activation function.
- Nonlinear activation function.
Linear activation function
A linear activation function passes its input through unchanged (or merely scaled), so it makes no difference to the mapping of the output. Since most real-world problems are nonlinear in nature, it is generally not used as the hidden-layer activation in deep learning applications.
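One way to see why a linear activation adds nothing is that stacking two affine neurons yields just another affine neuron. The sketch below checks this numerically; the weights and biases are made-up values, and the identity activation is assumed between the two layers.

```python
def affine(w, x, b):
    """Single-input neuron with a linear (identity) activation."""
    return w * x + b

# Hypothetical parameters for two stacked neurons.
w1, b1 = 3.0, 1.0
w2, b2 = -2.0, 0.5

def two_layers(x):
    # Output of layer 1 is fed straight into layer 2.
    return affine(w2, affine(w1, x, b1), b2)

# The same mapping collapsed into one neuron:
# w2*(w1*x + b1) + b2 = (w2*w1)*x + (w2*b1 + b2)
w_eq = w2 * w1
b_eq = w2 * b1 + b2

def one_layer(x):
    return affine(w_eq, x, b_eq)

for x in (-2.0, 0.0, 1.5):
    print(x, two_layers(x), one_layer(x))
```

However many such layers are stacked, the network can still only represent a single affine mapping.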
Nonlinear activation function
Nonlinear activation functions are widely used in deep learning applications. They typically confine values to a specific range (in the general case, 0 to 1) and allow the model to perform complex mappings, which are an essential part of learning and modelling.
Among nonlinear functions, there are 5 main types, and the choice among them can be application-specific.
Unit step function
A unit step function or Heaviside step function is a simple function which maps positive values to 1 and negative values to 0.
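A minimal sketch of the unit step function follows. The text above does not say what happens at exactly 0, so the value chosen there is an assumption; other references use 0 or 0.5.

```python
def unit_step(a):
    """Heaviside step: 1 for positive inputs, 0 for negative inputs."""
    # Assumed convention: the output at exactly a = 0 is 1.
    return 1 if a >= 0 else 0

print([unit_step(a) for a in (-2.0, -0.1, 0.3, 7.0)])
```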
Sign function
A sign function maps positive values to +1 and negative values to -1.
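The sign function can be sketched in the same way. Returning 0 for an input of exactly 0 is an assumed convention, since the description above only covers positive and negative values.

```python
def sign(a):
    """Sign function: +1 for positive inputs, -1 for negative inputs."""
    if a > 0:
        return 1
    if a < 0:
        return -1
    # Assumed convention: sign(0) = 0.
    return 0

print([sign(a) for a in (-3.5, -0.2, 0.4, 9.0)])
```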
Sigmoid function
The sigmoid, ϕ(a) = 1 / (1 + e^(-a)), is an S-shaped, monotonic nonlinear function which maps positive values into the range 0.5 to 1 and negative values into the range 0 to 0.5. It is widely used in shallow neural network applications. 0<ϕ(a)<1
Furthermore, it has a couple of interesting properties which make it widely used. One of them is its symmetry, i.e. ϕ(-a) = 1-ϕ(a), so the value for a negative input follows directly from the value for the matching positive input, which can save computation.
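The symmetry property can be checked numerically. The sketch below uses the standard logistic form of the sigmoid, written in two branches so that the exponential never overflows for large negative inputs; that branching is an implementation choice, not part of the definition.

```python
import math

def sigmoid(a):
    """Logistic sigmoid 1 / (1 + e^(-a)), evaluated in a numerically safe way."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)              # a < 0, so e is in (0, 1)
    return e / (1.0 + e)

for a in (0.5, 1.0, 3.0, 8.0):
    left = sigmoid(-a)
    right = 1.0 - sigmoid(a)
    print(f"a={a:4.1f}  sigmoid(-a)={left:.8f}  1-sigmoid(a)={right:.8f}")
```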
All these properties sound great. But wait! There is a problem with this function.
The plot of the sigmoid shows that, as the output saturates, it becomes unresponsive to changes in a: the curve runs nearly parallel to the x-axis, so the gradient is close to zero. This is termed the “vanishing gradient problem”, and it makes the neural network learn drastically more slowly. To avoid this problem, we opt for other functions.
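To make the saturation concrete, the sketch below evaluates the sigmoid's derivative, which for the logistic form is ϕ'(a) = ϕ(a)(1-ϕ(a)), at a few sample inputs. The sample points are arbitrary.

```python
import math

def sigmoid(a):
    """Logistic sigmoid 1 / (1 + e^(-a))."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def sigmoid_grad(a):
    """Derivative of the sigmoid: s * (1 - s), where s = sigmoid(a)."""
    s = sigmoid(a)
    return s * (1.0 - s)

for a in (0.0, 2.0, 5.0, 10.0):
    print(f"a={a:5.1f}  gradient={sigmoid_grad(a):.6f}")
```

The gradient peaks at 0.25 for a = 0 and shrinks rapidly as the input moves away from zero in either direction, which is what slows learning when many such factors are multiplied together during backpropagation.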
Hyperbolic tangent (tanh) function
Another widely used activation function is the tanh function. It is similar to the sigmoid function but has a different output range, -1 to +1, which makes it zero-centred. It is the ratio of the hyperbolic sine and hyperbolic cosine functions: tanh(a) = sinh(a) / cosh(a).
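The following sketch computes tanh as the ratio of hyperbolic sine to hyperbolic cosine and compares it with Python's built-in `math.tanh`. It also checks the identity tanh(a) = 2ϕ(2a) - 1, where ϕ is the logistic sigmoid, which is one way to see how closely the two functions are related. The sample inputs are kept small so that the simple sigmoid form below does not overflow.

```python
import math

def tanh_ratio(a):
    """tanh as the ratio sinh(a) / cosh(a)."""
    return math.sinh(a) / math.cosh(a)

def sigmoid(a):
    """Logistic sigmoid; adequate for the moderate inputs used here."""
    return 1.0 / (1.0 + math.exp(-a))

for a in (-3.0, -1.0, 0.0, 1.0, 3.0):
    ratio = tanh_ratio(a)
    rescaled = 2.0 * sigmoid(2.0 * a) - 1.0   # sigmoid stretched to (-1, 1)
    print(f"a={a:5.1f}  sinh/cosh={ratio:.6f}  math.tanh={math.tanh(a):.6f}  2*sigmoid(2a)-1={rescaled:.6f}")
```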
ReLU (Rectified Linear Unit)
ReLU is one of the most used functions in deep learning applications, especially in convolutional neural networks.
ReLU is defined as ϕ(a) = max(0, a). As the equation shows, it is linear for all positive values and zero for all negative values. Because it is linear for positive values, the slope does not saturate when a is large, so it does not suffer from the vanishing gradient problem we saw with the sigmoid and tanh activation functions. Moreover, it is simple to implement and cheap to compute, which tends to reduce training time.
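A minimal sketch of ReLU and its slope is shown below. The slope at exactly a = 0 is not defined mathematically, so the value used there is an assumed convention.

```python
def relu(a):
    """Rectified linear unit: a for positive inputs, 0 otherwise."""
    return max(0.0, a)

def relu_grad(a):
    """Slope of ReLU: 1 for positive inputs, 0 for negative inputs."""
    # Assumed convention: the slope at exactly a = 0 is taken as 0.
    return 1.0 if a > 0 else 0.0

for a in (-5.0, -0.5, 0.5, 50.0, 5000.0):
    print(f"a={a:8.1f}  relu={relu(a):8.1f}  slope={relu_grad(a):.1f}")
```

The slope stays at 1 no matter how large the positive input becomes, unlike the sigmoid, whose slope shrinks towards zero.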
Even though the function seems perfect, it has a problem known as dying ReLU. When the input to a neuron's activation is consistently negative, ReLU outputs 0 and its slope is also 0, so no gradient flows back and the neuron is unlikely to recover. This makes the neuron dead and useless.
To avoid this problem, we use an activation function called Leaky ReLU, which keeps a small nonzero slope for negative values.
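As a rough sketch, Leaky ReLU can be written as follows. The parameter name `alpha` and its value of 0.01 are assumptions made for illustration; the article does not specify them, and in practice the slope is a tunable choice.

```python
def leaky_relu(a, alpha=0.01):
    """Leaky ReLU: a for positive inputs, alpha * a otherwise."""
    # alpha is a hypothetical small slope for the negative side.
    return a if a > 0 else alpha * a

def leaky_relu_grad(a, alpha=0.01):
    """Slope of Leaky ReLU: 1 for positive inputs, alpha otherwise."""
    return 1.0 if a > 0 else alpha

for a in (-5.0, -0.5, 0.5, 5.0):
    print(f"a={a:5.1f}  leaky_relu={leaky_relu(a):7.3f}  slope={leaky_relu_grad(a):.2f}")
```

Because the slope for negative inputs is small but not zero, some gradient still reaches the neuron's weights, which gives it a chance to recover.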