Introduction to Deep Learning
Deep learning, a subset of artificial intelligence (AI), has become a powerful technology driving transformative change across industries. It mimics the way the human brain processes information, enabling computers to learn from vast amounts of data. Deep learning has revolutionized fields including computer vision, natural language processing, speech recognition, healthcare, finance, and autonomous vehicles.
By reading this article, you will learn:
- What deep learning is
- The building blocks of deep learning
- The role of activation functions in neural networks
- The main types of deep learning networks
What is Deep Learning?
At the heart of deep learning lies the artificial neural network, a computational model inspired by the structure and function of the human brain. These networks are composed of layers of interconnected nodes, or “neurons,” and they learn by adjusting the strengths of the connections between these nodes based on the input data.
The term “deep” refers to the multiple layers in these neural networks. These deep architectures allow the networks to learn increasingly complex representations of the data as it passes through each layer, capturing higher-level abstractions and features.
Neurons: The Building Blocks of Deep Learning
In artificial neural networks, neurons are simplified mathematical models inspired by the neurons in the human brain. Each neuron takes input signals, processes them, and produces an output signal.
How a Neuron Works
| Step | Description |
|---|---|
| 1. Input | Neurons receive input from other neurons or from the external data being fed into the network. These inputs are numerical values representing features of the data, such as pixel values in an image or words in a sentence. |
| 2. Weights | Each input is multiplied by a weight, which determines the importance of that input. These weights are parameters that the neural network learns during training, adjusting them to improve its performance on the given task. |
| 3. Summation | The weighted inputs are summed together with an additional bias term, which allows the neuron to learn a threshold for activation. |
| 4. Activation Function | The sum of the weighted inputs plus the bias is passed through an activation function, which introduces non-linearity into the neuron’s output. The activation function decides whether the neuron should “fire” (i.e., produce an output signal) based on the input it receives. |
| 5. Output | The output of the neuron, often referred to as its activation or activation value, is passed to the neurons in the next layer of the network. |
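As a minimal sketch of these five steps, the Python snippet below computes the output of a single neuron with NumPy; the input values, weights, and bias are invented for illustration.

```python
import numpy as np

def sigmoid(z):
    # Non-linear activation that squashes the pre-activation into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# 1. Inputs: three feature values arriving at the neuron (illustrative values)
x = np.array([0.5, -1.2, 3.0])

# 2. Weights: one learned weight per input, plus a bias term
w = np.array([0.8, 0.1, -0.4])
b = 0.2

# 3. Summation: weighted sum of the inputs plus the bias
z = np.dot(w, x) + b

# 4.-5. Activation and output: the value passed on to the next layer
a = sigmoid(z)
print(z, a)
```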
Fully Connected Neural Network
These neurons are stacked in layers to form a neural network:
- Input Layer: This is the first layer of the neural network, where the input data is fed into the network. Each neuron in the input layer represents a feature of the input data.
- Hidden Layers: These are one or more layers between the input and output layers. Each neuron in a hidden layer receives inputs from the neurons in the previous layer and passes its output to the neurons in the next layer. These hidden layers enable the network to learn increasingly complex representations of the input data.
- Output Layer: This is the final layer of the neural network, where the network produces its output. The number of neurons in the output layer depends on the task the network is designed for.
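To make this layer structure concrete, here is a hedged NumPy sketch of a forward pass through a small fully connected network with a 4-feature input layer, one hidden layer of 5 neurons, and an output layer of 3 neurons; all sizes and random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# Input layer: a single sample with 4 features (illustrative values)
x = rng.normal(size=4)

# Hidden layer: 5 neurons, each connected to all 4 inputs
W1 = rng.normal(size=(5, 4))   # weight matrix of the hidden layer
b1 = np.zeros(5)
h = relu(W1 @ x + b1)          # hidden activations

# Output layer: 3 neurons, each connected to all 5 hidden neurons
W2 = rng.normal(size=(3, 5))
b2 = np.zeros(3)
y = W2 @ h + b2                # raw network outputs (e.g. class scores)
print(y)
```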
Weights of the Neurons
The weight matrix represents the parameters that the neural network learns during training. Each neuron in a neural network has associated weights that determine the strength of its connections to neurons in the previous layer. These weights are adjusted during the training process to minimize the difference between the predicted outputs of the network and the true outputs (i.e., the loss).
The weight matrix essentially defines the transformation that the input data undergoes as it passes through the network. By adjusting these weights, the network learns to extract relevant features from the input data and make accurate predictions or classifications.
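As a toy illustration of how weights are adjusted to reduce the loss, the sketch below takes a single gradient-descent step for one linear neuron with a squared-error loss; the training example, initial weights, and learning rate are invented for the example.

```python
import numpy as np

# One training example (invented): 2 input features and a target output
x = np.array([1.0, 2.0])
t = 3.0

# Current weights and bias of a single linear neuron
w = np.array([0.5, -0.5])
b = 0.0

# Forward pass and squared-error loss
y = w @ x + b
loss = 0.5 * (y - t) ** 2

# Gradients of the loss with respect to the weights and bias
grad_w = (y - t) * x
grad_b = (y - t)

# Gradient-descent update: nudge the weights to reduce the loss
lr = 0.1
w -= lr * grad_w
b -= lr * grad_b
print(loss, w, b)
```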
Activation Functions
In a neural network without activation functions, each neuron performs a linear transformation of its inputs by computing a weighted sum of the input values:

z = w₁x₁ + w₂x₂ + … + wₙxₙ + b

Here, wᵢ represents the weights associated with each input, xᵢ represents the input values, and b represents the bias term.
The output z is a linear combination of the input values, meaning that the relationship between the inputs and the output is linear.
By applying a non-linear function f(·) to the output of each neuron, the relationship between the inputs and the output becomes non-linear: y = f(z).
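The following sketch illustrates this point numerically: stacking two purely linear layers is equivalent to a single linear layer, while inserting a non-linear f(·) between them breaks that equivalence. Layer sizes and weights are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=4)

W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)

# Two stacked linear layers ...
two_linear = W2 @ (W1 @ x + b1) + b2

# ... collapse into one equivalent linear layer
W, b = W2 @ W1, W2 @ b1 + b2
one_linear = W @ x + b
print(np.allclose(two_linear, one_linear))   # True: no extra expressiveness gained

# With a non-linearity f(.) between the layers, no single linear layer is equivalent
f = np.tanh
non_linear = W2 @ f(W1 @ x + b1) + b2
```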
Benefit of Activation Function
- This non-linear activation function introduces curvature and complexity to the network’s decision boundary, enabling it to capture non-linear patterns and dependencies in the data.
- Without activation functions, neural networks would be limited to representing only linear mappings between inputs and outputs, severely restricting their expressiveness and representational power.
- Non-linear activation functions allow neural networks to approximate arbitrary functions, making them capable of modeling highly complex relationships in the data.
- As the depth and complexity of neural networks increase, the importance of non-linear activation functions becomes even more pronounced, enabling the network to learn increasingly intricate mappings.
1- Sigmoid Activation Function
Sigmoid functions squash the output between 0 and 1, making them suitable for binary classification tasks where the output represents a probability. They are smooth and differentiable everywhere, which allows gradient-based optimization methods such as gradient descent. Historically, they were used in early neural networks because of their simple mathematical form.
σ(z) = 1 / (1 + e^(−z))
Disadvantages:
- Sigmoids suffer from the vanishing gradient problem, where gradients become very small for extreme input values, leading to slow learning or stagnation during training.
- They are not zero-centered, which makes optimization less efficient, especially when dealing with large-scale datasets.
2- Tanh Activation Function
tanh(z) = (e^z − e^(−z)) / (e^z + e^(−z))
Tanh functions squash the output between -1 and 1, providing stronger gradients compared to sigmoid functions. They are zero-centered, which helps optimization algorithms converge faster.
Disadvantages:
- Like sigmoid functions, tanh functions are also susceptible to the vanishing gradient problem for extreme input values.
3- Rectified Linear Unit (ReLU) Activation Function
ReLU(z) = max(0, z)
ReLU functions are computationally efficient and easy to implement, as they simply output the input value if it’s positive, and zero otherwise. They address the vanishing gradient problem by providing non-zero gradients for positive input values, enabling faster learning. ReLU activations have been shown to induce sparsity in neural networks, which can help with regularization and model interpretability.
Disadvantages:
- ReLU units can suffer from the “dying ReLU” problem, where neurons become inactive (output zero) for negative input values during training and never recover, leading to dead neurons and reduced model capacity.
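A compact NumPy sketch of the three activation functions and their gradients shows numerically why sigmoid and tanh gradients vanish for extreme inputs while ReLU keeps a constant gradient for positive ones; the sample inputs are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # at most 0.25, nearly zero for large |z|

def d_tanh(z):
    return 1.0 - np.tanh(z) ** 2  # also vanishes for large |z|

def relu(z):
    return np.maximum(0.0, z)

def d_relu(z):
    return (z > 0).astype(float)  # 1 for positive inputs, 0 otherwise ("dying ReLU")

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(d_sigmoid(z))   # gradients near 0 at z = -10 and z = 10
print(d_tanh(z))
print(d_relu(z))
```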
Why Stack Layers Instead of Adding More Neurons?
Stacking layers instead of adding more neurons to a single layer is a fundamental design choice in neural network architecture, and it’s motivated by several factors:
1- Hierarchical Representation:
Neural networks with multiple layers can learn hierarchical representations of the input data. Each layer captures different levels of abstraction or features from the input, starting from low-level features in the early layers to high-level features in the deeper layers. By stacking layers, the network can gradually transform the raw input into more abstract and meaningful representations, enabling it to capture complex patterns and relationships in the data.
2- Feature Reusability:
Layers allow for the reuse of learned features across different parts of the network. The output of neurons in one layer serves as input to neurons in the subsequent layer, allowing the network to build upon previously learned representations.
3- Compositional Learning:
Layered architectures enable compositional learning, where each layer learns simple transformations or patterns that are combined to learn more complex concepts. By decomposing the learning problem into multiple layers, the network can focus on learning local relationships within each layer while abstracting away higher-level concepts in deeper layers.
4- Non-Linearity:
Introducing multiple layers allows for the insertion of non-linear activation functions between layers, which is essential for capturing complex relationships in the data. Non-linear activation functions enable the network to approximate arbitrary functions, making it more expressive and capable of modeling complex, non-linear mappings between inputs and outputs.
Types of Deep Learning Networks
Deep learning encompasses a variety of architectures and algorithms designed to learn from data by leveraging neural networks with multiple layers. Here are some common types of deep learning approaches:
Feedforward Neural Networks (FNNs):
Feedforward neural networks, also known as multilayer perceptrons (MLPs), are the simplest form of deep learning models. They consist of multiple layers of neurons, with connections only flowing forward from the input layer to the output layer. FNNs are commonly used for tasks such as regression, classification, and function approximation.
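As a hedged sketch, a small feedforward network (MLP) for a 10-class classification task might be defined in PyTorch as below; the layer sizes and the 784-dimensional (flattened 28x28) input are illustrative assumptions.

```python
import torch
import torch.nn as nn

# A feedforward network: information flows strictly from input to output
mlp = nn.Sequential(
    nn.Linear(784, 128),   # input layer -> first hidden layer
    nn.ReLU(),
    nn.Linear(128, 64),    # second hidden layer
    nn.ReLU(),
    nn.Linear(64, 10),     # output layer: one score per class
)

x = torch.randn(32, 784)   # a batch of 32 flattened 28x28 inputs
logits = mlp(x)            # shape: (32, 10)
```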
Convolutional Neural Networks (CNNs):
Convolutional neural networks are specialized deep learning architectures designed for processing grid-like data, such as images and videos. They leverage convolutional layers to automatically learn hierarchical representations of features directly from raw pixel data. CNNs are widely used in computer vision tasks, including image classification, object detection, and image segmentation.
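A minimal convolutional network for, say, 28x28 grayscale image classification could look like the following PyTorch sketch; the channel counts and kernel sizes are illustrative choices, not a prescribed architecture.

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # learn 16 local feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                              # downsample 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # higher-level features
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                    # classifier head
)

x = torch.randn(8, 1, 28, 28)   # batch of 8 grayscale images
scores = cnn(x)                 # shape: (8, 10)
```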
Recurrent Neural Networks (RNNs):
Recurrent neural networks are designed to process sequential data with temporal dependencies, such as time series, text, and speech. They incorporate feedback connections that allow information to persist over time, enabling the network to capture long-term dependencies. RNNs are used in tasks such as language modeling, machine translation, speech recognition, and time series prediction.
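A hedged sketch of a recurrent layer processing a batch of sequences in PyTorch; the sequence length, feature sizes, and the classifier on top are arbitrary assumptions.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

# A batch of 4 sequences, each with 20 time steps of 8 features
x = torch.randn(4, 20, 8)
outputs, h_n = rnn(x)   # outputs: (4, 20, 16); h_n: final hidden state (1, 4, 16)

# The final hidden state summarizes the sequence and can feed a classifier
classifier = nn.Linear(16, 2)
logits = classifier(h_n[-1])    # shape: (4, 2)
```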
Long Short-Term Memory Networks (LSTMs):
Long Short-Term Memory networks are a type of recurrent neural network designed to address the vanishing gradient problem and capture long-term dependencies more effectively. LSTMs use specialized memory cells with gating mechanisms to regulate the flow of information over time. They are particularly effective for tasks that require modeling long-range dependencies, such as machine translation, speech recognition, and sentiment analysis.
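In PyTorch, an LSTM layer is used much like a plain RNN layer, with the memory cells and gating mechanisms handled internally; the sizes below are illustrative.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(4, 20, 8)       # batch of 4 sequences, 20 steps, 8 features
outputs, (h_n, c_n) = lstm(x)   # h_n: final hidden state, c_n: final cell (memory) state
```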
Generative Adversarial Networks (GANs):
Generative adversarial networks consist of two neural networks, a generator and a discriminator, trained simultaneously in a competitive manner. The generator learns to generate realistic samples from random noise, while the discriminator learns to distinguish between real and fake samples. GANs are used for tasks such as image generation, image-to-image translation, and data augmentation.
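A sketch of the two competing networks in a GAN for flattened 28x28 images; the training loop is omitted, and the latent dimension and layer sizes are invented for illustration.

```python
import torch
import torch.nn as nn

latent_dim = 64

# Generator: maps random noise to a fake sample
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Tanh(),          # a fake flattened 28x28 image
)

# Discriminator: scores how "real" a sample looks
discriminator = nn.Sequential(
    nn.Linear(784, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),         # probability that the sample is real
)

z = torch.randn(16, latent_dim)   # random noise
fake = generator(z)
p_real = discriminator(fake)      # the two networks are trained adversarially
```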
Autoencoders:
Autoencoders are neural network architectures designed for unsupervised learning and dimensionality reduction. They consist of an encoder network that compresses the input data into a lower-dimensional representation (latent space) and a decoder network that reconstructs the original input from the latent representation. Autoencoders are used for tasks such as data denoising, feature learning, and anomaly detection.
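A minimal autoencoder sketch in PyTorch that compresses 784-dimensional inputs into a 32-dimensional latent space and reconstructs them; the sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 32),                 # 32-dimensional latent representation
)
decoder = nn.Sequential(
    nn.Linear(32, 128), nn.ReLU(),
    nn.Linear(128, 784), nn.Sigmoid(),  # reconstruction of the input
)

x = torch.randn(8, 784)
reconstruction = decoder(encoder(x))
loss = F.mse_loss(reconstruction, x)    # trained to reconstruct its own input
```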
Graph Neural Networks (GNNs):
Graph neural networks operate on graph-structured data, where samples are nodes connected by edges rather than values on a regular grid. Such problems can be tackled by different types of models that extend the convolutional neural network idea to graphs, among them (a simplified graph-convolution step is sketched after the list):
- Graph Convolutional Neural Networks (GCNNs).
- Graph Attention Networks.
- Message Passing Neural Networks.
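As a hedged sketch of the idea behind graph convolutions, the NumPy snippet below implements one simplified GCN-style propagation step, H' = ReLU(Â H W), where Â is the adjacency matrix with self-loops, symmetrically normalized by node degree; the tiny 4-node graph, feature sizes, and weights are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny graph of 4 nodes: adjacency matrix (invented edges)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                      # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # symmetric degree normalization

X = rng.normal(size=(4, 3))   # 3 input features per node
W = rng.normal(size=(3, 8))   # learnable layer weights

H = np.maximum(0.0, A_norm @ X @ W)   # one graph-convolution layer with ReLU
print(H.shape)                        # (4, 8): 8 new features per node
```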