What are Recurrent Neural Networks?
Recurrent Neural Networks (RNNs) are a type of neural network in which the output of one step is fed back as input to the next. In traditional neural networks, inputs and outputs are independent of one another; RNNs instead maintain a memory of previous inputs through a hidden state. This enables them to capture dependencies between sequential elements, making them effective for tasks involving time-series or other sequential data. After reading this post, you should know the following:
- Introduction to Recurrent Neural Networks
- The Architecture of Traditional RNNs
- Advantages and disadvantages of using RNNs
- How do Recurrent Neural Networks work?
- Implementation steps of a Recurrent Neural Network in Python
- Tips for improving model performance
Introduction to Recurrent Neural Networks
RNN stands for Recurrent Neural Network. It is a type of neural network that is commonly used for processing sequential data, such as time series or natural language text.
In contrast to traditional neural networks, which process inputs one at a time and do not maintain any internal state between inputs, RNNs have loops within them that allow information to persist. In other words, they have a “memory” of previous inputs, which can be used to inform future predictions.
The basic building block of an RNN is the recurrent cell, which takes an input vector and a hidden state vector as inputs, and produces an output vector and a new hidden state vector as outputs. The new hidden state is then fed back into the recurrent cell for the next time step. This allows the RNN to maintain a “memory” of previous inputs and use it to inform predictions.
There are several variants of RNNs, including long short-term memory (LSTM) networks and gated recurrent units (GRUs), which were developed to address some of the limitations of basic RNNs, such as the vanishing gradient problem. These variants use more complex gating mechanisms to control the flow of information within the network, allowing it to learn long-term dependencies more effectively (Figure 1).
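To make the gating idea concrete, here is a minimal sketch of a single GRU step in NumPy. This is an illustration, not part of the original tutorial; the weight matrices (Wz, Uz, Wr, Ur, Wh, Uh) are assumed to have compatible shapes, and bias terms are omitted for brevity.
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    z = sigmoid(Wz @ x_t + Uz @ h_prev)                   # update gate: how much of the state to refresh
    r = sigmoid(Wr @ x_t + Ur @ h_prev)                   # reset gate: how much of the past state to use
    h_candidate = np.tanh(Wh @ x_t + Uh @ (r * h_prev))   # candidate hidden state
    h_t = (1 - z) * h_prev + z * h_candidate               # gated blend of old and new information
    return h_t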
The Architecture of Traditional RNNs
Recurrent Neural Networks (RNNs) are a type of neural network characterized by their inclusion of hidden states, which enables prior outputs to be used as inputs. Their architecture retains sequential information through the recurrent connection of hidden states.
RNN architecture can vary depending on the problem you’re trying to solve, from architectures with a single input and output to those with many inputs and outputs (with variations in between).
Below are some examples of RNN architectures that can help you better understand this.
- One To One: There is a single input and a single output. Traditional neural networks use a one-to-one architecture.
- One To Many: A single input can produce numerous outputs. One-to-many networks are used in music generation, for example.
- Many To One: In this scenario, the model produces a single output by combining many inputs from distinct time steps. Sentiment analysis and emotion identification use such networks, in which the class label is determined by a sequence of words.
- Many To Many: For many-to-many, there are numerous configurations; for example, two inputs might yield three outputs. Machine translation systems, such as English-to-French translation (or vice versa), use many-to-many networks (see the sketch after this list for the many-to-one and many-to-many patterns).
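The following is a rough Keras sketch of the last two patterns; it is not taken from the original article, and the input shapes and layer sizes are arbitrary assumptions. Setting return_sequences=False yields one output for the whole sequence (many-to-one), while return_sequences=True with a TimeDistributed head yields one output per time step (many-to-many).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, TimeDistributed

# Many to one: the RNN returns only its final hidden state.
many_to_one = Sequential([
    SimpleRNN(32, input_shape=(10, 8)),               # 10 time steps, 8 features per step
    Dense(1, activation='sigmoid'),                    # single label for the whole sequence
])

# Many to many: the RNN returns a hidden state at every time step.
many_to_many = Sequential([
    SimpleRNN(32, return_sequences=True, input_shape=(10, 8)),
    TimeDistributed(Dense(5, activation='softmax')),   # one prediction per time step
])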
Advantages and disadvantages of using RNNs
Advantages of using RNN:
- Ability to process sequential data: RNNs are particularly well-suited for processing sequential data, such as time series or natural language text, because they can maintain a “memory” of previous inputs and use it to inform future predictions.
- Flexibility: RNNs can be used for a variety of tasks, including sequence prediction, language translation, and speech recognition.
- End-to-end learning: RNNs can be trained end-to-end, which means that the entire network can be trained to optimize a single objective, rather than requiring separate training of each component.
- Handling variable-length inputs: RNNs can handle variable-length inputs, which makes them well-suited for processing data with varying lengths, such as text of different lengths.
Disadvantages of using RNN:
- Computational complexity: RNNs can be computationally expensive, especially for long sequences, because they require repeated computations for each time step.
- Difficulty in training long sequences: RNNs can be difficult to train for long sequences, because of the vanishing gradient problem, which can make it difficult for the network to learn long-term dependencies.
- Limited memory: RNNs have a limited memory, which means that they may not be able to effectively capture long-term dependencies in some cases.
- Difficulty in parallelization: RNNs are difficult to parallelize, which can limit their scalability on large datasets.
How Do Recurrent Neural Networks Work?
The key idea behind RNNs is to use the output from the previous time step as an input to the current time step. This creates a feedback loop that allows the network to maintain a “memory” of previous inputs, and to use that memory to influence its current output.
The Inner Mechanisms of RNNs
At each time step, an RNN takes two inputs: the current input and the previous hidden state. The current input is typically encoded as a vector, and the previous hidden state is a vector that represents the network’s internal state at the previous time step. Using these inputs, the RNN calculates a new hidden state and an output for the current time step.
The calculation of the new hidden state typically involves three components: a linear transformation of the current input, a linear transformation of the previous hidden state, and a non-linear activation function. Typically, we calculate the output for the current time step by performing a linear transformation of the new hidden state, followed by another non-linear activation function.
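As a minimal sketch of this computation (illustrative only; the weight shapes are assumed compatible, and tanh and sigmoid are just common activation choices), one time step looks roughly like this:
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    # new hidden state: linear transforms of the input and the previous state, plus a non-linearity
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
    # output: linear transform of the new hidden state, followed by another non-linearity
    y_t = 1 / (1 + np.exp(-(W_hy @ h_t + b_y)))
    return h_t, y_t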
The parameters of an RNN are learned through a process called backpropagation through time (BPTT), which is a variant of backpropagation that takes into account the sequential nature of the input data. During BPTT, the gradient of the loss function with respect to each parameter is calculated for each time step, and the parameters are updated using gradient descent.
Look at the example below: the input layer ‘x’ processes the input and passes it to the middle layer ‘h’. The middle layer can contain multiple hidden layers, each with its own activation functions, weights, and biases. A recurrent neural network standardizes the activation functions and weights, creating one hidden layer and looping over it as many times as needed (Figure 1).
Overall, RNNs are a powerful tool for processing sequential data. They have been successfully applied to a wide range of tasks, including language modeling, machine translation, and speech recognition.
Implementation Steps of Recurrent Neural Network
First, we import the needed libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy as sp
from sklearn import feature_extraction, model_selection, naive_bayes, metrics, svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support as score
%matplotlib inline
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')
nltk.download('wordnet')
# Stop words are the most common words in a text, e.g. "the", "is", "in", "for", "where", "when", "to", "at"
stop = stopwords.words('english')
from nltk.stem import WordNetLemmatizer
from nltk.stem import PorterStemmer
porter = PorterStemmer()
lemmatizer = WordNetLemmatizer()
from bs4 import BeautifulSoup
import re
# helps in text preprocessing
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
# helps in model building
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import SimpleRNN
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Embedding
from tensorflow.keras.callbacks import EarlyStopping
# split data into train and test set
from sklearn.model_selection import train_test_split
Load the drug reviews dataset from Kaggle and inspect its features (Figure 6).
train=pd.read_csv('PATH_TO_YOUR_FILE')
test=pd.read_csv('PATH_TO_YOUR_FILE')
dataset = pd.concat([train, test])
dataset.drop(['uniqueID'], axis=1, inplace=True)
dataset['sentiment'] = dataset['rating'].apply(lambda rating : 1 if rating > 8 else 0)
dataset.head()
Data Cleaning and Pre-Processing
This code defines a function called review_to_words that takes a raw review text and performs several preprocessing steps to clean and tokenize it. The function returns the cleaned text as a single string.
Here are the preprocessing steps performed by the function:
- The function uses the BeautifulSoup library to remove any HTML tags from the raw review text. It does this by calling the get_text() method on a BeautifulSoup object created from the raw_review input.
- The function removes any non-alphabetic characters (such as numbers and punctuation marks) by replacing them with a space using a regular expression.
- The function converts all characters to lowercase using the lower() method and then splits the text into individual words using the split() method.
- The function removes any stop words (that is, common words that are unlikely to be useful for natural language processing tasks) using the predefined list of stop words called stop.
- The function lemmatizes each word, reducing it to its base form, using the WordNetLemmatizer from the NLTK library.
- Finally, the function concatenates the cleaned and lemmatized words back into a single string, which is returned as output.
def review_to_words(raw_review):
    # 1. Remove HTML tags
    review_text = BeautifulSoup(raw_review, 'html.parser').get_text()
    # 2. Replace non-alphabetic characters with a space
    letters_only = re.sub('[^a-zA-Z]', ' ', review_text)
    # 3. Convert to lowercase and split into words
    words = letters_only.lower().split()
    # 4. Remove stop words
    meaningful_words = [w for w in words if w not in stop]
    # 5. Lemmatize each word
    lemmatized_words = [lemmatizer.lemmatize(w) for w in meaningful_words]
    # 6. Join the words back into a single string
    return ' '.join(lemmatized_words)
dataset['review_clean'] = dataset['review'].apply(review_to_words)
Dataset split into Train & Test
X = dataset['review_clean'].values
y = dataset['sentiment'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
We use the Keras tokenizer to preprocess the text data. Here’s a step-by-step breakdown of what’s happening:
- Create a tokenizer object using the Tokenizer class provided by Keras.
- Fit the tokenizer on the X_train training data using the fit_on_texts method of the tokenizer object. This method updates the internal vocabulary based on the frequency of words in the training data.
- Convert the raw text data in the X_train training set into sequences of integer indices using the texts_to_sequences method of the tokenizer object. This method replaces each word in the text with its corresponding integer index from the tokenizer’s vocabulary.
- Convert the raw text data in the X_test test set into sequences of integer indices using the same texts_to_sequences method used for the training data.
- Print the first two sequences of integer indices from the training data to check the output of the text-to-sequence conversion.
- Set the maximum sequence length to 8 using the max_length variable.
- Pad the integer sequences in the training and test sets to a fixed length of 8 using the pad_sequences method provided by Keras. This method truncates sequences longer than max_length and pads shorter ones with zeros; the padding parameter controls whether the padding goes at the beginning or the end of a sequence. In this case, padding='post' means the padding is added to the end of the sequences.
- Print the padded training sequences to check the padding output.
t = Tokenizer()
t.fit_on_texts(X_train)
encoded_train = t.texts_to_sequences(X_train)
encoded_test = t.texts_to_sequences(X_test)
print(encoded_train[0:2])
max_length = 8
padded_train = pad_sequences(encoded_train, maxlen=max_length, padding='post')
padded_test = pad_sequences(encoded_test, maxlen=max_length, padding='post')
print(padded_train)
The model is built with the vocabulary size as the input dimension, then compiled, and a summary is generated. Note that vocab_size is derived from the fitted tokenizer’s vocabulary (one extra slot is reserved for the padding index).
# Sequential, Embedding, SimpleRNN, and Dense were already imported from tensorflow.keras above
# vocabulary size: number of words in the tokenizer's vocabulary, plus one for the padding index
vocab_size = len(t.word_index) + 1
# define the model
model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=50, input_length=max_length))
model.add(SimpleRNN(units=24, return_sequences=False))
model.add(Dense(units=16, activation='relu'))
model.add(Dense(units=1, activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# summarize the model
print(model.summary())
The model is trained and validated on the test dataset for up to 20 epochs. An EarlyStopping callback stops training once the validation loss has not improved for 10 consecutive epochs (patience=10).
early_stop = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=10)
# fit the model
model.fit(x=padded_train,
y=y_train,
epochs=20,
validation_data=(padded_test, y_test), verbose=1,
callbacks=[early_stop]
)
The following defines two functions for evaluating the performance of a classification model using scikit-learn: c_report and plot_confusion_matrix.
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
def c_report(y_true, y_pred):
    print("Classification Report")
    print(classification_report(y_true, y_pred))
    acc_sc = accuracy_score(y_true, y_pred)
    print("Accuracy : " + str(acc_sc))
    return acc_sc

def plot_confusion_matrix(y_true, y_pred):
    mtx = confusion_matrix(y_true, y_pred)
    sns.heatmap(mtx, annot=True, fmt='d', linewidths=.5,
                cmap="Blues", cbar=False)
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
Generate predictions on the test set, then print the classification report and plot the confusion matrix.
preds = (model.predict(padded_test) > 0.5).astype("int32")
c_report(y_test, preds)
plot_confusion_matrix(y_test, preds)
Tips for Improving Model Performance:
- Use different types of RNNs: There are different types of RNNs such as Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Simple RNN. Experimenting with different types of RNNs can help to find the one that works best for your specific task.
- Adjust the architecture of your RNN: Experimenting with the depth and width of your RNN can help to improve its performance. Increasing the number of layers or neurons can help to increase the capacity of the model, but be careful not to overfit.
- Use attention mechanisms: Attention mechanisms can help to improve the performance of RNNs on tasks such as machine translation and image captioning by allowing the model to selectively focus on different parts of the input.
- Use regularization techniques: Regularization techniques such as dropout and weight decay can help to prevent overfitting in RNNs and improve their generalization performance.
- Gradient clipping: Gradient clipping is a technique that can help to prevent exploding gradients in RNNs. It involves scaling the gradients if they exceed a certain threshold.
- Use batch normalization: Batch normalization can help to stabilize RNNs and improve their training speed and performance. (A short sketch applying several of these tips to the model above follows this list.)
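As a hedged sketch of how several of these tips could be applied to the model built above (the layer sizes, dropout rate, and clipnorm value are illustrative assumptions, not tuned recommendations), we can swap in a gated LSTM layer, add dropout for regularization, and enable gradient clipping in the optimizer:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from tensorflow.keras.optimizers import Adam

improved_model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=50, input_length=max_length),
    LSTM(24),                                      # gated cell instead of SimpleRNN
    Dropout(0.3),                                  # dropout regularization to reduce overfitting
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid'),
])
improved_model.compile(optimizer=Adam(clipnorm=1.0),   # clipnorm applies gradient clipping
                       loss='binary_crossentropy',
                       metrics=['accuracy'])
print(improved_model.summary())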
Resources:
Full source code on Github