October 25 2022

A Beginner’s Guide To Logistic Regression

Ahmed Raafat MACHINE LEARNING classification, logistic regression 0

You may think that the Logistic Regression algorithm is used for regression or that it’s related to Linear regression. Well, It can be used for regression but actually, it is widely used for classification tasks. It is used to predict categorical variables with the help of dependent variables.

After reading this post you should know the following:

What Is Logistic Regression?
What Are the Types of Logistic Regression?
How does Logistic Regression work?
What are the advantages and disadvantages of using logistic regression?
Logistic Regression vs Linear Regression
Python implementation for Logistic Regression
Resources & References

What Is Logistic Regression?

Logistic regression is defined as a supervised machine learning algorithm that solves binary classification tasks by predicting the probability of an outcome or class. It is used when the data is linearly separable and the outcome is binary.

Some of the examples of classification problems are Normal vs spam Email, Fraud or Normal transactions. And this means we are dealing with a binary outcome.

A binary outcome is one where there are only two possible scenarios, either the event happens (1) or it does not happen (0). Independent variables are those variables or factors which may influence the outcome (or dependent variable).

What Are the Types of Logistic Regression?

Binary logistic regression
Multinomial logistic regression
Ordinal logistic regression

Binary logistic regression

It is the statistical technique used to predict the relationship between the dependent variable (Y) and the independent variable (X), where the dependent variable is binary value (1 or 0). This is the type of logistic regression that we’ve been focusing on in this post. for example:

Spam filter : Normal / Spam
Medical diagnosis : Normal / Abnormal

Multinomial logistic regression

A categorical dependent variable has two or more discrete possible outcomes. It is very similar to logistic regression except that here you can have more than two possible outcomes. for example:

Classifying texts into what language they come from.
Predicting whether a student will go to college by bus or train or car

Ordinal logistic regression

Ordinal logistic regression applies when the dependent variable is in an ordered state and it has more than two possible outcomes. for example:

Formal shirt size : XS, S, M, L, XL
Rating of a movie : Stars 0 to 5

How does Logistic Regression work?

In order to perform Logistic Regression, you have to make sure of some assumptions that apply.

The dependent variable is in discrete binary
Linear relationship between the independent variables
No extreme outliers
No multicollinearity between the independent variables ( They should be independent on each other i.e. low correlation)
Preferably large sample size

The Logistic Regression equation is quite similar to the Linear Regression model. but the Logistic Regression uses a more complex cost function, this cost function can be defined as the Sigmoid function.

As you can see, the sigmoid function returns only values between 0 and 1 ( Can also be called probabilities) for the dependent variable, irrespective of the values of the independent variable. This also works if you have multiple independent variables. The Logistic Regression also model equations between them and one dependent variable.

Unlike Linear regression the best fit curve can’t be determined using Least squares. Instead, methods like maximum likelihood can be used. Likelihood can be said to be the probability of an event based on previous information.

Here y represents the actual class and log(h(x)) is the probability of that class.

When y = 1, the second term becomes 0 and we will check the probability value log(h(x)).
if h(x) is 1, then log(h(x)) is 0 and our cost will be 0. This means our prediction is correct.
if h(x) is 0, then log(h(x)) is ∞ and our cost will be ∞. This means our prediction is incorrect.

But when y = 0, the first term becomes 0 and we will check the probability value of log(h(x)).
if h(x) is 1, then log(1-h(x)) is ∞ and our cost will be ∞. This means our prediction is incorrect.
if h(x) is 0, then log(1-h(x)) is 0 and our cost will be 0. This means our prediction is correct.

What are the advantages and disadvantages of using logistic regression?

Advantage

The main advantage of logistic regression is that it is too easy to set up and train.
Logistic regression performs better when the data is linearly separable.
It does not require too many computational resources.
It does not require tuning.

Disadvantage

Logistic regression fails to predict a continuous outcome
Logistic regression may not be accurate if the sample size is too small or not linear.

Logistic Regression vs Linear Regression

Linear Regression	Logistic Regression
Used to solve regression problems	Used to solve classification problems
Can deal with continuous output	Deals with discrete/categorical output only
It is a straight line	It is a curved line (S shaped)

Python implementation for Logistic Regression

Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import mnist
from sklearn.linear_model import LogisticRegression
import seaborn as sns
from matplotlib import rcParams
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support, accuracy_score, roc_curve, auc, roc_auc_score
from sklearn.preprocessing import label_binarize

Importing the dataset

(X_train, y_train), (X_test, y_test) = mnist.load_data()
for i in range(3):  
  plt.subplot(3, 1, i+1)
  plt.imshow(X_train[i])
  plt.show()

Data reshaping

print(f'Before reshaping:\nX_train:{X_train.shape}\nX_test:{X_test.shape}')
X_train = X_train.reshape(X_train.shape[0],-1)
X_test = X_test.reshape(X_test.shape[0],-1)
print(f'\nAfter reshaping:\nX_train:{X_train.shape}\nX_test:{X_test.shape}')

Data normalization

print(f'Before normalization:\nmax value:{X_train.max()}\nmin value:{X_train.min()}')
X_train = X_train / 255.0
X_test = X_test / 255.0
print(f'After normalization:\nmax value:{X_train.max()}\nmin value:{X_train.min()}')

Training the classification model on the Training set

classifier = LogisticRegression()
classifier.fit(X_train, y_train)

Predicting the Test set results

y_pred = classifier.predict(X_test)

Making the Confusion Matrix

classes_num = 10
cm = confusion_matrix(y_test, y_pred)
print(cm)

rcParams['figure.figsize'] = 12,8
sns.set(font_scale = 1.2)
sns.heatmap(cm, fmt='d', annot=True, xticklabels=[0,1,2,3,4,5,6,7,8,9], yticklabels=np.arange(classes_num))

Calculate Precision, Recall, F1_score and Accuracy

average_methods = ['micro','macro','weighted']
for method in average_methods:
  Precision,Recall,F1_score,_ = precision_recall_fscore_support(y_test, y_pred, average=method)
  Accuracy = accuracy_score(y_test, y_pred)
  print(f'{method}:')
  print(f'Precision: {round(Precision*100,2)}%\nRecall: {round(Recall*100,2)}%\nF1_score: {round(F1_score*100,2)}%\nAccuracy: {round(Accuracy*100,2)}%\n')

Resources & References

Read more about machine learning algorithms
Read more about Linear Regression
Full code implementation can be found On Github

Author

Ahmed Raafat

Unstructured Data Scientist

View all posts

A Beginner’s Guide To Logistic Regression

What Is Logistic Regression?

What Are the Types of Logistic Regression?

Binary logistic regression

Multinomial logistic regression

Ordinal logistic regression

How does Logistic Regression work?

What are the advantages and disadvantages of using logistic regression?

Advantage

Disadvantage

Logistic Regression vs Linear Regression

Python implementation for Logistic Regression

Resources & References

Author

Leave a Comment Cancel reply

Subscribe to our newsletter

A Beginner’s Guide To Logistic Regression

What Is Logistic Regression?

What Are the Types of Logistic Regression?

Binary logistic regression

Multinomial logistic regression

Ordinal logistic regression

How does Logistic Regression work?

What are the advantages and disadvantages of using logistic regression?

Advantage

Disadvantage

Logistic Regression vs Linear Regression

Python implementation for Logistic Regression

Resources & References

Author

Related Posts

Introduction to Support Vector Machines (SVMs)

Collaborative Filtering Recommendation System

Activation Functions: All You Need To Know

Leave a Comment Cancel reply

Subscribe to Our Newsletter