# Introduction to GANs: Generative Adversarial Networks

GANs, short for Generative Adversarial Networks, were proposed in a 2014 paper by Ian Goodfellow and his colleagues. The idea excited researchers and deep learning enthusiasts almost instantly, but momentum then slowed due to the difficulties of training GANs, which we will discuss later in this article.

In this article you will discover the fundamentals behind Generative Adversarial Networks for image generation. After reading this post, you will know the following:

- What GANs are
- The roles of the generator and the discriminator inside a GAN
- The training phases of a GAN
- A code sample that trains a simple GAN to generate images of buildings
- The challenges of training GANs
- Resources & references

### What are GANs anyway?

In hindsight, the idea seems rather simple: two neural networks compete against each other (hence "adversarial") in the hope that the competition will push them to excel. A Generative Adversarial Network is composed of the following:

**Generator network**: Takes as input a random seed (typically sampled from a Gaussian distribution) and outputs data, which can be represented as an image. The input is referred to as the latent representation (i.e., the codings) of the generated image.

**Discriminator network**: Takes in both fake images from the generator and real images from the training set, and tries to tell which is fake and which is real.

During training, the generator and the discriminator have opposing goals: the generator tries to produce images that look real enough to fool the discriminator, while the discriminator tries to tell fake images from real ones. Because of this setup, a GAN, which consists of two neural networks, can't be trained like an ordinary neural network. Each training iteration must be divided into two phases, as follows:

### Training Phases for a Generative Adversarial Network

**Phase one:** We train the discriminator only, meaning back-propagation optimizes only the discriminator's weights in this phase. The discriminator is fed a batch of images from the training set alongside an equal number of fake images from the generator, which at first will be pretty much noise and will improve over time, as you'll see later in the article. The labels are set to 1 for the real training images and 0 for the generated images. The discriminator is then trained on this labeled batch using the binary cross-entropy loss function.

**Phase two:** We train the generator only, meaning back-propagation optimizes only the generator's weights in this phase. First we use the generator to produce another batch of fake images, then feed it to the discriminator to tell whether they are real or fake. This time we don't include any real images in the batch, and we set all the labels to 1 (real). To put it simply: we want the generator to produce images the discriminator will wrongly believe to be real. It is crucial to note that the discriminator's weights are frozen during this phase, so they are not affected at all.

**Notice**: The generator never actually sees any real images! Yet with this setup it gradually learns to produce convincing, near-real images, using only the gradients flowing back through the discriminator. This means the better the discriminator gets, the more information about real images the generator receives.
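For intuition, the two phases can be sketched with Keras's high-level `train_on_batch` API. This is a minimal sketch, not the approach used later in this article; it assumes the discriminator was compiled on its own and then frozen before being stacked with the generator into a combined `gan` model:

```python
import numpy as np
import tensorflow as tf

SEED_SIZE = 100
BATCH_SIZE = 32

def train_both_phases(generator, discriminator, gan, real_images):
    # Phase one: train the discriminator on real (label 1) and fake (label 0) images.
    noise = np.random.normal(0, 1, (BATCH_SIZE, SEED_SIZE)).astype("float32")
    fake_images = generator.predict(noise, verbose=0)
    x = np.concatenate([real_images, fake_images])
    y = np.concatenate([np.ones((BATCH_SIZE, 1)), np.zeros((BATCH_SIZE, 1))])
    d_loss = discriminator.train_on_batch(x, y)

    # Phase two: train the generator through the combined model, labeling all
    # fakes as real (1); the discriminator inside `gan` is frozen, so only the
    # generator's weights move.
    noise = np.random.normal(0, 1, (BATCH_SIZE, SEED_SIZE)).astype("float32")
    g_loss = gan.train_on_batch(noise, np.ones((BATCH_SIZE, 1)))
    return d_loss, g_loss
```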

### Code: Let's build a simple Generative Adversarial Network!

Let's go ahead and build a simple but effective GAN to generate images of buildings.

Let's start by setting up some hyperparameters and important variables:

```python
GENERATE_RES = 4                     # resolution factor: output is 32 * GENERATE_RES pixels
GENERATE_SQUARE = 32 * GENERATE_RES  # output image width/height (128)
IMAGE_CHANNELS = 3                   # RGB
SEED_SIZE = 100                      # dimensionality of the latent (seed) vector
EPOCHS = 1000
BATCH_SIZE = 32
BUFFER_SIZE = 60000                  # shuffle buffer for the tf.data pipeline
```

Next, we'll build the two networks we need, starting with the generator, which resembles the decoder half of an autoencoder:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Dense, Reshape, UpSampling2D, Conv2D,
                                     BatchNormalization, Activation)

def build_generator(seed_size, channels):
    model = Sequential()

    # Project the seed into a 4x4x256 feature map.
    model.add(Dense(4 * 4 * 256, activation="relu", input_dim=seed_size))
    model.add(Reshape((4, 4, 256)))

    # Upsample 4x4 -> 8x8.
    model.add(UpSampling2D())
    model.add(Conv2D(256, kernel_size=3, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Activation("relu"))

    # Upsample 8x8 -> 16x16.
    model.add(UpSampling2D())
    model.add(Conv2D(256, kernel_size=3, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Activation("relu"))

    # Output resolution, additional upsampling: 16x16 -> 32x32.
    model.add(UpSampling2D())
    model.add(Conv2D(128, kernel_size=3, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Activation("relu"))

    # Final upsampling by GENERATE_RES: 32x32 -> 128x128.
    if GENERATE_RES > 1:
        model.add(UpSampling2D(size=(GENERATE_RES, GENERATE_RES)))
        model.add(Conv2D(128, kernel_size=3, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Activation("relu"))

    # Final CNN layer: map to `channels` with tanh so outputs lie in [-1, 1].
    model.add(Conv2D(channels, kernel_size=3, padding="same"))
    model.add(Activation("tanh"))
    return model
```
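As a quick sanity check on the architecture above, we can trace the spatial resolution through the generator's upsampling layers and confirm it ends at `GENERATE_SQUARE`:

```python
GENERATE_RES = 4
GENERATE_SQUARE = 32 * GENERATE_RES

# Trace the spatial resolution through the generator's upsampling layers.
res = 4               # Dense -> Reshape((4, 4, 256))
res *= 2              # UpSampling2D() -> 8
res *= 2              # UpSampling2D() -> 16
res *= 2              # UpSampling2D() -> 32
res *= GENERATE_RES   # UpSampling2D(size=(GENERATE_RES, GENERATE_RES)) -> 128

print(res)  # 128, matching GENERATE_SQUARE
```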

The discriminator is just a regular binary classifier that takes images and outputs a single number through a sigmoid activation function:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, LeakyReLU, Dropout, ZeroPadding2D,
                                     BatchNormalization, Flatten, Dense)

def build_discriminator(image_shape):
    model = Sequential()

    model.add(Conv2D(32, kernel_size=3, strides=2, input_shape=image_shape, padding="same"))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))

    model.add(Conv2D(64, kernel_size=3, strides=2, padding="same"))
    model.add(ZeroPadding2D(padding=((0, 1), (0, 1))))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))

    model.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))

    model.add(Conv2D(256, kernel_size=3, strides=1, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))

    model.add(Conv2D(512, kernel_size=3, strides=1, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))

    # Collapse to a single probability: real (1) vs. fake (0).
    model.add(Flatten())
    model.add(Dense(1, activation="sigmoid"))
    return model
```
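We can do the same shape-tracing for the discriminator, assuming TensorFlow's `'same'`-padding rule `out = ceil(in / stride)`:

```python
import math

GENERATE_SQUARE = 128  # 32 * GENERATE_RES, as defined earlier

res = GENERATE_SQUARE
res = math.ceil(res / 2)  # Conv2D stride 2, 'same' -> 64
res = math.ceil(res / 2)  # Conv2D stride 2, 'same' -> 32
res += 1                  # ZeroPadding2D(((0, 1), (0, 1))) -> 33
res = math.ceil(res / 2)  # Conv2D stride 2, 'same' -> 17
# The remaining stride-1 convolutions keep the resolution at 17,
# so Flatten() produces a vector of 17 * 17 * 512 features.
flattened = res * res * 512
```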

Next, we define the discriminator loss and the generator loss, along with their optimizers:

```python
import tensorflow as tf

# Both losses are built on binary cross-entropy.
cross_entropy = tf.keras.losses.BinaryCrossentropy()

def discriminator_loss(real_output, fake_output):
    # Real images should be classified as 1, fakes as 0.
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss

def generator_loss(fake_output):
    # The generator wins when the discriminator labels its fakes as real (1).
    return cross_entropy(tf.ones_like(fake_output), fake_output)

generator_optimizer = tf.keras.optimizers.Adam(learning_rate=1.5e-4, beta_1=0.5)
discriminator_optimizer = tf.keras.optimizers.Adam(learning_rate=1.5e-4, beta_1=0.5)
```
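A tiny check helps build intuition for how the two losses pull in opposite directions (the loss functions are redefined here so the snippet runs standalone):

```python
import tensorflow as tf

cross_entropy = tf.keras.losses.BinaryCrossentropy()

def discriminator_loss(real_output, fake_output):
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss

def generator_loss(fake_output):
    return cross_entropy(tf.ones_like(fake_output), fake_output)

# A discriminator that confidently spots the fake (0.1) and the real image (0.9)
# has a low loss...
confident = discriminator_loss(tf.constant([[0.9]]), tf.constant([[0.1]]))

# ...but that same confident rejection of the fake means a high loss for the
# generator, which wanted the fake scored as real.
fooled_badly = generator_loss(tf.constant([[0.1]]))
```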

Next, we define the training step function. Since training a GAN is very different from training a regular neural network, as we mentioned before, we cannot use the usual .fit() method; instead we have to write a custom training loop. This training function is inspired by TensorFlow's documentation:

```python
@tf.function
def train_step(images):
    seed = tf.random.normal([BATCH_SIZE, SEED_SIZE])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(seed, training=True)

        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    # Each network's gradients come from its own tape...
    gradients_of_generator = gen_tape.gradient(
        gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(
        disc_loss, discriminator.trainable_variables)

    # ...and are applied by its own optimizer.
    generator_optimizer.apply_gradients(zip(
        gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(
        gradients_of_discriminator, discriminator.trainable_variables))

    return gen_loss, disc_loss
```
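`train_step` expects batches of real images scaled to match the generator's `tanh` output range of [-1, 1]. Here is a minimal sketch of preparing such a `tf.data.Dataset`; the random `training_data` array is a placeholder for your own images:

```python
import numpy as np
import tensorflow as tf

BUFFER_SIZE = 60000
BATCH_SIZE = 32

# `training_data` stands in for your own images, shaped
# (num_images, height, width, channels) with pixel values in [0, 255].
training_data = np.random.randint(0, 256, (64, 128, 128, 3)).astype("float32")

# Scale from [0, 255] to [-1, 1] to match the generator's tanh output.
training_data = training_data / 127.5 - 1.0

dataset = (tf.data.Dataset.from_tensor_slices(training_data)
           .shuffle(BUFFER_SIZE)
           .batch(BATCH_SIZE))
```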

Now, the main training function that ties it all together:

```python
import time

def train(dataset, epochs):
    # A fixed seed lets us render the same preview grid after every epoch
    # to visually track the generator's progress.
    fixed_seed = np.random.normal(0, 1, (PREVIEW_ROWS * PREVIEW_COLS, SEED_SIZE))

    for epoch in range(epochs):
        epoch_start = time.time()
        gen_loss_list = []
        disc_loss_list = []

        for image_batch in dataset:
            t = train_step(image_batch)
            gen_loss_list.append(t[0])
            disc_loss_list.append(t[1])

        g_loss = sum(gen_loss_list) / len(gen_loss_list)
        d_loss = sum(disc_loss_list) / len(disc_loss_list)

        epoch_elapsed = time.time() - epoch_start
        # hms_string is a small helper that formats seconds as h:mm:ss.
        print(f'Epoch {epoch+1}, gen loss={g_loss}, disc loss={d_loss},'
              f' {hms_string(epoch_elapsed)}')
```
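Once training is done, sampling new images is just a forward pass followed by rescaling from the `tanh` range [-1, 1] back to pixel values. This helper is an illustrative sketch, not part of the original code:

```python
import numpy as np

def generate_images(generator, num_images, seed_size=100):
    # Sample latent vectors from the same Gaussian used during training.
    noise = np.random.normal(0, 1, (num_images, seed_size)).astype("float32")
    images = generator(noise, training=False).numpy()
    # Rescale from the generator's tanh range [-1, 1] to pixel values [0, 255].
    return ((images + 1.0) * 127.5).astype(np.uint8)
```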

### Results

In the following video, you can watch the generator's journey over 800 epochs, from random noise to quite realistic images of buildings:

Here's a sample of the generator's results:

### The Challenges of Training Generative Adversarial Networks

Because the generator and the discriminator are constantly pushing against each other during training, each trying to outsmart the other, training may end up in what game theorists call a Nash equilibrium. In game theory, this is when two players reach a point where neither would be better off changing their strategy: there is no incentive to deviate from what you're already doing, assuming the other player does the same. Different initial states and opening strategies may lead to one equilibrium or another.

In a GAN, the game between the generator and the discriminator has a single Nash equilibrium: it is reached when the generator produces near-perfect images and the discriminator is reduced to guessing, with a 50% chance of being right. This sounds very encouraging; one might argue that you only need to train a GAN long enough to reach its Nash equilibrium and you would have a perfect generator. Unfortunately, that's not the case, since nothing guarantees that the equilibrium will ever be reached.

In practice, training may start out well and then suddenly diverge, with the generator's and discriminator's parameters becoming unstable for no apparent reason. Because of these instabilities, GANs are very sensitive to hyperparameters, and fine-tuning them can take a lot of effort.