Text-to-Image Generation: Unleashing the Power of DALL-E 2 and DALL-E mini
In a groundbreaking paper published on April 13, 2022, titled "Hierarchical Text-Conditional Image Generation with CLIP Latents" [1], Ramesh, Aditya, et al. presented compelling evidence that contrastive models such as CLIP learn robust image representations that capture both style and semantics. The researchers proposed a two-stage model comprising a prior, which generates a CLIP image embedding from a given text caption, and a decoder, which uses that image embedding to generate the corresponding image. Central to its functionality is the concept of diffusion, which enables the generation of high-quality images by iteratively refining a noisy output into a coherent image. This innovative AI system can create authentic, realistic visuals and artwork from natural language descriptions, and it excels at blending concepts, attributes, and styles to produce compelling outputs. The advancements made in DALL-E 2 and DALL-E mini are significant developments in the field of text-to-image generation. After reading this article you should know the following:
- What is DALL·E 2
- How DALL·E 2 works
- Python implementation of DALL·E 2
- What is DALL-E mini?
- Python implementation of DALL-E mini
So what is DALL·E 2, you may ask?
DALL-E 2 is the updated version of DALL-E, a generative model that takes a sentence and generates an original image in response. DALL-E 2 is a huge model with 3.5B parameters, yet interestingly it is smaller than GPT-3 and not as massive as DALL-E (12B). Despite its smaller size, DALL-E 2 produces images with 4x the resolution of DALL-E, and human judges favor it more than 70% of the time for caption matching and photorealism (Figure 1).
How does DALL·E 2 work?
The CLIP model takes image-caption pairs and creates "mental" representations in the form of vectors, called text/image embeddings (Figure 2, above the dotted line). The prior model then takes a caption (its CLIP text embedding) and generates a CLIP image embedding, and finally the diffusion decoder (unCLIP) takes that CLIP image embedding and generates an image.
DALL-E 2 is a specific instance of this two-part model, a prior followed by a decoder (Figure 2, below the dotted line). By chaining the two models, we can convert a sentence into an image.
It is interesting to note that the decoder is called unCLIP, since it performs the opposite procedure of the original CLIP model: it creates an original picture from a general mental representation, rather than producing a representation from an image.
To make it easier, imagine that you want to draw something. What do you do? You first imagine its image in your brain, and only then do you start drawing its likeness. This is essentially what the model does. The mental representation encodes the main semantically meaningful characteristics: people, animals, objects, styles, colors, and so on. DALL-E 2 can build a new image that preserves these characteristics while varying the non-essential details.
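To make the CLIP stage concrete, here is a minimal sketch of embedding a caption and an image into the same vector space. It assumes the Hugging Face transformers library and the public openai/clip-vit-base-patch32 checkpoint, which are not part of DALL-E 2 itself (the paper uses its own, larger CLIP model):
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# public CLIP checkpoint, used here only for illustration
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

caption = "Lightning hits a river, electric, dark cloudy sky"
image = Image.open("example.png")  # any local image, just for illustration

inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = clip(**inputs)

text_embedding = outputs.text_embeds    # the "mental" representation of the caption
image_embedding = outputs.image_embeds  # the "mental" representation of the image

# The prior maps a text embedding to an image embedding; the unCLIP decoder
# then turns that image embedding back into pixels.
print(torch.cosine_similarity(text_embedding, image_embedding))
The closer the two embeddings are, the better the image matches the caption; the prior and the decoder in Figure 2 operate entirely in this shared embedding space.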
Python implementation of DALL-E 2
First, let's install OpenAI's library:
pip install openai
Import all the needed libraries:
import urllib.request
import base64
import IPython.display
import openai
Then let’s get started with our prompt! What do we want to generate?
prompt = "Lightning hits a river, electric, dark cloudy sky"
Obtain your API key from your OpenAI account and use it as follows:
openai.api_key = "<YOUR_KEY>"
response = openai.Image.create(
    prompt=prompt,
    n=1,
    size="1024x1024"
)
image_url = response['data'][0]['url']
image_url
The OpenAI API will return an image URL; let's display that image:
def insert_image(url, caption):
    # download the image
    image = urllib.request.urlopen(url).read()
    # encode the image as base64 so it can be embedded inline
    image = base64.b64encode(image).decode('utf-8')
    # build the HTML <img> tag
    html = f"""<img src="data:image/png;base64,{image}" alt="{caption}" title="{caption}" />"""
    # display the image
    IPython.display.display(IPython.display.HTML(html))
    # display the caption
    IPython.display.display(IPython.display.Markdown(f"""# {caption}"""))
insert_image(image_url, prompt)
Sample DALL-E 2 outputs for prompt: “Lightning hits a river, electric, dark cloudy sky”
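If you want several candidates for the same prompt, the same Image.create call accepts n greater than 1 and returns one URL per image. A short sketch, reusing the insert_image helper defined above:
response = openai.Image.create(
    prompt=prompt,
    n=4,
    size="512x512"
)

# display every returned candidate with its own caption
for i, item in enumerate(response['data']):
    insert_image(item['url'], f"{prompt} (variant {i + 1})")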
What is DALL-E mini?
DALL-E mini, initially conceived by Boris Dayma, a computer scientist based in Texas, emerged as a submission for a coding contest. The application draws inspiration from the formidable DALL-E developed by the artificial intelligence startup OpenAI, hence the shared name (Figure 4). DALL-E mini, also known as Craiyon, is a web application that offers a more user-friendly approach while employing similar underlying technology. Dayma's openly accessible model is available to anyone online, and he developed it in collaboration with AI research communities on Twitter and GitHub.
Python implementation of DALL-E mini
Let's start by installing the min-dalle library:
! pip install min-dalle -q
Import MinDalle:
from min_dalle import MinDalle
Define your model, and your prompt:
model = MinDalle(is_mega=True, is_reusable=True)
prompt = "Dogs playing"
seed = 6
grid_size = 2
display(model.generate_image(prompt, seed, grid_size))
Sample Craiyon outputs for prompt: “Dogs Playing”
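If you want to keep the result, generate_image returns the grid as a PIL image in the min-dalle package (a grid_size x grid_size mosaic), so you can also save it to disk. A small sketch, reusing the model and prompt defined above:
# assumes generate_image returns a PIL image, as in the min-dalle package
image = model.generate_image(prompt, seed, grid_size)
image.save("dogs_playing.png")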