Exploring Python and the OpenAI API for Generating Images with DALL·E 2

OpenAI's DALL·E 2 is a generative model that creates images from natural-language descriptions. Trained on a large collection of image-text pairs, it can quickly generate high-quality art in a wide range of styles and forms, including landscapes, abstracts, and portraits. It has become a go-to choice for AI-generated art, offering a variety of creative and interesting possibilities, and the model is constantly evolving and improving, allowing for more sophisticated and detailed artwork to be produced.

Generative models have gained popularity in recent years for their ability to produce interesting, compelling, and original results in text, art, and image generation. The rise of generative models is attributed to advances in deep learning and access to large datasets, and they are being used in creative, scientific, and commercial applications with the potential to revolutionize many fields.

This article will teach you about OpenAI's DALL·E 2: how to generate images from text prompts using the OpenAI Python package, how to generate images through direct API calls, how to create variations of a generated image, and finally how to decode the base64 encoding of a generated image.

Prerequisites

To follow along, you will need the following:

  • Python installed on your computer

  • Basic knowledge of Python programming

  • Basic knowledge of how to work with JSON data

All the code examples used in this article can be found in this GitHub repository: https://github.com/Afeez1131/openAI-image-generation.

DALL·E 2 and How It Works

DALL·E 2 is a system for text-to-image generation developed at OpenAI. When prompted with a caption, the system attempts to generate an image that matches the prompt. It has additional capabilities as well, such as editing an image using a prompt, generating new images that share the essence of a given reference image but differ in how the details are put together, and transforming any aspect of an image using a prompt. The system underlying DALL·E 2 is based on two key technologies: CLIP (Contrastive Language-Image Pre-training) and diffusion.

Contrastive Language-Image Pre-training (CLIP) is a contrastive model that tries to match an image with its corresponding caption. It consists of an image encoder and a text encoder that are trained on a large, diverse collection of image-text pairs. The image encoder encodes images into image embeddings, and the text encoder encodes text into text embeddings. The major goal of CLIP is to maximize the similarity score between an image embedding and its caption (text) embedding. CLIP is important to DALL·E 2 because it is what ultimately determines how semantically related a natural-language text is to a visual concept, which is critical for text-conditional image generation.
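To make the idea of a similarity score concrete, here is a toy sketch that scores two embeddings by cosine similarity. The random vectors stand in for real CLIP encoder outputs, so the printed score is meaningless; only the mechanics are illustrative:

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Stand-ins for real CLIP encoder outputs (512-dimensional embeddings).
image_embedding = [random.gauss(0, 1) for _ in range(512)]
text_embedding = [random.gauss(0, 1) for _ in range(512)]

score = cosine_similarity(image_embedding, text_embedding)
print(f'similarity score: {score:.4f}')  # higher means a better image-caption match
```

During training, CLIP pushes this score up for matching image-caption pairs and down for mismatched ones.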

Diffusion is a technique for training a generative model in which a type of data (say, an image) gradually has Gaussian noise added to it until the data is no longer distinguishable from pure noise; this is called the corruption process. Once that is achieved, the diffusion model learns to reverse the corruption process and reconstruct the image to its initial state.
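The corruption process can be sketched in a few lines of Python. This toy example treats an "image" as a flat list of pixel values and simply mixes Gaussian noise into it at each step; a real diffusion model also rescales the signal according to a noise schedule, which is omitted here:

```python
import random

def corrupt(pixels, steps=10, noise_scale=0.3):
    """Gradually mix Gaussian noise into pixel values (forward diffusion, simplified)."""
    for _ in range(steps):
        pixels = [p + random.gauss(0, noise_scale) for p in pixels]
    return pixels

clean = [0.5] * 8       # a tiny "image" of identical pixel values
noisy = corrupt(clean)
print(noisy)            # after enough steps, the original signal is drowned in noise
```

The generative model is then trained to run this process in reverse, recovering a clean image from noise.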

“DALL·E 2 generates images in two stages; in the first stage, the prior model generates a CLIP image embedding (intended to describe the “gist” of the image) from the given caption. And in the second stage, a diffusion model generates the image itself from this embedding. The diffusion model is called unCLIP because it effectively reverses the mapping learned by the CLIP image encoder. Since unCLIP is trained to fill in the details necessary to produce a realistic image from the embedding, it will learn to model all of the information that CLIP deems irrelevant for its training objective and hence discards.” - the DALL·E 2 documentation.

You can learn more about how DALL·E 2 works from this article.

Getting Started

In this section, you will start by creating an isolated environment. Next, you will install the required Python packages: openai, which provides convenient access to the OpenAI API from applications written in Python, and requests, which lets you send HTTP requests easily. Finally, you will generate your OpenAI API key and save it as an environment variable.

Creating a Virtual Environment

A virtual environment is a Python tool for managing a project's dependencies. It allows Python packages to be installed in an isolated directory for a particular project, rather than globally on your computer.

Create a virtual environment:

# python3 -m venv /path/to/new/virtual/environment

python -m venv venv

Activate the virtual environment and install the needed Python packages:

# on Windows (Git Bash; in cmd, run venv\Scripts\activate instead)
$: source venv/Scripts/activate

# on linux
$: source venv/bin/activate

Install the needed python packages:

$ (venv): pip install openai

$ (venv): pip install requests

Getting the OpenAI API Key and Saving It as an Environment Variable

You need an API key to make API calls. Sign up for an OpenAI account and open the API keys page, where you can generate your key. You should see a page similar to:

Click on Create new secret key to create a new API key, and copy the value as shown in the image above:

Get the OpenAI API key and save it as an environmental variable:

$ (venv): export OPENAI_API_KEY="your_openAI_key"

The above command makes the API key accessible in your current terminal session under the name OPENAI_API_KEY.

NOTE: once you close the terminal, the key will no longer be accessible.
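Because the variable disappears with the terminal session, it is worth having your scripts fail fast with a clear message when the key is missing, instead of a confusing authentication error later. A small sketch (the function name is my own):

```python
import os

def require_api_key(env_name='OPENAI_API_KEY'):
    """Return the API key from the environment, or raise a clear error."""
    key = os.getenv(env_name)
    if not key:
        raise RuntimeError(
            f'{env_name} is not set. Run `export {env_name}="your_key"` '
            'in this terminal first.'
        )
    return key
```

You could then write `openai.api_key = require_api_key()` at the top of each script in this article.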

If you sign up for OpenAI's API, you benefit from a free trial that allows you to use $18 in free credits within your first three months.

Generating Images using the OpenAI Python Library

The OpenAI Python library provides convenient access to the OpenAI API from applications written in the Python language. It includes a predefined set of classes for API resources that initialize themselves dynamically from API responses which makes it compatible with a wide range of versions of the OpenAI API.

The library additionally provides a command-line utility, which makes it easy to interact with the API directly from your terminal.

Generating Images with the OpenAI Command Line Utility

The OpenAI Python library provides a command line utility that makes it easy to interact with the API from your terminal.

The image generations endpoint allows you to create an original image given a text prompt.

The format of the command to generate an image with the OpenAI command line utility is as shown below:

openai api image.create -p "your prompt (image description)"
$ (venv): openai api image.create -p "caricature of the flash"

This command sends a request to OpenAI's image generation endpoint and creates an image from the text prompt you provided ("caricature of the flash"). You will receive a JSON response that contains the URL of the generated image:

{
  "created": 1674111546,
  "data": [
    {
      "url": "https://oaidalleapiprodscus.blob.core.windows.net/private/org-9LGdDgtWNf9H3ahrXtMGxLeE/user-ZGMS19ezW8ZGryPVOaroIQkc/img-cFWuQNXOmsbGJ7VNjB803prh.png?st=2023-01-19T05%3A59%3A06Z&se=2023-01-19T07%3A59%3A06Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-19T01%3A11%3A42Z&ske=2023-01-20T01%3A11%3A42Z&sks=b&skv=2021-08-06&sig=IWNmpQSzC31CQ3EZUKO5R76xDQW3SNnsFIhuAHCnLmo%3D"
    }
  ]
}

Copy the URL into a browser, and you will have your generated image:

Note: All generated image URLs are only valid for one hour.

Generating Multiple Images

To generate multiple images, follow the same procedure as above and add an n argument with a value between 1 and 10 to your command:

$ (venv): openai api image.create -p "caricature of the flash" -n 2

The above command will generate two images and return their URLs:

{
  "created": 1674111872,
  "data": [
    {
      "url": "https://oaidalleapiprodscus.blob.core.windows.net/private/org-9LGdDgtWNf9H3ahrXtMGxLeE/user-ZGMS19ezW8ZGryPVOaroIQkc/img-hUY9PJplwAvfHv8kf5YNzgdF.png?st=2023-01-19T06%3A04%3A32Z&se=2023-01-19T08%3A04%3A32Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-19T01%3A14%3A54Z&ske=2023-01-20T01%3A14%3A54Z&sks=b&skv=2021-08-06&sig=iy3YkjSQ4RmszlotLyV4EZs9TY6HIqdo7lvGNxkq%2Bq8%3D"
    },
    {
      "url": "https://oaidalleapiprodscus.blob.core.windows.net/private/org-9LGdDgtWNf9H3ahrXtMGxLeE/user-ZGMS19ezW8ZGryPVOaroIQkc/img-R5qh7run4IiDLUOgG0iuApg2.png?st=2023-01-19T06%3A04%3A32Z&se=2023-01-19T08%3A04%3A32Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-19T01%3A14%3A54Z&ske=2023-01-20T01%3A14%3A54Z&sks=b&skv=2021-08-06&sig=r8vbJh5pWGvAudnYlEsxTbIdDAlXM14GBi9Asa0uRSA%3D"
    }
  ]
}

Generated Image Preview:

Generating Images from the Python SDK

The OpenAI Python SDK has a create method that you can use to generate images. This method accepts the following parameters:

  • prompt: This is the description of the image you want to generate. The more detailed the description is, the more likely you are to get the result that you want.

  • n: You can request 1-10 images at a time using the n parameter.

  • size: Generated images can have a size of 256x256, 512x512, or 1024x1024 pixels. Smaller sizes are faster to generate.

  • response_format: The format in which the generated images are returned. Must be one of url or b64_json.

Create a file named generate_image.py and add the code below:

import os
import openai

openai.api_key = os.getenv('OPENAI_API_KEY')  # get the environment variable
response = openai.Image.create(
    prompt="A caricature image of the flash",
    n=1,
    size='512x512',
)
url = response["data"][0]["url"]
print('Generated Image URL: ', url)

In the code above:

  • You import the os module and the OpenAI Python library.

  • You retrieve the OpenAI API key from the environment variable you set earlier, using the os module.

  • You call the create method of the openai.Image class, passing some of the arguments discussed earlier, and assign the returned response to the response variable.

  • You retrieve the image URL from the JSON response and assign it to the url variable.

  • Lastly, you print out the URL of the generated image.

When you run the code, an authenticated request is sent to the OpenAI API endpoint with a payload containing the parameters you provided. The API generates an image and returns a response containing the image URL, which you then retrieve.

The output of running the code is shown below:

$ (venv): python generate_image.py

Output:

Generated Image URL:  https://oaidalleapiprodscus.blob.core.windows.net/private/org-9LGdDgtWNf9H3ahrXtMGxLeE/user-ZGMS19ezW8ZGryPVOaroIQkc/img-ogFv9WINX45oJ2s1NaL15Nqy.png?st=2023-02-11T16%3A44%3A26Z&se=2023-02-11T18%3A44%3A26Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-02-10T21%3A38%3A38Z&ske=2023-02-11T21%3A38%3A38Z&sks=b&skv=2021-08-06&sig=guLu8Zn1W7l3mmHWmqV3zu2zA5QLbZPp9tCsFzvWlTI%3D

Generated Image Preview:


You can generate multiple images by changing the value of the n argument in the create method to the number of images you want. The value of n can be from 1 to 10:

import os
import openai

openai.api_key = os.getenv('OPENAI_API_KEY')  # get the environment variable
response = openai.Image.create(
    prompt="A fan art image for deadpool",
    n=2,
    size='256x256',
)
urls = [item["url"] for item in response["data"]]
print('URL: ', urls)

When you run the code, you should get the URLs for the two generated images in a list:

$ python generate_image.py

Output:

URL:  ['https://oaidalleapiprodscus.blob.core.windows.net/private/org-9LGdDgtWNf9H3ahrXtMGxLeE/user-ZGMS19ezW8ZGryPVOaroIQkc/img-4C4V5lOr5t6CkmAhDYWVYQpz.png?st=2023-01-19T07%3A20%3A48Z&se=2023-01-19T09%3A20%3A48Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-19T06%3A53%3A33Z&ske=2023-01-20T06%3A53%3A33Z&sks=b&skv=2021-08-06&sig=qo14FbLYyWYFjjCPdbDsWtpbYtlXQm6oZ1se08qmW70%3D', 'https://oaidalleapiprodscus.blob.core.windows.net/private/org-9LGdDgtWNf9H3ahrXtMGxLeE/user-ZGMS19ezW8ZGryPVOaroIQkc/img-oDSjMTcGKEWVmd6QRzXK6Zfg.png?st=2023-01-19T07%3A20%3A48Z&se=2023-01-19T09%3A20%3A48Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-19T06%3A53%3A33Z&ske=2023-01-20T06%3A53%3A33Z&sks=b&skv=2021-08-06&sig=Ro/cMMvgW6EkwI%2B82fBwod6MBwFYK1J9ST11Df8AV28%3D']

Generated Image Preview:

Don’t worry if the generated image is not exactly what you expected; you can retry the request as many times as you like.

NB: There is a rate limit of 20 requests per minute on the trial. Make sure to give a detailed prompt for the image you want so as not to exhaust your request limit.
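One way to cope with occasional failures or rate-limit errors is to wrap calls in a small retry helper that backs off before trying again. The sketch below is illustrative; the helper name and delay values are my own, not part of the OpenAI library:

```python
import time

def with_retries(fn, max_attempts=3, base_delay=2.0):
    """Call fn(); on failure, wait with exponential backoff and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            print(f'attempt {attempt} failed ({exc}); retrying in {delay:.0f}s')
            time.sleep(delay)
```

You might then call, for example, `with_retries(lambda: openai.Image.create(prompt="caricature of the flash", n=1))`.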

You can read more about image creation in the official guide: https://beta.openai.com/docs/guides/images/introduction

Generating Images with the OpenAI REST API Using the Python requests Library

Apart from the Python library provided by OpenAI, you can also interact with the API through HTTP requests from any language.

NB: At the time of writing, the OpenAI documentation does not provide Python code for calling the image generation endpoint directly, but the curl example it provides gives an idea of how to send HTTP requests to the endpoint.

You need to authenticate every request you send to the OpenAI API, and the API uses API keys for authentication. Visit this page to get your API key and set it as an environment variable if you have not already; otherwise, proceed with the rest of the tutorial.

Remember that your API key is a secret! Do not share it with others or expose it in any client-side code (browsers, apps).

All API requests should include your API key in an Authorization HTTP header as follows:

Authorization: Bearer YOUR_API_KEY

To create images with the OpenAI REST API, send a POST request to the https://api.openai.com/v1/images/generations endpoint. The endpoint creates an image from the request body and accepts the following parameters:

  • prompt: (required, string) The text description of the desired image; maximum length 1,000 characters.

  • n: (optional, integer, default 1) The number of images to generate, between 1 and 10.

  • size: (optional, string, default 1024x1024) The size of the generated images; must be one of 256x256, 512x512, or 1024x1024.

  • response_format: (optional, string, default url) The format of the returned images; must be one of url or b64_json.

Create a new file, generate_image_api.py, and add the following code to it:

import os
import requests

URL = 'https://api.openai.com/v1/images/generations'
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
}
data = {"prompt": "A pencil drawing of deadpool"}
response = requests.post(URL, json=data, headers=headers)
print(response.json())

After running the code, you should receive a response that includes the URL to the generated image:

$ python generate_image_api.py

Output:

{'created': 1674232104, 'data': [{'url': 'https://oaidalleapiprodscus.blob.core.windows.net/private/org-9LGdDgtWNf9H3ahrXtMGxLeE/user-ZGMS19ezW8ZGryPVOaroIQkc/img-FnW8K1UxGjjRUpVZ0v4Geg1s.png?st=2023-01-20T15%3A28%3A24Z&se=2023-01-20T17%3A28%3A24Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-20T12%3A38%3A02Z&ske=2023-01-21T12%3A38%3A02Z&sks=b&skv=2021-08-06&sig=kpnvXM9v//xlnmE0%2BHCAomtdZ12322GLu1qH76FyMwo%3D'}]}

Generated Image Preview:

You can request the base64 encoding of the image by setting the response_format to b64_json in the request payload:

NB: If you set response_format to b64_json, the base64 value of the generated image is returned, which you will need to decode into an image format such as .png.

import os
import requests

URL = 'https://api.openai.com/v1/images/generations'
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
}
data = {
    "prompt": "A pencil drawing of deadpool",
    "response_format": "b64_json",
    "size": '256x256',
}
response = requests.post(URL, json=data, headers=headers)
res = response.json()["data"][0]["b64_json"]
print(res[:50])
with open('b64.txt', 'w+') as temp:
    temp.write(res)

In the code above:

  • You print out the first 50 characters of the res variable, since the full base64 data is very long.

  • You write the full base64 data to a file named b64.txt.

After running the above code successfully, you will have a file b64.txt in your current directory, which contains the base64 encoding of the generated image.

Now that you have your image in base64, you will need to convert it to an image format before you can view the generated image. To do that, you will make use of a website that converts base64 to an image.

Copy the base64 data and paste it into the space provided on this website. This decodes the base64 to an image format.

The decoded image for the data is shown below:

Decoding the Base64 Encoded Data Response with Python

To decode a base64 encoding using Python, you can use the base64 Python library:

The function below is a utility function that decodes a list of base64 encoded data into an image format:

def decode_image(b64_image_data_list: list):
    """Loop through b64_image_data_list, decode each entry
    from base64 to PNG bytes, then save the PNG file locally."""
    for b64_data in b64_image_data_list:
        image_data = b64decode(b64_data)
        name = str(random.randrange(1, 1000)) + '.png'
        with open(name, 'wb') as png:
            png.write(image_data)
            print(f'{name} successfully decoded')

In the code above, you:

  • Define a function that accepts a list of base64 data.

  • Loop through the list of base64 data, decode each entry, and assign the result to a variable image_data.

  • Use the random library to generate a random number, and attach the .png extension, to form the name of the image.

  • Write the decoded data to the file by passing the name, and the mode to be wb for writing binary files.

Create a new file called decode_image.py, and add the complete code below:

import random
import requests
from base64 import b64decode
import os

url = 'https://api.openai.com/v1/images/generations'
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
}
data = {
    "prompt": "A pencil drawing of deadpool drawn like leonardo davinci",
    "response_format": "b64_json",
    "size": '256x256',
    "n": 1,
}

response = requests.post(url, json=data, headers=headers)
res = response.json()["data"]
b64_list = [item["b64_json"] for item in res]  # base64 encoding of every generated image

def decode_image(b64_image_data_list: list):
    """Loop through b64_image_data_list, decode each entry
    from base64 to PNG bytes, then save the PNG file locally."""
    for b64_data in b64_image_data_list:
        image_data = b64decode(b64_data)  # decode the base64 string
        name = str(random.randrange(1, 1000)) + '.png'
        with open(name, 'wb') as png:
            png.write(image_data)  # save the image
            print(f'{name} successfully decoded')

decode_image(b64_list)
print('--------done decoding----------')

With this, you can successfully decode any base64 encoded image data returned as a response to your request.

When you run the above code, you should get a response similar to:

$ (venv): python decode_image.py

Output:

215.png successfully decoded
--------done decoding----------

Generated Image Preview:

Downloading Generated Images Locally Using the urllib Python Package

To download the generated images, write a utility function called download_image that accepts a list of URLs as an argument and uses the urllib package to download them locally.

For this, create a file named generate_download_image.py, and add the code below to it:

import os
import requests
import urllib.request
import random

url = 'https://api.openai.com/v1/images/generations'
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
}
data = {
    "prompt": "A pencil drawing of deadpool",
    "size": '256x256',
    "n": 3,
}
response = requests.post(url, json=data, headers=headers)
urls = [item["url"] for item in response.json()["data"]]

def download_image(url_list: list):
    """Loop through url_list, download the image
    at each URL, and save it locally."""
    for url in url_list:
        name = random.randrange(1, 100)
        full_name = str(name) + '.png'
        urllib.request.urlretrieve(url, full_name)
        print(f'image {full_name} download successfully...')

download_image(urls)

The download_image function accepts a url_list parameter (expected to be a list), loops through the list, and creates a name for each image using the random package, appending the .png extension.

The function downloads each image using the urlretrieve method from the urllib.request module, which accepts the image URL and the destination filename as arguments.

Run the code, and the images should be downloaded in your current directory:

$ (venv): python generate_download_image.py

Output:

image 85.png download successfully...
image 29.png download successfully...
image 51.png download successfully...

Check your working directory, and you should have some images with the file name printed on the shell above.

Generated Image Preview:
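As an aside, urlretrieve is documented as a legacy interface that may be deprecated in a future Python release. If you prefer to avoid it, here is an equivalent helper built on urllib.request.urlopen; the filename logic is split into its own function (a naming choice of mine) but follows the same random-name scheme as download_image above:

```python
import random
import urllib.request

def make_image_name() -> str:
    """Build a random filename like '37.png', matching the scheme above."""
    return str(random.randrange(1, 100)) + '.png'

def download_image_urlopen(url_list: list):
    """Download the image at each URL with urlopen and save it locally."""
    for url in url_list:
        with urllib.request.urlopen(url, timeout=30) as resp:
            data = resp.read()
        full_name = make_image_name()
        with open(full_name, 'wb') as png:
            png.write(data)
        print(f'image {full_name} download successfully...')
```

Either version can be dropped into the scripts in this article unchanged.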

Creating Different Variations of the Generated Image

To create different variations of a local image, use a valid PNG file that is square and less than 4MB in size as the base.

These variations are alternate forms of the image that you use as the base image.
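Before sending the file to the API, you can verify locally that it meets these constraints. The sketch below reads only the PNG header using the standard library; the helper name check_variation_source is my own:

```python
import os
import struct

def check_variation_source(path: str, max_bytes: int = 4 * 1024 * 1024) -> bool:
    """Return True only if path is a square PNG smaller than 4MB."""
    if os.path.getsize(path) >= max_bytes:
        return False
    with open(path, 'rb') as f:
        header = f.read(24)
    if header[:8] != b'\x89PNG\r\n\x1a\n':  # every PNG starts with this signature
        return False
    # The IHDR chunk stores width and height as big-endian 32-bit integers
    # at byte offsets 16 and 20 of the file.
    width, height = struct.unpack('>II', header[16:24])
    return width == height
```

You could call check_variation_source("838.png") before passing the file to create_variation, and resize or re-export the image if it returns False.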

Create a new file called image_variations.py, and enter the code below:

import random
import urllib.request
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")
res = openai.Image.create_variation(
    image=open("838.png", "rb"),
    n=2,
    size="512x512",
    response_format="url",
)
resp_list = [item["url"] for item in res["data"]]

def download_image(url_list: list):
    """Loop through url_list, download the image
    at each URL, and save it locally."""
    for url in url_list:
        name = random.randrange(1, 100)
        full_name = str(name) + '-variations.png'
        urllib.request.urlretrieve(url, full_name)
        print(f'image {full_name} download successfully...')

download_image(resp_list)

In the first part of the code:

  • You pass image as an argument, with its value being the image you want to generate variations of.

  • You specify the number of variations to generate, as well as the size of the variations.

  • You set the response_format to url.

Next, you reuse the previous download_image function to download the images locally.

After running the code, you should have something similar to:

$ (venv): python image_variations.py

Output:

image 23-variations.png download successfully...

image 60-variations.png download successfully...

Generated Image Preview:

Conclusions

Exploring Python and the OpenAI API for generating images with DALL·E 2 is an exciting and innovative way for developers to create images with great accuracy and speed.

In this article, you learned different ways of generating images using the openai package, how to download the generated images using the urllib Python package, the different response formats in which you can request an image, and how to decode image data received in the b64_json response format into an image format such as .png.

The OpenAI API allows developers to create images with a wide range of customization options and parameters. The possibilities are endless with the OpenAI API and Python, so why not explore the possibilities today?

You can access all the code in this GitHub repository.

Attributions