Stable Diffusion 1.5 with private volume

In the quickstart, you created a simple application. In this example, we use a private volume to store the Stable Diffusion 1.5 model files and implement an online AIGC (AI-generated content) text-to-image and image-to-image service.

Log in to EverAI CLI

First, create a directory for your app. In your app directory, log in with the token you obtained from EverAI.

everai login --token <your token>

Create secrets

This step is optional if you have already created a secret for the registry you need to access, or if your registry is publicly accessible.

In this case, we will create one secret for quay.io.

everai secret create your-quay-io-secret-name \
--from-literal username=<your username> \
--from-literal password=<your password>

NOTE

quay.io is a well-known public image registry. Well-known image registries similar to quay.io include Docker Hub, GitHub Container Registry, Google Container Registry, etc.

You should create a secret to access Hugging Face as well.

everai secret create your-huggingface-secret-name \
--from-literal token-key-as-your-wish=<your huggingface token>

Create configmap

This step is optional, but you can use a configmap to adjust the autoscaling policy after deploying the image.

everai configmap create sd15-configmap \
--from-literal min_workers=1 \
--from-literal max_workers=5 \
--from-literal min_free_workers=1 \
--from-literal scale_up_step=1 \
--from-literal max_idle_time=60

Write your app code in Python

Basic setup

There is example code in app.py.

First, import the required EverAI Python classes. Then define the variables that will be used, including the volume, the secret for accessing the image registry, and the file stored in the volume. Use the Image.from_registry static method to create an image instance, then create and define an app instance through the App class.

Note that you need to configure GPU resources for your application. The GPU model configured here is "A100 40G", and the number of GPU cards is 1.

from everai.app import App, context, VolumeRequest
from everai_autoscaler.builtin import FreeWorkerAutoScaler
from everai.image import Image, BasicAuth
from everai.resource_requests import ResourceRequests, CPUConstraints
from everai.placeholder import Placeholder
from image_builder import IMAGE

APP_NAME = '<your app name>'
VOLUME_NAME = 'models--runwayml--stable-diffusion-v1-5'
QUAY_IO_SECRET_NAME = 'your-quay-io-secret-name'
MODEL_NAME = 'runwayml/stable-diffusion-v1-5'
HUGGINGFACE_SECRET_NAME = 'your-huggingface-secret-name'
CONFIGMAP_NAME = 'sd15-configmap'

image = Image.from_registry(IMAGE, auth=BasicAuth(
    username=Placeholder(QUAY_IO_SECRET_NAME, 'username', kind='Secret'),
    password=Placeholder(QUAY_IO_SECRET_NAME, 'password', kind='Secret'),
))

app = App(
    APP_NAME,
    image=image,
    volume_requests=[
        VolumeRequest(name=VOLUME_NAME),
    ],
    secret_requests=[
        HUGGINGFACE_SECRET_NAME,
        QUAY_IO_SECRET_NAME,
    ],
    configmap_requests=[CONFIGMAP_NAME],
    autoscaler=FreeWorkerAutoScaler(
        # keep workers running even when there are no requests,
        # so that new requests are served immediately
        min_workers=Placeholder(kind='ConfigMap', name=CONFIGMAP_NAME, key='min_workers'),
        # the maximum number of workers; protects your application
        # from excessive cost during an attack or a sudden traffic spike
        max_workers=Placeholder(kind='ConfigMap', name=CONFIGMAP_NAME, key='max_workers'),
        # this factor controls how the autoscaler scales up your app
        min_free_workers=Placeholder(kind='ConfigMap', name=CONFIGMAP_NAME, key='min_free_workers'),
        # this factor controls how the autoscaler scales down your app
        max_idle_time=Placeholder(kind='ConfigMap', name=CONFIGMAP_NAME, key='max_idle_time'),
        # this factor controls how many workers the autoscaler adds per scale-up step
        scale_up_step=Placeholder(kind='ConfigMap', name=CONFIGMAP_NAME, key='scale_up_step'),
    ),
    resource_requests=ResourceRequests(
        cpu_num=2,
        memory_mb=20480,
        gpu_num=1,
        gpu_constraints=[
            "A100 40G",
        ],
        cpu_constraints=CPUConstraints(
            platforms=['amd64', 'arm64'],
        ),
        cuda_version_constraints=">=12.4",
    ),
)

Load model

If your local environment does not have the model files, you can call the StableDiffusionPipeline.from_pretrained method with MODEL_NAME to pull them from the Hugging Face official website. By setting cache_dir, the model files are cached in the private volume models--runwayml--stable-diffusion-v1-5.

You can get the local path of the volume models--runwayml--stable-diffusion-v1-5 through the everai volume get command. After entering the local path of the volume, you can see the model files that have been cached.

everai volume get models--runwayml--stable-diffusion-v1-5
<Volume: id: Xo6zoFc4986CrD7dYuNrwr, name: models--runwayml--stable-diffusion-v1-5, revision: 000001-12d, files: 76, size: 10.21 GiB>
path: /root/.cache/everai/volumes/Xo6zoFc4986CrD7dYuNrwr
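For example, you can inspect the cached files with ordinary shell commands (the volume id in the path comes from the example output above and will differ in your environment):

cd /root/.cache/everai/volumes/Xo6zoFc4986CrD7dYuNrwr
ls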

When using everai app run to debug the sample code in the local environment, the value of is_prepare_mode is False, and local files are not pushed to the cloud. If your local debugging environment has GPU resources (an NVIDIA GPU, or the GPU on macOS devices with the Metal programming framework), everai app run will run successfully.

After your code is debugged, execute the everai app prepare command, which runs all methods annotated with @app.prepare. At this time, the value of is_prepare_mode is True. In the sample code, the model files in the local volume models--runwayml--stable-diffusion-v1-5 are pushed to the cloud when this command is executed. The typical workflow is shown below.
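Both commands are run from your app directory:

# debug locally; is_prepare_mode is False and nothing is pushed to the cloud
everai app run

# run all methods annotated with @app.prepare; is_prepare_mode is True
# and the local volume files are pushed to the cloud
everai app prepare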

from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline
import torch

txt2img_pipe = None
img2img_pipe = None

@app.prepare()
def prepare_model():
    volume = context.get_volume(VOLUME_NAME)
    assert volume is not None and volume.ready

    secret = context.get_secret(HUGGINGFACE_SECRET_NAME)
    assert secret is not None
    huggingface_token = secret.get('token-key-as-your-wish')

    model_dir = volume.path
    is_in_cloud = context.is_in_cloud

    global txt2img_pipe
    global img2img_pipe

    txt2img_pipe = StableDiffusionPipeline.from_pretrained(
        MODEL_NAME,
        token=huggingface_token,
        cache_dir=model_dir,
        variant='fp16',
        torch_dtype=torch.float16,
        low_cpu_mem_usage=False,
        local_files_only=is_in_cloud,
    )

    # The txt2img_pipe.components property lets you run different pipelines
    # with the same weights and configurations without allocating additional memory.
    # If you only want the img2img pipeline, use
    # StableDiffusionImg2ImgPipeline.from_pretrained as shown below instead.
    img2img_pipe = StableDiffusionImg2ImgPipeline(**txt2img_pipe.components)
    # img2img_pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    #     MODEL_NAME,
    #     token=huggingface_token,
    #     cache_dir=model_dir,
    #     variant='fp16',
    #     torch_dtype=torch.float16,
    #     low_cpu_mem_usage=False,
    #     local_files_only=is_in_cloud,
    # )

    # push the volume only in prepare mode,
    # to save GPU time (avoid redundant sha256 checks)
    if context.is_prepare_mode:
        context.volume_manager.push(VOLUME_NAME)
    else:
        if torch.cuda.is_available():
            txt2img_pipe.to("cuda")
            img2img_pipe.to("cuda")
        elif torch.backends.mps.is_available():
            mps_device = torch.device("mps")
            txt2img_pipe.to(mps_device)
            img2img_pipe.to(mps_device)

Create the inference service

Text-to-image

After loading the Stable Diffusion 1.5 model, you can write Python code that uses Flask to implement the online AIGC (AI-generated content) text-to-image inference service.

import flask
from flask import Response

import io

# service entrypoint
# the api service url looks like https://everai.expvent.com/api/routes/v1/default/stable-diffusion-v1-5/txt2img
# for local testing, the url is http://127.0.0.1:80/txt2img
@app.service.route('/txt2img', methods=['GET', 'POST'])
def txt2img():
    if flask.request.method == 'POST':
        data = flask.request.json
        prompt = data['prompt']
    else:
        prompt = flask.request.args["prompt"]

    pipe_out = txt2img_pipe(prompt)

    image_obj = pipe_out.images[0]

    byte_stream = io.BytesIO()
    image_obj.save(byte_stream, format="PNG")

    return Response(byte_stream.getvalue(), mimetype="image/png")
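When debugging locally with everai app run, you can exercise this endpoint with curl. A minimal sketch, assuming the service listens on the local address shown in the comment above:

curl -X POST -d '{"prompt": "a photo of a cat on the boat"}' \
  -H 'Content-Type: application/json' \
  -o test.png http://127.0.0.1:80/txt2img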

Image-to-image

Now you can write Python code that uses Flask to implement the online AIGC (AI-generated content) image-to-image inference service.

import PIL
from io import BytesIO

# service entrypoint
# the api service url looks like https://everai.expvent.com/api/routes/v1/default/stable-diffusion-v1-5/img2img
# for local testing, the url is http://127.0.0.1/img2img
@app.service.route('/img2img', methods=['POST'])
def img2img():
    f = flask.request.files['file']
    img = f.read()

    prompt = flask.request.form['prompt']

    init_image = PIL.Image.open(BytesIO(img)).convert("RGB")
    init_image = init_image.resize((768, 512))

    pipe_out = img2img_pipe(prompt=prompt, image=init_image,
                            strength=0.75, guidance_scale=7.5)

    image_obj = pipe_out.images[0]

    byte_stream = io.BytesIO()
    image_obj.save(byte_stream, format="JPEG")

    return Response(byte_stream.getvalue(), mimetype="image/jpeg")
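This endpoint can also be tested locally before deployment. A minimal sketch, assuming the local address shown in the comment above and a sketch-mountains-input.jpg file in the current directory:

curl -X POST -F 'file=@sketch-mountains-input.jpg' \
  -F "prompt=A fantasy landscape, trending on artstation" \
  -o test.jpg http://127.0.0.1/img2img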

Build image

This step builds the container image, using two very simple files: Dockerfile and image_builder.py.

There is example code in image_builder.py.

In image_builder.py, you should set your image repo.

In this example, we choose quay.io as the public image registry to store application images. You can also use well-known image registries similar to quay.io, such as Docker Hub, GitHub Container Registry, Google Container Registry, etc. If you have a self-built image registry and the image is accessible on the Internet, you can use that as well.

from everai.image import Builder

IMAGE = 'quay.io/<username>/<repo>:<tag>'

This step requires docker and buildx to be installed on your machine; otherwise, you will get further prompts to help you install them.

docker login quay.io
docker buildx version

Then run the following command to build the image and push it to your specified registry.

everai image build

Deploy image

The final step is to deploy your app to EverAI and keep it running.

everai app create

After running everai app list, you should see a result similar to the following. Note that CREATED_AT is displayed in UTC.

If your app's status is DEPLOYED, and the number of ready worker containers equals the expected number of worker containers (1/1 here), your app has been deployed successfully.
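To check the status at any time, run:

everai app list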

NAME                   NAMESPACE    STATUS    WORKERS    CREATED_AT
---------------------  -----------  --------  ---------  ------------------------
stable-diffusion-v1-5  default      DEPLOYED  1/1        2024-06-19T05:16:05+0000

When your app is deployed, you can use curl to execute the following request to test your deployed text-to-image code. A picture generated by the Stable Diffusion 1.5 model will be downloaded into the current directory.

curl -X POST -d '{"prompt": "a photo of a cat on the boat"}' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <your_token>' \
  -o test.png https://everai.expvent.com/api/routes/v1/<your namespace>/<your app name>/txt2img

Open the picture to see an effect like the following.

You can also use curl to execute the following request to test your deployed image-to-image code.

Before making the request, download sketch-mountains-input.jpg to your local directory and run curl from that directory in the terminal. A new image based on the original sketch-mountains-input.jpg and your prompt will be generated by the Stable Diffusion 1.5 model.

curl -X POST -F 'file=@sketch-mountains-input.jpg' \
  -F "prompt=A fantasy landscape, trending on artstation" \
  -H 'Authorization: Bearer <your_token>' \
  -o test.jpg https://everai.expvent.com/api/routes/v1/<your namespace>/<your app name>/img2img

An example of the original image is shown below.

Open the new picture to see an effect like the following.