Stable Cascade - base

12/26/2025, 11:17:54 PM
A fantasy portrait of a woman with honey-blonde hair and emerald eyes, looking upwards with a tear on her cheek, wearing ornate silver earrings and a necklace, illuminated by dramatic chiaroscuro lighting.
Four animated Neo-Victorian heroines in a sunlit attic conservatory, wielding magical sparks, a sword, and an open glowing grimoire, with a steampunk city visible through large arched windows.

Recommended parameters

steps: 10 - 30

resolution: 1024x1024

Use the 3.6 billion parameter version of Stage C for best results due to extensive finetuning.

Choose the larger variant (1.5 billion params) of Stage B to better reconstruct small and fine image details.

The model supports popular Stable Diffusion extensions such as LoRA, ControlNet, IP-Adapter, and LCM.

Install the 'diffusers' Python package from the specified branch for code compatibility.

Stable Cascade

This model is built upon the Würstchen architecture. Its main difference from other models such as Stable Diffusion is that it works in a much smaller latent space. Why is this important? The smaller the latent space, the faster inference runs and the cheaper training becomes. How small is the latent space? Stable Diffusion uses a compression factor of 8, so a 1024x1024 image is encoded to 128x128. Stable Cascade achieves a compression factor of 42, meaning that a 1024x1024 image can be encoded to 24x24 while maintaining crisp reconstructions. The text-conditional model is then trained in this highly compressed latent space. Previous versions of this architecture achieved a 16x cost reduction over Stable Diffusion 1.5.

This kind of model is therefore well suited to uses where efficiency is important. Furthermore, all known extensions such as finetuning, LoRA, ControlNet, IP-Adapter and LCM are possible with this method as well.
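The latent sizes quoted above follow from dividing the image side by the compression factor. The sketch below is illustrative only: the helper name `latent_side` and the choice to round down are assumptions, not part of the Stable Cascade code.

```python
def latent_side(image_side: int, compression_factor: int) -> int:
    """Side length of the latent grid, rounded down (the rounding is an assumption)."""
    return image_side // compression_factor


sd_side = latent_side(1024, 8)        # Stable Diffusion
cascade_side = latent_side(1024, 42)  # Stable Cascade

# Number of spatial positions in each latent grid
sd_positions = sd_side ** 2
cascade_positions = cascade_side ** 2

print(f"Stable Diffusion latent: {sd_side}x{sd_side} ({sd_positions} positions)")
print(f"Stable Cascade latent:   {cascade_side}x{cascade_side} ({cascade_positions} positions)")
print(f"Ratio: {sd_positions / cascade_positions:.1f}x fewer positions")
```

This counts spatial positions only and ignores channel counts and model size, so it gives a feel for the difference in scale rather than a direct measure of compute.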

Model Details

Model Description

Stable Cascade is a diffusion model trained to generate images given a text prompt.

  • Developed by: Stability AI

  • Funded by: Stability AI

  • Model type: Generative text-to-image model

Model Sources

For research purposes, we recommend our StableCascade GitHub repository (https://github.com/Stability-AI/StableCascade).

Model Overview

Stable Cascade consists of three models, Stage A, Stage B and Stage C, which form a cascade for generating images, hence the name "Stable Cascade".

Stage A and Stage B are used to compress images, similar to the job of the VAE in Stable Diffusion. With this setup, however, a much higher compression can be achieved. While the Stable Diffusion models use a spatial compression factor of 8, encoding a 1024 x 1024 image to 128 x 128, Stable Cascade achieves a compression factor of 42. This encodes a 1024 x 1024 image to 24 x 24 while still allowing the image to be decoded accurately, which makes training and inference much cheaper. Stage C is responsible for generating the small 24 x 24 latents given a text prompt. The following picture shows this visually.

For this release, we are providing two checkpoints for Stage C, two for Stage B and one for Stage A. Stage C comes in a 1 billion and a 3.6 billion parameter version; we highly recommend the 3.6 billion version, as most of the finetuning work went into it. The two versions of Stage B have 700 million and 1.5 billion parameters. Both achieve great results, but the 1.5 billion version excels at reconstructing small and fine details. You will therefore achieve the best results if you use the larger variant of each. Lastly, Stage A contains 20 million parameters and is fixed due to its small size.
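To compare the two extremes, the parameter counts above can be added up per configuration. This is a rough sketch: the dictionary layout, the function names and the 2-bytes-per-parameter estimate (16-bit weights, no activations or other overhead) are assumptions for illustration, not measured figures.

```python
# Parameter counts as stated in the text above
STAGE_PARAMS = {
    "A": {"only": 20_000_000},
    "B": {"small": 700_000_000, "large": 1_500_000_000},
    "C": {"small": 1_000_000_000, "large": 3_600_000_000},
}

BYTES_PER_PARAM = 2  # assumption: weights stored in a 16-bit dtype


def total_params(stage_b: str, stage_c: str) -> int:
    """Sum the parameters of Stage A plus the chosen Stage B and Stage C variants."""
    return (
        STAGE_PARAMS["A"]["only"]
        + STAGE_PARAMS["B"][stage_b]
        + STAGE_PARAMS["C"][stage_c]
    )


def weight_gigabytes(n_params: int) -> float:
    """Rough size of the weights alone, in GB (10^9 bytes)."""
    return n_params * BYTES_PER_PARAM / 1e9


for stage_b, stage_c in [("small", "small"), ("large", "large")]:
    n = total_params(stage_b, stage_c)
    print(f"B={stage_b}, C={stage_c}: {n / 1e9:.2f}B params, ~{weight_gigabytes(n):.2f} GB of weights")
```

Under these assumptions the smallest combination totals about 1.72 billion parameters and the largest about 5.12 billion, which may help when deciding whether the recommended larger variants fit on a given GPU.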

Evaluation

According to our evaluation, Stable Cascade performs best in both prompt alignment and aesthetic quality in almost all comparisons. The above picture shows the results from a human evaluation using a mix of parti-prompts and aesthetic prompts. Specifically, Stable Cascade (30 inference steps) was compared against Playground v2 (50 inference steps), SDXL (50 inference steps), SDXL Turbo (1 inference step) and Würstchen v2 (30 inference steps).

Code Example

⚠️ Important: For the code below to work, you have to install diffusers from this branch while the PR is WIP.

pip install git+https://github.com/kashif/diffusers.git@wuerstchen-v3

import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

device = "cuda"
num_images_per_prompt = 2

# Prior (Stage C): turns the text prompt into compressed image embeddings
prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16).to(device)
# Decoder (Stage B and Stage A): turns the embeddings into full-resolution images
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", torch_dtype=torch.float16).to(device)

prompt = "Anthropomorphic cat dressed as a pilot"
negative_prompt = ""

prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=4.0,
    num_images_per_prompt=num_images_per_prompt,
    num_inference_steps=20,
)

decoder_output = decoder(
    # Cast the bfloat16 embeddings to float16 to match the decoder's dtype
    image_embeddings=prior_output.image_embeddings.half(),
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=0.0,
    output_type="pil",
    num_inference_steps=10,
).images

# Now decoder_output is a list with your PIL images
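Once `decoder_output` holds the generated images, they can be written to disk. The helper below is a minimal sketch: the function name `save_images`, the `outputs` directory and the file naming scheme are all assumptions chosen for illustration. It relies only on each image exposing a PIL-style `save(path)` method.

```python
from pathlib import Path


def save_images(images, out_dir="outputs", prefix="cascade"):
    """Save each image as a PNG file and return the list of written paths.

    `images` is any iterable of objects with a PIL-style `save(path)` method.
    The directory and file names are illustrative defaults.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    paths = []
    for index, image in enumerate(images):
        path = out_path / f"{prefix}_{index:02d}.png"
        image.save(path)
        paths.append(path)
    return paths
```

With the example above, calling `save_images(decoder_output)` would write one file per generated image, for instance `outputs/cascade_00.png` and `outputs/cascade_01.png` when `num_images_per_prompt` is 2.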

Uses

Direct Use

The model is intended for research purposes for now. Possible research areas and tasks include:

  • Research on generative models.

  • Safe deployment of models which have the potential to generate harmful content.

  • Probing and understanding the limitations and biases of generative models.

  • Generation of artworks and use in design and other artistic processes.

  • Applications in educational or creative tools.

Excluded uses are described below.

Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.

The model should not be used in any way that violates Stability AI's Acceptable Use Policy.

Limitations and Bias

Limitations

  • Faces and people in general may not be generated properly.

  • The autoencoding part of the model is lossy.

Recommendations

The model is intended for research purposes only.

How to Get Started with the Model

Check out https://github.com/Stability-AI/StableCascade

Model details

Model type: Checkpoint

Base model: Stable Cascade

Model version: base

Model hash: 0d28c8562d
