SPO-SDXL_4k-p_10ep_LoRA_webui - v1.0

7/15/2025 | 11:13:45 AM
Grumpy white duck with an orange beak holding a black nameplate standing in front of a height chart under dramatic spotlight and shadow lighting.
A grumpy broccoli character with a broccoli head and wet shiny skin standing in a meadow under heavy rain with dark rain clouds and volumetric lighting.
A detailed anime girl with flowing multicolored blue and black hair wearing a black lace dress and a crown, surrounded by blooming flowers in an ornately decorated indoor garden with warm volumetric lighting.
A cyborg geisha demon crouching amid skulls and bones, wearing a futuristic helmet with glowing red visor, bloody red armor with neon trim, and a golden cape, set in a mysterious castle garden.
Detailed illustration of a grey-haired angel with one wing, a red halo, yellow eyes, and sharp fangs, radiating a menacing aura while leaning over a book.
Close-up portrait of a gaunt figure with long messy hair covering dark black eyes, a wild sinister smile, blood around the mouth, and a spiked collar.
Close-up portrait of an anime girl with short brown hair, brown eyes, freckles, and fairy wings, wearing a green dress with light particles in the background.
Impressionist-style painting of a white wolf in blue tones looking upward, with a large orange wolf silhouette in the background, showing striking color contrast and glow effects.
Petite blonde girl with short hair and circle glasses in a yellow hoodie and striped socks sitting on the floor reading a book in a cozy bedroom with plants and bookshelves.
Female singer with purple eyes and dark ponytail in a black gothic cocktail dress passionately singing into a vintage microphone on a dimly lit jazz club stage with spotlights and musical instruments in the background.
Blonde woman in pink military uniform and red boots in a dynamic fighting stance holding a gun inside a retro-futuristic spacecraft corridor.
Anime-style girl in a blue jacket, red plaid skirt, blue gloves, and a striped bowtie firing an AR-15 rifle indoors near broken windows, carrying black duffle bags with money visible.

Tips

By using step-by-step preference optimization, SPO lets the model focus on subtle, fine-grained visual differences without being distracted by layout.

The model was fine-tuned on 4,000 prompts for 10 epochs and is released as a LoRA checkpoint.

SPO converges faster than existing DPO methods while maintaining image-text alignment.

Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization

arXiv Paper (arXiv:2406.04314) | GitHub Code | Project Page

Abstract

Generating visually appealing images is fundamental to modern text-to-image generation models. A potential route to better aesthetics is direct preference optimization (DPO), which has been applied to diffusion models to improve general image quality, including prompt alignment and aesthetics. Popular DPO methods propagate preference labels from clean image pairs to all the intermediate steps along the two generation trajectories. However, the preference labels in existing datasets blend layout and aesthetic opinions, which may disagree with aesthetic preference alone. Even if aesthetic labels were provided (at substantial cost), it would be hard for these two-trajectory methods to capture nuanced visual differences at different steps.
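To make the propagation strategy concrete, here is a toy sketch of it as the abstract describes it: two trajectories start from independent noise, a single label is computed on the final images, and that label is copied to every intermediate step. The denoiser, the scorer, and all function names are hypothetical stand-ins operating on plain lists of floats; this is not the code of any particular DPO method.

```python
import random
from typing import Callable, List, Tuple

Latent = List[float]
Denoiser = Callable[[Latent, int, random.Random], Latent]


def toy_denoise(x: Latent, t: int, rng: random.Random) -> Latent:
    # Hypothetical one-step "denoiser": shrink toward zero plus a little noise.
    return [0.9 * v + 0.05 * rng.gauss(0.0, 1.0) for v in x]


def toy_score(x: Latent) -> float:
    # Hypothetical preference score for a final image (higher is better).
    return -sum(v * v for v in x)


def propagate_trajectory_label(
    num_steps: int,
    dim: int,
    seed: int,
    denoise: Denoiser = toy_denoise,
    score: Callable[[Latent], float] = toy_score,
) -> List[Tuple[int, Latent, Latent]]:
    rng = random.Random(seed)
    # Two trajectories, each starting from its own independent noise latent.
    traj_a = [[rng.gauss(0.0, 1.0) for _ in range(dim)]]
    traj_b = [[rng.gauss(0.0, 1.0) for _ in range(dim)]]
    for t in range(num_steps, 0, -1):
        traj_a.append(denoise(traj_a[-1], t, rng))
        traj_b.append(denoise(traj_b[-1], t, rng))
    # One preference label, judged only on the final (clean) samples...
    a_wins = score(traj_a[-1]) >= score(traj_b[-1])
    win, lose = (traj_a, traj_b) if a_wins else (traj_b, traj_a)
    # ...is reused for every intermediate step, whatever happened at that step.
    return [(i, win[i], lose[i]) for i in range(num_steps)]


if __name__ == "__main__":
    pairs = propagate_trajectory_label(num_steps=5, dim=4, seed=0)
    print(len(pairs))  # one (step, win, lose) tuple per intermediate step
```

Note that an early step inherits the final label even if, at that step, the "losing" latent was the better one; this is the mismatch the abstract points to.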

To improve aesthetics economically, this paper uses existing generic preference data and introduces step-by-step preference optimization (SPO), which discards the propagation strategy and allows fine-grained image details to be assessed. Specifically, at each denoising step, we 1) sample a pool of candidates by denoising from a shared noise latent, 2) use a step-aware preference model to find a suitable win-lose pair to supervise the diffusion model, and 3) randomly select one candidate from the pool to initialize the next denoising step. This strategy ensures that diffusion models focus on subtle, fine-grained visual differences instead of layout. We find that aesthetics can be significantly enhanced by accumulating these improved minor differences.
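The three steps above can be sketched as a loop that collects one win-lose pair per denoising step. This is a toy illustration under stated assumptions: latents are plain lists of floats, `toy_denoise` and `toy_step_score` are hypothetical stand-ins for the diffusion model and the step-aware preference model, and the pair is simply the highest- and lowest-scoring candidates, which may differ from the paper's actual selection rule. The training update that consumes the pairs is omitted.

```python
import random
from typing import Callable, List, Tuple

Latent = List[float]


def toy_denoise(x: Latent, t: int, rng: random.Random) -> Latent:
    # Hypothetical stochastic denoising step, so candidates from the same x_t differ.
    return [0.9 * v + 0.1 * rng.gauss(0.0, 1.0) for v in x]


def toy_step_score(x: Latent, t: int) -> float:
    # Hypothetical step-aware preference model; this toy version ignores t.
    return -sum(v * v for v in x)


def spo_collect_pairs(
    x_T: Latent,
    num_steps: int,
    pool_size: int,
    rng: random.Random,
    denoise: Callable[[Latent, int, random.Random], Latent] = toy_denoise,
    step_score: Callable[[Latent, int], float] = toy_step_score,
) -> List[Tuple[int, Latent, Latent, Latent]]:
    """Return (t, shared x_t, winner, loser) for every denoising step."""
    x_t = x_T
    pairs = []
    for t in range(num_steps, 0, -1):
        # 1) Sample a pool of candidates, all denoised from the same shared latent x_t.
        pool = [denoise(x_t, t, rng) for _ in range(pool_size)]
        # 2) Rank them with the step-aware scorer; best and worst form the win-lose pair.
        ranked = sorted(pool, key=lambda c: step_score(c, t), reverse=True)
        pairs.append((t, x_t, ranked[0], ranked[-1]))
        # 3) Pick one candidate at random to initialize the next denoising step.
        x_t = rng.choice(pool)
    return pairs


if __name__ == "__main__":
    rng = random.Random(0)
    x_T = [rng.gauss(0.0, 1.0) for _ in range(4)]
    pairs = spo_collect_pairs(x_T, num_steps=10, pool_size=4, rng=rng)
    for t, _, win, lose in pairs[:3]:
        print(t, round(toy_step_score(win, t), 3), round(toy_step_score(lose, t), 3))
```

Because both members of each pair are denoised from the same x_t, they differ only by what that single step changed, which is the property the abstract credits for steering the model toward fine details rather than layout.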

When fine-tuning Stable Diffusion v1.5 and SDXL, SPO yields significant improvements in aesthetics compared with existing DPO methods while not sacrificing image-text alignment compared with vanilla models. Moreover, SPO converges much faster than DPO methods due to the step-by-step alignment of fine-grained visual details. Code and model: https://rockeycoss.github.io/spo.github.io/

Model Description

This model is fine-tuned from stable-diffusion-xl-base-1.0. It was trained on 4,000 prompts for 10 epochs and is distributed as a LoRA checkpoint. For more information, see the project page: https://rockeycoss.github.io/spo.github.io/
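As background on what "LoRA checkpoint" means in practice: a LoRA file stores small low-rank factors instead of full SDXL weights, and when it is applied, each adapted weight matrix W becomes W + multiplier * (alpha / r) * (up @ down), where r is the rank. The sketch below applies that standard update to one tiny matrix. The `lora_down` / `lora_up` / `alpha` names follow a convention common in webui-style LoRA files, but we have not inspected this checkpoint's keys, rank, or alpha, so treat them as assumptions.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain (n x m) @ (m x p) matrix product.
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(
    w: Matrix, lora_down: Matrix, lora_up: Matrix, alpha: float, multiplier: float = 1.0
) -> Matrix:
    """Return W + multiplier * (alpha / rank) * (lora_up @ lora_down)."""
    rank = len(lora_down)  # lora_down has shape (rank x in_features)
    scale = multiplier * alpha / rank
    delta = matmul(lora_up, lora_down)  # (out x rank) @ (rank x in) -> (out x in)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    down = [[1.0, 2.0]]  # rank 1
    up = [[1.0], [0.0]]
    print(merge_lora(base, down, up, alpha=1.0, multiplier=0.5))
    # [[1.5, 1.0], [0.0, 1.0]]
```

In AUTOMATIC1111-style webuis, the multiplier corresponds to the number in the `<lora:filename:multiplier>` prompt tag; setting it to 0 leaves the base weights unchanged.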

Citation

If you find our work useful, please consider giving us a star and citing our work.

@article{liang2024step,
  title={Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization},
  author={Liang, Zhanhao and Yuan, Yuhui and Gu, Shuyang and Chen, Bohan and Hang, Tiankai and Cheng, Mingxi and Li, Ji and Zheng, Liang},
  journal={arXiv preprint arXiv:2406.04314},
  year={2024}
}

Model Details

Model type: LoRA
Base model: SDXL 1.0
Model version: v1.0
Model hash: b6c2c16f3e
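To check a downloaded file against the listed hash, a short script can compute a truncated SHA-256. This assumes the hash is the first 10 hex digits of the file's SHA-256 (the "AutoV2"-style short hash that AUTOMATIC1111's webui and several model hubs display); we have not confirmed which scheme this page uses, and the file path is whatever you pass on the command line.

```python
import hashlib
import sys
from pathlib import Path
from typing import Union


def short_sha256(path: Union[str, Path], length: int = 10, chunk_size: int = 1 << 20) -> str:
    """First `length` hex digits of the file's SHA-256, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:length]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_hash.py path/to/downloaded_lora_file
    actual = short_sha256(sys.argv[1])
    print(actual, "matches" if actual == "b6c2c16f3e" else "does not match")
```

A mismatch may mean a corrupted download, a different file version, or simply that the page uses another hashing scheme.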