Stable Diffusion

A detailed walkthrough of Stable Diffusion covering installation and deployment of models, LoRAs, VAEs, plug-ins, and embeddings, plus the interface parameters and basic usage of text-to-image (txt2img).

Edited at 2024-04-08 21:25:40

PlotWizard

Outline

Stable Diffusion

1. Installing and deploying models, LoRAs, VAEs, plug-ins, and embeddings

Three ways to install extensions

1. Open the Extensions tab, click Available, then click Load from to load the extension list. Remember to uncheck the filters that hide ads, localization (language packs), and installed extensions, otherwise part of the plug-in list stays hidden.

Taking the 3D OpenPose Editor as an example: the list is long, so press Ctrl+F to use the browser's find function, type openpose to jump to the plug-in, and click Install next to it.

2. Find the plug-in's URL (each plug-in has its own), paste it into the Install from URL tab, and click Install.

This method requires knowing the plug-in's GitHub address.

Both methods above are recommended, but they require a VPN or proxy ("magic") to reach GitHub. The connection can be unstable, so installation may fail or throw errors.

Their advantage is that the plug-in can then be updated directly from Extensions > Installed > Check for updates.

You can also update directly from the Autumn Leaves launcher.

3. If the methods above fail or the plug-in does not appear in the list, install it manually into the plug-in folder. Taking the ControlNet plug-in as an example, open its GitHub page: https://github.com/lllyasviel/ControlNet-v1-1-nightly

After downloading, unzip it into the extensions folder (novelai-webui\extensions) and restart the WebUI; the plug-in will then show as installed.

Disadvantage: to update, you must manually copy the new version of the folder into the plug-in directory (the Autumn Leaves package updates automatically).

After installing a plug-in, reload the WebUI. If it still does not appear, close the launcher and start it again.

Relationship between large models, LoRA, VAE, plug-ins, and embeddings

Large model: the plate; there are many types

stable diffusion\models\Stable-diffusion

Also called the base model or main model, it is the model with the greatest influence on the output.

Types: realistic people / products / anime (2D)

The files are large, usually several GB.

LoRA: the rich food on the plate

stable diffusion\models\Lora

Simply put, loading a LoRA lets you specify the characteristics of the character or style you want to generate.

Examples: Hanfu / ink-wash style / three-view character sheets / blind-box figures

Files are about 100 MB.

VAE: the seasoning that makes the food delicious

stable diffusion\models\VAE

Think of a VAE as a color profile or image filter. Without one, the picture comes out grayish and washed out.

Many large models now have a VAE built in. For those that don't, you need to select one. The commonly used one is 840000 (vae-ft-mse-840000-ema-pruned), and the setting is usually left at that default.

Plug-ins: the chopsticks and forks that help us eat better

stable diffusion\extensions

Examples: translation plug-ins and ControlNet

Embeddings: ready-made meal kits

stable diffusion\embeddings

An embedding is essentially a packaged bundle of prompt words. It is often used to prevent broken anatomy, a collapsing painting style, or a distorted spatial structure. Without an embedding, keeping the style from collapsing might take dozens of keywords. With a good embedding, a single trigger word is enough to produce a good picture.

Model download sites

Require a VPN/proxy ("magic"):

Civitai (known as "site C"): https://civitai.com/

Hugging Face: https://huggingface.co/models?other=stable-diffusion

No VPN/proxy required:

LiblibAI: http://www.liblibai.com/#/

Alchemy Pavilion: http://www.liandange.com/models

Autumn Leaves launcher: no preview images, so not very practical

2. Interface parameters and basic usage of text-to-image (txt2img)

Interface parameters

Large model selection: select the model (base model) to use. This has the greatest impact on the result, mainly on the picture style.

VAE: think of it as a filter; the usual default is 840000.

Clip skip (number of CLIP layers skipped): the smaller the value, the more closely the image follows the prompt; the larger the value, the more freedom the model takes. The default here is 2, and there is no need to change it.

Prompt input

Positive prompt

Image quality: masterpiece, best quality, highres, highly detailed

Subject: a girl, a boy, a dog, a house

Attributes: long blond hair, blue eyes, fat, thin, earrings, wearing a windbreaker, wearing a skirt, modern style, baroque, Chinese style

Background: hospital, school, apartment, street, transparent background, gradient background

Painting style: realistic style, illustration style, monochrome, comic, retro

Shots: full-length portrait, half-length portrait, mirror selfie, frontal face, looking at the viewer, facing the camera

Others: winter, snow, rain, warm colors, green-orange colors

Negative prompt

Without a negative prompt, output quality tends to be low and the painting style collapses easily. You can save it as a fixed template.

NSFW, nude, naked, porn, (worst quality, low quality:1.4), deformed iris, deformed pupils, (deformed, distorted, disfigured:1.3), cropped, out of frame, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, cloned face, (mutated hands and fingers:1.4), disconnected limbs, extra legs, fused fingers, too many fingers, long neck, mutation, mutated, ugly, disgusting, amputation, blurry, jpeg artifacts, watermark, watermarked, text, Signature, sketch,

How to save templates

Save the current prompt as a preset style

Sampling steps

Stable Diffusion (稳定扩散 in Chinese) takes its name from how it is trained: noise is gradually added to a training image until it becomes completely random noise. The process is like a drop of ink in a glass of water, which slowly spreads until it is evenly distributed; that is where the name "diffusion" comes from. Generating an image runs the process in reverse, removing noise step by step.
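The ink-in-water analogy corresponds to the forward (noising) process of a diffusion model. The sketch below assumes the linear noise schedule from the original DDPM setup (1000 steps, beta rising from 0.0001 to 0.02), which is not necessarily the exact schedule a given Stable Diffusion checkpoint uses. A plain list of numbers stands in for an image. At timestep t the noisy image is sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise, so the share of the original image shrinks toward zero.

```python
import math
import random
from typing import List


def linear_betas(num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Per-step noise amounts for an assumed DDPM-style linear schedule."""
    return [beta_start + (beta_end - beta_start) * i / (num_steps - 1) for i in range(num_steps)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_bar[t]: fraction of the original signal's variance left after step t."""
    alpha_bars, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def add_noise(x0: List[float], t: int, alpha_bars: List[float], rng: random.Random) -> List[float]:
    """Jump straight to timestep t: sqrt(a) * x0 + sqrt(1 - a) * Gaussian noise."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


alpha_bars = cumulative_alphas(linear_betas())
for t in (0, 250, 500, 999):
    # Weight of the original image at step t; it falls toward 0 as noise takes over.
    print(f"t={t:4d}  signal weight={math.sqrt(alpha_bars[t]):.4f}")
```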

More sampling steps generally give a better picture but take longer to compute. Unless you have special requirements, 20 to 30 steps is enough most of the time (the default is 20). Above 30 there is little noticeable change.

Sampling method

Stable Diffusion generates images with a diffusion model: starting from a picture full of noise, it gradually moves closer to the target described by the prompt. This is the sampler's job. Simply put, a sampler is an algorithm that, after each step, compares the generated image with what the text prompt asks for and adjusts the noise. It repeats this until the image gradually matches the text description.
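The sampler's loop can be sketched as a toy, deterministic DDIM-style update. The assumptions are: an assumed linear beta schedule, 20 evenly spaced steps, and a hypothetical `oracle_denoiser` that already knows the target. In real Stable Diffusion, the trained network plays the oracle's role by predicting the noise from the current latent and the prompt. The samplers listed below differ mainly in how they take each step.

```python
import math
import random
from typing import List


def alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Cumulative signal fractions for an assumed linear beta schedule."""
    out, product = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        product *= 1.0 - beta
        out.append(product)
    return out


def oracle_denoiser(x_t: List[float], target: List[float]) -> List[float]:
    """Hypothetical stand-in for the trained network: returns the clean image directly."""
    return list(target)


def ddim_sample(target: List[float], sampling_steps: int = 20, seed: int = 0) -> List[float]:
    """Toy deterministic DDIM loop; sampling_steps must be at least 2."""
    abar = alpha_bars()
    last = len(abar) - 1
    # Evenly spaced timesteps from the noisiest (last) down to 0.
    timesteps = [round(last - i * last / (sampling_steps - 1)) for i in range(sampling_steps)]
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure noise
    for i, t in enumerate(timesteps):
        a_t = abar[t]
        x0_pred = oracle_denoiser(x, target)
        # Noise implied by the current sample and the predicted clean image.
        eps = [(xi - math.sqrt(a_t) * x0) / math.sqrt(1.0 - a_t) for xi, x0 in zip(x, x0_pred)]
        # Signal fraction at the next, less noisy timestep; 1.0 means fully clean.
        a_next = abar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x = [math.sqrt(a_next) * x0 + math.sqrt(1.0 - a_next) * e for x0, e in zip(x0_pred, eps)]
    return x


print(ddim_sample([0.2, -0.5, 0.9]))
```

With a perfect oracle the loop lands exactly on the target. A real network only approximates the clean image, which is why the step count and the choice of sampler matter.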

There are many sampling methods, and they affect image quality, but only a few are commonly recommended today.

Euler a

The fastest, most direct, simplest, and most stable sampler.

It needs very few sampling steps. More steps do not add detail, and beyond a certain step count the composition can change abruptly, so avoid it at high step counts.

Suitable for: anime-style (2D) images, small scenes

DPM 2S a Karras

Balances speed and quality, producing more accurate images and details.

Suitable for: anime-style (2D) images

DPM SDE Karras

Comparable to 2S. Its main feature is that, compared with Euler a, it produces more detail at the same resolution (for example, it can fit a whole body into a small image), but sampling is slower.

Suitable for: realistic style, portraits, complex scenes

DDIM

Rarely used. It generates pictures quickly and can produce high-quality images fast. It is also the one to use if you want to try a very high step count, because detail keeps accumulating as steps increase.

Suitable scenes: realistic portraits, complex scenes

Face restoration (generally effective for realistic portraits; almost useless for anime-style images)

Tiling (used to generate seamless pattern textures)

High-resolution fix (Hires. fix)

In plain terms, it enlarges the image by redrawing it, adding detail as it enlarges.

Upscaler: the default, Latent, is fine. For realistic people use R-ESRGAN 4x+, and for anime-style images use R-ESRGAN 4x+ Anime6B.

Denoising strength (redraw amplitude): controls how much of the image is redrawn; 0.4 to 0.7 usually works well.

Width and height settings

Most models are trained at 512×512, and a few at 768×768. When the output is much larger, such as 1024×1024, the AI tries to fit two or three images' worth of content into one picture. This leads to stitched limbs, extra people the prompt did not ask for, multiple angles, and so on. Adding prompt terms can partly help, but the more important fix is to control the frame: generate a small or medium image first, then upscale it.

Large images are also slow to compute and can easily exhaust video memory, so it is recommended to generate at 512 or 768.

If you have a specific reference picture, resize it proportionally in Photoshop so that one side is between 512 and 768 pixels; the other side follows the aspect ratio. If you want a larger size, use Hires. fix.

A square 512×512 image tends to show faces and busts.

A tall 512×768 image tends to show standing or seated full-body figures.
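The resizing rule for a reference picture, plus the later Hires. fix enlargement, can be sketched as simple arithmetic. The function names are hypothetical. Rounding to a multiple of 8 is an added assumption: Stable Diffusion works on a latent that is 8 times smaller per side, so sizes divisible by 8 are the safe choice.

```python
from typing import Tuple


def fit_reference(width: int, height: int, short_side: int = 512, multiple: int = 8) -> Tuple[int, int]:
    """Scale (width, height) proportionally so the shorter side becomes short_side."""
    if not 512 <= short_side <= 768:
        raise ValueError("short_side should stay between 512 and 768")
    scale = short_side / min(width, height)
    # Round each side to the nearest multiple of 8 (assumption, see above).
    return (max(multiple, round(width * scale / multiple) * multiple),
            max(multiple, round(height * scale / multiple) * multiple))


def hires_target(width: int, height: int, upscale: float = 2.0) -> Tuple[int, int]:
    """Final size after Hires. fix enlarges the base image by `upscale`."""
    return int(width * upscale), int(height * upscale)


base = fit_reference(3000, 4000)  # e.g. a 3:4 photo used as a reference
print(base, hires_target(*base))
```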

Batch count / batch size

Number of images generated = batch count × batch size (images per batch)

If your graphics card is weak, do not raise the batch size; raise the batch count instead to generate pictures in bulk.
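The formula and the advice for weaker cards can be sketched as a small planner. It keeps the batch size at what the card can handle and makes up the rest with more batches. `max_batch_size` is a hypothetical value you would find by experiment (how many images fit in video memory at once).

```python
import math
from typing import Tuple


def plan_batches(total_images: int, max_batch_size: int) -> Tuple[int, int]:
    """Return (batch_count, batch_size); images generated = batch_count * batch_size.

    max_batch_size is a hypothetical limit: how many images your card can
    generate at once without running out of video memory.
    """
    batch_size = max(1, min(max_batch_size, total_images))
    batch_count = math.ceil(total_images / batch_size)
    return batch_count, batch_size


print(plan_batches(8, 1))  # weak card: 8 batches of 1
print(plan_batches(8, 4))  # stronger card: 2 batches of 4
```

When the total is not divisible by the batch size, this rounds up, so slightly more images than requested are produced.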

Prompt guidance scale (CFG Scale)

The higher the CFG value, the more closely the AI follows the prompt, and the more relevant the generated image is to the prompt.

A CFG of 5 to 10 is relatively safe, and 7 to 10 is generally recommended; lower or raise it as the situation requires.

The default of 7 is usually enough; fine-tune according to the image content.

The lower the CFG value, the less the AI follows the prompt and the more freely it improvises, so the image relates more weakly to the prompt.
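The CFG scale is the guidance weight in classifier-free guidance. The model predicts the noise once with the prompt and once without it. In the WebUI, the negative prompt is understood to take the place of the empty prompt. The two predictions are then combined as uncond + scale × (cond − uncond). The minimal sketch below uses plain lists of made-up numbers as stand-ins for the two noise predictions.

```python
from typing import List


def cfg_combine(uncond: List[float], cond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance: move from the unconditioned prediction toward (and past) the prompted one."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.10, -0.20]  # hypothetical prediction without the prompt (or with the negative prompt)
cond = [0.30, 0.00]     # hypothetical prediction with the positive prompt
for scale in (1.0, 7.0, 12.0):
    print(scale, cfg_combine(uncond, cond, scale))
```

At scale 1 the result is just the prompted prediction. Larger values amplify the difference the prompt makes, which matches the "more obedient" behavior described above.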

Random seed

An important parameter that controls the randomness and diversity of the results.

Click the dice button to set the seed to -1, meaning a new random seed each time.

Click the recycle button to reuse the seed of the image currently selected in the gallery on the right.

Variation seed: adjust the variation strength (a small value such as 0.001 is enough)
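The seed is the starting value for the random-number generator that produces the initial noise. The same seed with the same settings therefore reproduces the same starting noise. This sketch makes two assumptions. First, Python's `random.Random` stands in for the generator the WebUI actually uses. Second, the variation seed is modeled as a spherical interpolation (slerp) from the main seed's noise toward the variation seed's noise by the variation strength. AUTOMATIC1111's WebUI is believed to blend variation noise this way, but treat that as an assumption. The helper names are hypothetical.

```python
import math
import random
from typing import List


def resolve_seed(seed: int) -> int:
    """-1 (the dice button) means: draw a fresh random seed."""
    return random.randint(0, 2**32 - 1) if seed == -1 else seed


def seed_noise(seed: int, size: int) -> List[float]:
    """Starting noise; the same seed always gives the same values."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def slerp(t: float, a: List[float], b: List[float]) -> List[float]:
    """Spherical interpolation from a (t=0) to b (t=1)."""
    cos_omega = sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    if omega < 1e-6:  # nearly parallel vectors: fall back to linear interpolation
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    s = math.sin(omega)
    return [math.sin((1 - t) * omega) / s * x + math.sin(t * omega) / s * y for x, y in zip(a, b)]


def variation_noise(seed: int, variation_seed: int, strength: float, size: int) -> List[float]:
    """Blend the main seed's noise toward the variation seed's noise by `strength`."""
    return slerp(strength, seed_noise(seed, size), seed_noise(variation_seed, size))


base = seed_noise(1234, 6)
nudged = variation_noise(1234, 99, 0.001, 6)
print(max(abs(x - y) for x, y in zip(base, nudged)))  # small strength, small change
```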

Generate and the buttons below it

Read the generation data of the last image (positive and negative prompts and all parameters)

Clear the positive and negative prompts

Show models and other extra networks (LoRA, embeddings, etc.)

Append the selected preset style to the current prompt

Save the current prompt as a template (style)

To modify a saved prompt template, find the styles file (styles.csv) in the Stable Diffusion folder, right-click it, open it with Notepad, and edit or delete entries there. (Note: the file only appears after you have saved at least one template.)
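If you prefer to remove a template with a script rather than in Notepad, a small helper can drop a style by name. This assumes the AUTOMATIC1111-style styles.csv layout with a header row of name,prompt,negative_prompt. Check your own file's header first and back the file up before writing. The helper names are hypothetical.

```python
import csv
import io


def remove_style(csv_text: str, name: str) -> str:
    """Return the CSV text with every style called `name` removed; other columns are kept."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or ["name", "prompt", "negative_prompt"]
    kept = [row for row in reader if row.get("name") != name]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(kept)
    return out.getvalue()


def remove_style_from_file(path: str, name: str) -> None:
    """Rewrite the styles file in place (make a backup first)."""
    with open(path, encoding="utf-8-sig", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(remove_style(text, name))
```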

Usage notes

If generation fails because video memory is full, reduce the width, height, and number of steps until it runs normally.

Do not set the keyword strength (CFG scale) too high; experiment to see what works.

Never swap the positive and negative prompts.

I usually use 20 to 50 sampling steps (though with low video memory, mostly 30) and a keyword strength of 7 to 15.

3. Grammar and weight of prompt words

Positive prompts: compared with Midjourney, they need to be written more precisely and carefully. The more you describe, the closer the result is to what you want; the less you describe, the more room the AI has to improvise.

Negative prompts: content you don't want to appear

Writing principles

Almost all models only understand English words

All symbols must use English half-width, and phrases must be separated by half-width commas.

Line breaks are allowed, but it is best to put a delimiter (English half-width comma) at the end of each line.
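The three writing rules above can be checked with a small clean-up helper. It converts common full-width symbols typed with a Chinese input method to half-width, treats line breaks as separators, and joins every phrase with a half-width comma. The function and the set of symbols it converts are illustrative assumptions, not part of the WebUI.

```python
import re

# Common full-width symbols and their half-width equivalents.
FULL_TO_HALF = str.maketrans({"，": ",", "（": "(", "）": ")", "：": ":", "【": "[", "】": "]"})


def normalize_prompt(text: str) -> str:
    """Convert full-width symbols to half-width and separate every phrase with ', '.

    Line breaks count as separators, so each line effectively ends with a comma.
    """
    text = text.translate(FULL_TO_HALF)
    phrases = [p.strip() for p in re.split(r"[,\n]", text)]
    return ", ".join(p for p in phrases if p)


print(normalize_prompt("masterpiece，best quality\n1girl，（smile：1.2）"))
```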

Grammar principles

The earlier a word appears, the higher its weight. For example:

tree, 1girl: likely a tree with a girl standing next to it

1girl, tree: likely a portrait of a girl with a tree in the background

So in most cases the common prompt format is a three-part structure:

masterpiece, best quality, sketch, 1girl, stand, black jacket, wall background, full of poster, by token,

Advanced syntax: step-by-step drawing (popularly called "gradient blending"; "step-by-step drawing" is closer to what it actually does)

[ tagA : tagB : 0.3 ]: draw keyword A before 30% progress and keyword B after 30% progress.

[cat : dog :0.6 ]: draw cat before 60% progress and dog after 60% progress.

[dog:dragon:6], in the sky, half-body, close-up: when the value is 1 or greater, it is a step number, so dog is drawn before step 6 and dragon after it. Changing the step number adjusts the balance between the two, and different step numbers produce a gradient from keyword 1 to keyword 2. That is where the popular name "gradient" comes from.
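The switching rule can be sketched as a function that returns the prompt text in effect at a given step. This is a simplified, hypothetical rule, not the WebUI's own parser, which handles more cases and may place the boundary slightly differently. Its assumptions are:

- Steps are 1-based.
- A value below 1 is a fraction of the total steps; a value of 1 or more is a step number.
- The first keyword is used up to and including the switch step.
- Nesting is not supported.

It also accepts the shorter [text:when] form, where the text only appears after the switch step.

```python
import re

# [from:to:when]; either side may be empty. Nesting is not supported in this sketch.
EDIT_3 = re.compile(r"\[([^\[\]:]*):([^\[\]:]*):\s*([0-9.]+)\s*\]")
# [to:when]: the text only appears after the switch step.
EDIT_2 = re.compile(r"\[([^\[\]:]*):\s*([0-9.]+)\s*\]")


def switch_step(when: str, total_steps: int) -> int:
    """Below 1: a fraction of the total steps. 1 or more: an absolute step number."""
    value = float(when)
    return int(value * total_steps) if value < 1 else int(value)


def active_prompt(prompt: str, step: int, total_steps: int) -> str:
    """Prompt text in effect at a 1-based sampling step (simplified, hypothetical rule)."""

    def replace_three(m: re.Match) -> str:
        before, after, when = m.groups()
        return before.strip() if step <= switch_step(when, total_steps) else after.strip()

    def replace_two(m: re.Match) -> str:
        text, when = m.groups()
        return "" if step <= switch_step(when, total_steps) else text.strip()

    text = EDIT_3.sub(replace_three, prompt)
    text = EDIT_2.sub(replace_two, text)
    return text.strip(" ,")  # drop a comma left dangling by an empty section


for step in (5, 7):
    print(step, active_prompt("[dog:dragon:6], in the sky, half-body, close-up", step, 20))
```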

Step-limited drawing: [a girl:5], in the seaside. Square brackets [ ] lower a term's weight. If you don't want something to stand out, wrap it in square brackets and add a step number. The smaller the value, the more visible it is; the larger the value, the less it shows.
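For plain weighting (without a step number), the bracket rules commonly documented for the AUTOMATIC1111 WebUI are:

- Each pair of round brackets multiplies a term's weight by 1.1.
- Each pair of square brackets divides it by 1.1.
- (term:1.4) sets the weight to 1.4 directly, the form used in the negative prompts above.

The hypothetical helper below computes the resulting weight for a single bracketed term. It is simplified and does not handle mixed or unbalanced brackets.

```python
import re

# (term:1.4) sets the weight explicitly.
EXPLICIT = re.compile(r"\((.+):\s*([0-9.]+)\s*\)")


def emphasis_weight(term: str) -> float:
    """Weight of one bracketed term under the common WebUI rules (simplified)."""
    term = term.strip()
    match = EXPLICIT.fullmatch(term)
    if match:
        return float(match.group(2))
    weight = 1.0
    # Peel matching outer brackets: () multiplies by 1.1, [] divides by 1.1.
    while len(term) >= 2 and term[0] + term[-1] in ("()", "[]"):
        weight = weight * 1.1 if term[0] == "(" else weight / 1.1
        term = term[1:-1].strip()
    return weight


for t in ("tree", "(tree)", "((tree))", "[tree]", "(mutated hands and fingers:1.4)"):
    print(t, round(emphasis_weight(t), 3))
```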

Writing order: quality words, subject description, background, composition

Quality words (masterpiece, best quality, etc.)

Subject description (1girl, long hair, blue dress, smiling at the camera, etc.)

Scene and environment (forest, tree, white flower, day, sunlight, cloudy sky, etc.)

Perspective and composition (close-up, full body, distant, etc.)

Try writing a set of keywords that follows this structure.
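Because earlier terms carry more weight, the order of the four parts matters. A hypothetical helper can assemble them in that order while skipping blanks and repeated terms.

```python
from typing import Iterable, List


def build_prompt(quality: Iterable[str], subject: Iterable[str],
                 scene: Iterable[str], composition: Iterable[str]) -> str:
    """Join the four parts in order; earlier terms carry more weight."""
    terms: List[str] = []
    for part in (quality, subject, scene, composition):
        for term in part:
            term = term.strip()
            if term and term not in terms:  # skip blanks and repeats
                terms.append(term)
    return ", ".join(terms)


print(build_prompt(["masterpiece", "best quality"],
                   ["1girl", "long hair", "blue dress"],
                   ["forest", "sunlight"],
                   ["full body"]))
```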

Quality words / composition

masterpiece, best quality, 8k, crazy detail, intricate detail, ultra detail, ultra quality, high detail, bust

masterpiece, best quality, 8k, insane details, intricate details, hyperdetailed, hyper quality, high detail, half body,

Subject description (in more detail)

1 girl with long red hair, green eyes, wearing a scarf and a striped sweater, smiling slightly at the camera,

1 girl, long red hair, green eyes, shirt, jeans, smiling at the camera,

What does the background look like?

Intricate background on the beach, night, starry sky

Complex background, on the beach, at night, starry sky

If you are not sure what to write, go to Civitai ("site C") and copy the prompts of other people's good works to learn from.

When copying prompts, check that the LoRA names match the LoRA files you have locally; without the same LoRA, your results will differ.

Paste the copied data into the positive prompt box and click the first button below Generate (the one that reads generation parameters).
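A quick way to catch a missing LoRA before generating is to compare the <lora:name:weight> tags in a copied prompt against the files in your models\Lora folder. The <lora:...> tag is the WebUI's built-in LoRA syntax. The file extensions checked and the helper names are assumptions for illustration.

```python
import re
from pathlib import Path
from typing import Iterable, List, Set

# Built-in WebUI syntax: <lora:file_name:weight>
LORA_TAG = re.compile(r"<lora:([^:>]+)(?::[^>]*)?>")
LORA_SUFFIXES = {".safetensors", ".ckpt", ".pt"}  # assumed LoRA file types


def local_lora_names(folder: str) -> Set[str]:
    """File names (without extension) in a LoRA folder such as models/Lora."""
    return {p.stem for p in Path(folder).iterdir() if p.suffix.lower() in LORA_SUFFIXES}


def missing_loras(prompt: str, available: Iterable[str]) -> List[str]:
    """LoRA names referenced in the prompt that are not available locally."""
    have = set(available)
    return [name for name in LORA_TAG.findall(prompt) if name not in have]


prompt = "masterpiece, 1girl, hanfu, <lora:hanfu_v2:0.8>, <lora:inkStyle:0.6>"
print(missing_loras(prompt, {"hanfu_v2"}))  # hypothetical local files
```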

Tips for reducing adult elements

Positive: family_friendly (adjust its weight as needed; the higher the weight, the higher the chance of drawing children)

Negative: nsfw, nude, naked, porn (not safe for work, nudity, pornography). These terms mark adult-oriented content, and it is recommended to add nsfw every time you generate.

Fixed starter templates

It is recommended to save it as a template for easy use next time

Simple positive and negative starter template

Positive prompt: masterpiece, best quality

Negative prompt: nsfw, (worst quality, bad quality:1.3)

Longer positive and negative starter template

Positive prompt: masterpiece, best quality, 8k, insane details, intricate details, hyperdetailed, hyper quality, high detail, ultra detailed,

Negative prompt: NSFW, nude, naked, porn, (worst quality, low quality: 1.4), deformed iris, deformed pupils, (deformed, distorted, disfigured: 1.3), cropped, out of frame, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, cloned face, (mutated hands and fingers:1.4), disconnected limbs, extra legs, fused fingers, too many fingers, long neck, mutation, mutated, ugly, disgusting, amputation, blurry, jpeg artifacts, watermark, watermarked, text, Signature, sketch,

NSFW: "not safe for work", i.e. images unsuitable for viewing at work

Beyond these general terms, you can add others to suit the picture. For example, if a dog appears in the generated picture but you don't want it, add "dog" to the negative prompt.

How to make realistic portraits more high-definition

Positive prompt: photography, masterpiece, best quality, 8K, HDR, RAW photo, highres, (absurdres:1.2), Kodak portra 400, film grain, blurry background, (bokeh:1.2), lens flare, (vibrant color:1.2), girl

photography: photographic style

masterpiece, best quality: excellent quality

8K, HDR, RAW photo, highres, (absurdres:1.2): sharp, high resolution

Kodak portra 400, film grain: film characteristics

blurry background, (bokeh:1.2), lens flare: blurred background, bokeh, halo

(vibrant color:1.2): vivid colors

These keywords give the picture more atmosphere and detail.

Prompt separators

1. Use half-width English commas or " " as delimiters (example prompt: Rococo style, living room, large windows, red sofa; seed: 3391285208)

2. Spaces before or after a separator have no effect.

3. As in Midjourney (MJ), the earlier a word appears, the higher its weight.