MindMap Gallery Stable Diffusion
A detailed walkthrough of Stable Diffusion: installing and deploying models, LoRA, VAE, plug-ins, and embeddings, plus the interface parameters and basic usage of text-to-image (txt2img).
Edited at 2024-04-08 21:25:40
Stable Diffusion
1. Installation and deployment of model/lora/VAE/plug-in/embeddings
Three ways to install extensions
1. Go to the Extensions page and click Available to load the extension list. Remember to uncheck the filter boxes [ads, language packs, installed], otherwise the plug-in list will not show.
Here we take installing the 3D Openpose editor as an example. Because the list is long, press Ctrl+F to use the browser's page search, type openpose to find the plug-in, and then click Install on its row.
2. Find the plug-in's URL (each plug-in has its own), open Install from URL, paste the link, and install.
This method requires knowing the plug-in's GitHub address.
These two methods are the recommended ones. Both need a proxy/VPN (the "magic") to succeed, and the connection can be unstable, so an install may fail and need retrying.
The advantage is that you can update the plug-in directly from Extensions > Check for updates.
You can also update directly from the Autumn Leaves Launcher.
3. If the methods above fail or the plug-in does not show up, install it manually into the plug-in path. Take the ControlNet plug-in as an example. Open the GitHub page where the ControlNet plug-in lives: https://github.com/lllyasviel/ControlNet-v1-1-nightly
After downloading, unzip it and put the folder into the novelai-webui\extensions folder, then restart the webUI and the plug-in will be installed.
Disadvantage: to update, you have to put the new version's folder into the plug-in directory by hand, whereas the Autumn Leaves package can update automatically.
After installing a plug-in, you must reload the webUI. If it still does not show up, close the launcher and open it again.
How large models, LoRA, VAE, plug-ins, and embeddings relate to each other
Large model: the plate; there are many kinds
stable diffusion\models\Stable-diffusion
Also called the base model or main model. It is the model with the greatest impact on the generated result.
Typical kinds: realistic people / products / anime (2D)
The files are large, usually several GB
LoRA: the dishes on the plate
stable diffusion\models\Lora
Simply put, loading a LoRA lets you pin down the features of the character or the style to be generated.
Examples: Hanfu / ink-wash style / three-view drawings / blind-box figures
Size is about 100 MB
VAE: the seasoning that makes the food taste good
stable diffusion\models\VAE
A VAE can be thought of as a color profile or picture filter. Without a VAE, the picture comes out gray.
Many large models now have a VAE built in. For those that do not, you need to load one separately; the 840000 VAE (vae-ft-mse-840000) is the common choice, and the default generally does not need changing.
Plug-ins: the chopsticks and forks that help us eat better
stable diffusion\extensions
For example, the translation plug-in and ControlNet
Embeddings: ready-made meal kits
stable diffusion\embeddings
An embedding is essentially a packaged set of prompt words, often used to prevent collapsed anatomy, art style, spatial structure, and so on. Without an embedding you might need dozens of keywords to keep the art style from collapsing; with a good embedding, a single prompt word is enough to generate a good picture.
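The folder layout listed above can be written down as a small lookup table. This is only a sketch: the install root, the file name, and the helper name `destination` are hypothetical, and only the sub-folders come from the outline.

```python
from pathlib import PureWindowsPath

# Sub-folders as listed in the outline above, relative to the install root.
ASSET_DIRS = {
    "checkpoint": PureWindowsPath("models", "Stable-diffusion"),
    "lora": PureWindowsPath("models", "Lora"),
    "vae": PureWindowsPath("models", "VAE"),
    "extension": PureWindowsPath("extensions"),
    "embedding": PureWindowsPath("embeddings"),
}

def destination(root, kind, filename):
    """Return where a downloaded file of the given kind should be placed."""
    if kind not in ASSET_DIRS:
        raise ValueError(f"unknown asset kind: {kind}")
    return PureWindowsPath(root) / ASSET_DIRS[kind] / filename

# Hypothetical root folder and file name, for illustration only.
print(destination(r"D:\stable diffusion", "lora", "hanfu.safetensors"))
```

A table like this is handy when sorting a batch of downloads, since a file placed in the wrong folder simply does not show up in the interface.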
Model download sites
Proxy/VPN ("magic") required
Civitai (site C): https://civitai.com/
Hugging Face: https://huggingface.co/models?other=stable-diffusion
No proxy/VPN required
LiblibAI: http://www.liblibai.com/#/
Alchemy Pavilion: http://www.liandange.com/models
Autumn Leaves Launcher: no preview images, so not very practical
2. Interface parameters and basic usage of text-to-image (txt2img)
Interface parameters
Large model selection: choose the model (base model) to use. This is the factor with the greatest impact on the generated result, mainly in the picture style.
VAE: can simply be understood as a filter; the default is the 840000 VAE
Clip skip (the CLIP layer to stop at): the smaller the value, the more closely the image follows the prompt; the larger the value, the more freedom the model takes. The default value is 2 and does not need changing.
Prompt word input
Positive prompt words
Image quality: masterpiece, best quality, highres, highly detailed
Subject: a girl, a boy, a dog, a house
Attributes: long blond hair, blue eyes, fat, thin, earrings, wearing a windbreaker, wearing a skirt, modern style, baroque, Chinese style
Background: hospital, school, apartment, street, transparent background, gradient background
Painting style: realistic style, illustration style, monochrome, comic, retro
Shots: full-length portrait, half-length portrait, mirror selfie, frontal face, looking at viewer, facing the camera
Others: winter, snow, rain, warm colors, green-orange colors
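One way to keep the categories above in a fixed order is to assemble the prompt from labeled groups. A minimal sketch; the function name, the category keys, and their order are assumptions made for illustration, not part of the webUI.

```python
# Category order loosely follows the list above; the keys are hypothetical labels.
CATEGORY_ORDER = ["quality", "subject", "attributes", "background", "style", "shot", "others"]

def build_prompt(parts):
    """Join the keyword groups into one comma-separated positive prompt."""
    keywords = []
    for category in CATEGORY_ORDER:
        keywords.extend(parts.get(category, []))
    return ", ".join(keywords)

prompt = build_prompt({
    "subject": ["a girl"],
    "quality": ["masterpiece", "best quality"],
    "attributes": ["long blond hair", "blue eyes"],
    "background": ["street"],
    "shot": ["full-length portrait"],
})
print(prompt)
```

The dictionary can be filled in any order; the output always starts with the quality words, so the words you care about most stay at the front.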
Negative prompt words (reverse prompt words)
If you leave this empty, the output quality will be low and the art style collapses easily. You can keep a fixed template:
NSFW, nude, naked, porn, (worst quality, low quality:1.4), deformed iris, deformed pupils, (deformed, distorted, disfigured:1.3), cropped, out of frame, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, cloned face, (mutated hands and fingers:1.4), disconnected limbs, extra legs, fused fingers, too many fingers, long neck, mutation, mutated, ugly, disgusting, amputation, blurry, jpeg artifacts, watermark, watermarked, text, signature, sketch,
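The template above uses the form (words:1.4), where the number is a weight applied to everything inside the parentheses. The sketch below only illustrates how such a prompt breaks down into phrases and weights; it is a simplified reading, not the webUI's actual parser, and it treats any unweighted phrase as weight 1.0.

```python
import re

# Matches a whole phrase of the form "(some words:1.4)".
WEIGHTED = re.compile(r"^\((.+):\s*([0-9]*\.?[0-9]+)\)$")

def split_top_level(prompt):
    """Split on commas that are not inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in prompt:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(0, depth - 1)
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]

def parse_weights(prompt):
    """Return (phrase, weight) pairs; unweighted phrases get 1.0."""
    result = []
    for part in split_top_level(prompt):
        match = WEIGHTED.match(part)
        if match:
            result.append((match.group(1).strip(), float(match.group(2))))
        else:
            result.append((part, 1.0))
    return result

for phrase, weight in parse_weights("nsfw, (worst quality, low quality:1.4), blurry"):
    print(weight, phrase)
```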
How to save templates
Save current prompt word as default style
Sampling steps
The name comes from the diffusion process. During training, noise is gradually added to a training image until it finally becomes a completely random noise image. This is like a drop of ink falling into a glass of water: it slowly spreads until it is evenly distributed. That is where the name "diffusion" comes from.
The higher the number of sampling steps, the more refined the picture, but the longer the computation takes. Unless you have special requirements, keep the sampling steps between 20 and 30 (default 20). Going above 30 makes no notable difference.
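The noising process described above can be sketched in a few lines. This is a toy illustration, not the webUI's code: the linear schedule and its values (1000 steps, beta from 1e-4 to 0.02) are assumptions borrowed from common diffusion setups, and a list of four numbers stands in for an image.

```python
import math
import random

def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal left after each noising step (assumed linear betas)."""
    alpha_bar, product = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        product *= 1.0 - beta
        alpha_bar.append(product)
    return alpha_bar

def add_noise(pixels, alpha_bar_t, rng):
    """Blend clean pixels with Gaussian noise: sqrt(a) * x + sqrt(1 - a) * noise."""
    keep, spread = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [keep * p + spread * rng.gauss(0.0, 1.0) for p in pixels]

schedule = alpha_bar_schedule()
rng = random.Random(0)
image = [1.0, -1.0, 0.5, 0.0]  # a toy 4-pixel "image"
for t in (0, 250, 999):
    # Early steps keep most of the image; the last step is almost pure noise.
    print(t, round(schedule[t], 4), add_noise(image, schedule[t], rng))
```

Generation runs this in reverse: it starts from noise and removes a little at each sampling step, which is why the step count trades time for refinement.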
Sampling method
Stable Diffusion generates images with a diffusion model: it starts from a picture full of noise and changes it step by step so that it moves closer to the target (the prompt). This is the sampler's job. Simply put, a sampler is an algorithm that, after each step, compares the generated image with what the text prompt asks for and then adjusts the noise, until the image gradually matches the text description.
There are many sampling methods, and the choice affects image quality, but at present only a few are commonly recommended.
Euler a
The fastest sampling method; the most direct, simple, and stable sampler
It needs very few sampling steps. Adding steps does not add detail, and past a certain step count the composition changes abruptly, so do not use it in high-step scenarios.
Suitable scenes: anime (2D) images, small scenes
DPM++ 2S a Karras
Balances speed and quality, and produces more accurate images with finer detail
Suitable scenes: anime (2D)
DPM++ SDE Karras
Comparable to 2S a. Its main feature is that, compared with Euler a, it gives more detail at the same resolution; for example, a whole body can fit into a small picture. The sampling speed is slower, though.
Suitable scenes: realistic style, portraits, complex scenes
DDIM
Rarely used. It produces pictures quickly and can generate high-quality images fast. It is also worth trying if you want a very high step count, because detail keeps accumulating as the steps increase.
Suitable scenes: realistic portraits, complex scenes
Face restoration (generally more effective for realistic portraits; almost useless for anime/2D)
Tiling (used to generate pattern textures)
Hires. fix (high-resolution restoration)
In plain terms, it enlarges the image by redrawing it, adding some detail as it enlarges.
Upscaler: the default, Latent, is fine; for realistic people use R-ESRGAN 4x+, for anime use R-ESRGAN 4x+ Anime6B
Denoising strength (redraw amplitude): different values change the image by different amounts (0.4-0.7 is usually suitable)
Width and height settings
Most models are trained at 512*512 resolution, and a few at 768*768. So when the output size is large, such as 1024*1024, the AI tries to fit two or three images' worth of content into one picture. The result is spliced limbs, extra people that the prompt does not control, multiple viewing angles, and so on. Adding prompt terms can partly relieve this, but it matters more to control the frame size: generate a small or medium picture first, then enlarge it into the big picture.
Most importantly, an overly large image is slow to compute and easily uses up the video memory (generating at a base of 512 or 768 is recommended).
If you have a specific reference picture, open it in Photoshop and scale it proportionally so that one of the height and width values stays within 512-768 pixels; the other can be arbitrary. If you want a larger size, use the Hires. fix function.
Square 512*512 images tend to show faces and busts.
Tall 512*768 images tend to show standing and sitting full-body figures.
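The proportional resize described above (bring one side into the 512-768 range and let the other follow) is simple arithmetic. A minimal sketch; the helper name is hypothetical, and snapping to a multiple of 8 is an assumption of this sketch rather than something stated in the outline.

```python
def fit_size(ref_width, ref_height, base=512, multiple=8):
    """Scale a reference size proportionally so its shorter side equals `base`."""
    scale = base / min(ref_width, ref_height)

    def snap(value):
        # Assumption: keep both sides on a multiple of 8 pixels.
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(ref_width), snap(ref_height)

print(fit_size(1920, 1080))  # landscape reference picture
print(fit_size(1000, 1500))  # portrait reference picture
```

Generate at the returned size first, then enlarge the result, rather than generating at the reference picture's full resolution.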
Batch count / batch size
Number of images generated = batch count * batch size
If the graphics card is not powerful, do not raise the batch size; raise the batch count instead, which generates a batch of pictures faster.
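The formula above (images = number of batches * images per batch) also tells you how many batches to queue for a target number of pictures. A small sketch with hypothetical helper names; keeping one image per batch by default reflects the advice for weaker graphics cards.

```python
def total_images(batch_count, batch_size):
    """Images produced by one click of Generate."""
    return batch_count * batch_size

def plan_batches(wanted, max_batch_size=1):
    """Smallest batch count that yields at least `wanted` images."""
    batch_count = -(-wanted // max_batch_size)  # ceiling division
    return batch_count, max_batch_size

print(total_images(4, 2))
print(plan_batches(8))
```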
Prompt guidance scale (CFG Scale)
The higher the CFG value, the more obediently the AI follows the prompt, and the more closely the generated image matches the prompt words.
CFG is relatively safe in the range 5-10, and 7-10 is generally recommended. Lower or raise it according to the actual result.
The default of 7 is usually enough; fine-tune it according to the picture content.
The lower the CFG value, the less the AI follows the prompt and the more freely it improvises, so the generated image is less related to the prompt words.
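Why a higher CFG value follows the prompt more closely can be seen from the usual classifier-free guidance formula, in which the model makes one prediction with the prompt and one without, and the scale stretches the difference between them. The sketch below uses short lists of made-up numbers in place of real model outputs; it illustrates the formula only and is not taken from the webUI.

```python
def apply_cfg(uncond, cond, cfg_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element by element."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]

without_prompt = [0.0, 0.0]   # toy prediction ignoring the prompt
with_prompt = [1.0, 2.0]      # toy prediction using the prompt

for scale in (1.0, 7.0, 15.0):
    print(scale, apply_cfg(without_prompt, with_prompt, scale))
```

At a scale of 1 the result is just the prompted prediction; larger scales push further in the prompt's direction, which is also why very high values tend to look overcooked.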
Random seed
An important parameter that controls the randomness and diversity of the generated results.
Click the dice button to set the seed to -1, which means a random seed.
Click the recycle button to set the seed to that of the picture currently shown in the picture panel on the right.
Variation seed: adjust the variation strength (a small value is enough, for example 0.001)
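The role of the seed can be shown with an ordinary random number generator: the same seed always gives the same starting noise, and -1 stands for "pick a new one". A toy sketch; the helper names and the 32-bit seed range are assumptions, and the four numbers stand in for a full noise image.

```python
import random

def resolve_seed(seed):
    """-1 means "pick a new random seed"; any other value is used as-is."""
    return random.randrange(2**32) if seed == -1 else seed

def initial_noise(seed, size=4):
    """Starting noise for a generation, fully determined by the seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]

seed = resolve_seed(-1)
print(seed, initial_noise(seed))
print(initial_noise(seed) == initial_noise(seed))  # same seed, same noise
```

This is why reusing a picture's seed together with the same prompt and parameters reproduces that picture, while -1 gives a different result each time.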
Generate
Reuse the data of the last generated image (positive and negative prompt words and all parameters)
Clear the positive and negative prompt words
Call up models and other content
Insert the selected preset style after the current prompt words
Save the prompt words as a template
To modify a previously saved prompt template, find the styles file in the SD folder, right-click it and open it with Notepad, then edit or delete entries. (Note: the file only appears after you have saved at least one template.)
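Editing the styles file by hand works, but the same clean-up can be scripted. The sketch below assumes the file is a CSV with the columns name, prompt, and negative_prompt; that layout is an assumption, so check your own file before relying on it. It works on text in memory and does not touch any real file.

```python
import csv
import io

# Assumed layout: one row per template with these three columns.
FIELDS = ["name", "prompt", "negative_prompt"]

def remove_style(csv_text, name):
    """Return the CSV text with the template called `name` removed."""
    rows = [row for row in csv.DictReader(io.StringIO(csv_text)) if row["name"] != name]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# Hypothetical file content with two saved templates.
sample = (
    "name,prompt,negative_prompt\n"
    'basic,"masterpiece, best quality","nsfw, (worst quality, bad quality:1.3)"\n'
    "old,sketch,\n"
)
print(remove_style(sample, "old"))
```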
Usage notes
If image generation fails because the video memory is full, try adjusting the width, height, and number of steps until it runs normally.
Do not set the keyword strength too high (try it yourself and see).
Never swap the positive and negative keywords.
I usually use 20 to 50 sampling steps (with low video memory, mostly 30) and a keyword strength of 7 to 15.
3. Syntax and weights of prompt words
Positive prompt words: compared with Midjourney, they need to be written more precisely and in more detail. The more you describe, the closer the result is to what you want; the less you describe, the more room the AI has to improvise.
Negative prompt words: content you do not want to appear
Writing principles
Almost all models only understand English words
All symbols must use English half-width, and phrases must be separated by half-width commas.
Line breaks are allowed, but it is best to put a delimiter (English half-width comma) at the end of each line.
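Prompts typed with a Chinese input method often contain full-width punctuation, which breaks the rules above. A minimal clean-up sketch; the function name and the list of characters handled are choices made for illustration.

```python
# Full-width punctuation mapped to its half-width equivalent.
FULL_TO_HALF = str.maketrans({"，": ",", "（": "(", "）": ")", "：": ":", "【": "[", "】": "]"})

def normalize_prompt(text):
    """Convert full-width punctuation to half-width and tidy the comma separators."""
    text = text.translate(FULL_TO_HALF).replace("\n", ",")
    phrases = [phrase.strip() for phrase in text.split(",")]
    return ", ".join(phrase for phrase in phrases if phrase)

print(normalize_prompt("1girl，tree ,\n（blue sky：1.2）"))
```

Line breaks are turned into commas here, which matches the advice to end each line with a separator.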
Syntax principles
The earlier a word appears, the higher its weight. For example:
tree, 1girl: there may be a tree with a girl standing next to it
1girl, tree: there may be a portrait of a girl with a tree in the background
So the prompt format used in most cases is the three-part layout:
masterpiece, best quality, sketch, 1girl, stand, black jacket, wall background, full of poster, by token,
Advanced syntax: step-by-step drawing (commonly called gradient blending; "step-by-step drawing" is closer to the original intent)
[tagA:tagB:0.3] draws keyword A for the first 30% of the progress and keyword B after that. [cat:dog:0.6] draws cat for the first 60% of the progress and dog after that.
[dog:dragon:6], in the sky, half-body, close-up: when the value is greater than or equal to 1, it is a step number, so dog is drawn before step 6 and dragon after it. By changing the step number you adjust the ratio between the two, and different values give a gradient from keyword 1 to keyword 2, which is where the popular name "gradient" comes from.
Step-limited drawing: [a girl: 5] in the seaside. Square brackets reduce weight. If you do not want something to stand out or be important, put it in square brackets and add a step number (the smaller the value, the more visible the bracketed content; the larger the value, the less of it is displayed).
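The rule for the number in [A:B:value] (a fraction of the total steps when below 1, otherwise a step number) can be sketched as follows. The function names are hypothetical, steps are counted from 1, and the exact behavior at the switching step may differ in the webUI.

```python
def switch_step(when, total_steps):
    """Step at which the prompt switches from keyword A to keyword B."""
    when = float(when)
    if when < 1:
        when *= total_steps  # a fraction such as 0.3 means 30% of the steps
    return min(total_steps, int(when))

def active_keyword(tag_a, tag_b, when, step, total_steps):
    """Keyword in effect at a 1-based sampling step for [tag_a:tag_b:when]."""
    return tag_a if step <= switch_step(when, total_steps) else tag_b

total = 20
for step in (1, 6, 7, 20):
    print(step, active_keyword("dog", "dragon", 6, step, total))
```

Because the early steps decide the overall composition, the keyword drawn first tends to set the shape and the second one the details.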
Writing method: quality words, subject description, background, composition
Quality words (masterpiece, best quality, etc.)
Subject description (1girl, long hair, blue dress, smiling at the camera, etc.)
Scene and environment (forest, tree, white flower, day, sunlight, cloudy sky, etc.)
Perspective and composition (close-up, full body, distant, etc.)
Try writing a set of keywords following the structure above.
Quality and composition words
masterpiece, best quality, 8k, insane details, intricate details, hyperdetailed, hyper quality, high detail, half body,
Main body description (a little more detailed)
1 girl with long red hair, green eyes, wearing a scarf and a striped sweater, smiling slightly at the camera,
1 girl, long red hair, green eyes, shirt, jeans, smiling at the camera,
What does the background look like?
intricate background, on the beach, night, starry sky
If you do not know how to write prompts, go to Civitai (site C) and copy the keywords of other people's excellent works to learn from them.
When copying keywords, check that any LoRA they reference matches the name of a LoRA package you have locally (without that LoRA, the generated result will not match).
Paste them into the positive prompt box and click the first button.
Tips for reducing adult elements
Positive: family_friendly (add a weight to adjust it; the higher the weight, the higher the chance of drawing children)
Negative: nsfw, nude, naked, porn (not safe for work, nudity, nakedness, pornography), which usually mean adult-oriented content; adding nsfw to the negative prompt every time you draw is recommended
Fixed opening templates
It is recommended to save them as templates for easy reuse.
Simple positive and negative opening
Positive prompt words: masterpiece, best quality
Negative prompt words: nsfw, (worst quality, bad quality:1.3)
Slightly longer positive and negative opening
Positive prompt words: masterpiece, best quality, 8k, insane details, intricate details, hyperdetailed, hyper quality, high detail, ultra detailed,
Negative prompt words: NSFW, nude, naked, porn, (worst quality, low quality:1.4), deformed iris, deformed pupils, (deformed, distorted, disfigured:1.3), cropped, out of frame, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, cloned face, (mutated hands and fingers:1.4), disconnected limbs, extra legs, fused fingers, too many fingers, long neck, mutation, mutated, ugly, disgusting, amputation, blurry, jpeg artifacts, watermark, watermarked, text, signature, sketch,
Note: NSFW stands for "not safe for work", meaning pictures unsuitable for the workplace.
Besides these general words, you can add others according to what the picture needs. For example, if a dog appears in the generated picture and you do not want it there, add "dog" to the negative prompt words.
How to make realistic portraits look more high-definition
Positive prompt words: photography, masterpiece, best quality, 8K, HDR, RAW photo, highres, (absurdres:1.2), Kodak portra 400, film grain, blurry background, (bokeh:1.2), lens flare, (vibrant color:1.2), girl
photography: a photographic look
masterpiece, best quality: excellent quality
8K, HDR, RAW photo, highres, (absurdres:1.2): clarity and high resolution
Kodak portra 400, film grain: film characteristics
blurry background, (bokeh:1.2), lens flare: blurred background, bokeh, and halo
(vibrant color:1.2): rich colors
Adding the keywords above gives the picture more atmosphere and detail.
Prompt word separator
1. Use English commas or " " as separators (prompt words: Rococo style, living room, large windows, red sofa; seed: 3391285208)
2. Spaces before or after the separator have no effect.
3. As in Midjourney (MJ), the earlier a word appears, the higher its weight.