MindMap Gallery Overview of Image Segmentation Algorithms
A review of image segmentation algorithms, covering traditional image segmentation methods, deep learning-based segmentation network models, and a performance comparison and summary.
Edited at 2022-04-10 10:44:06
A review of image segmentation methods
Introduction
Image segmentation: divide the image into disjoint, meaningful sub-regions
Pixels in the same region: correlated
Pixels in different regions: distinct
Traditional image segmentation methods
Uses:
Preprocessing step in image processing
Obtain key feature information of the image
Improve image analysis efficiency
Classification
Threshold-based: grayscale image segmentation method
Essence: Set different grayscale thresholds and classify the image grayscale histogram (the same grayscale range belongs to the same category and has a certain similarity)
process:
f(i,j): represents the gray value of (i,j)
T: Grayscale threshold
By comparing each pixel's gray value with the threshold, the image is divided into two parts, target and background. The output image g(i,j) is binary, with a value of 0 or 1.
1 (target): f(i,j)>=T
0 (background): f(i,j)<T
The larger the threshold T, the fewer pixels are classified as target (only pixels with f(i,j)>=T qualify).
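The thresholding rule above can be sketched in a few lines. This is a minimal illustration using plain Python lists; the function names and the 3x3 image are hypothetical and chosen only for the example.

```python
def threshold_segment(image, t):
    """Return the binary image g, with g[i][j] = 1 if image[i][j] >= t, else 0."""
    return [[1 if pixel >= t else 0 for pixel in row] for row in image]


def count_target(binary):
    """Count the pixels labelled 1 (target)."""
    return sum(sum(row) for row in binary)


# Hypothetical 3x3 grayscale image, for illustration only.
image = [
    [10, 200, 30],
    [180, 90, 220],
    [40, 150, 60],
]

low = threshold_segment(image, 100)   # T = 100
high = threshold_segment(image, 190)  # T = 190
```

With these values, `count_target(low)` is 4 and `count_target(high)` is 2.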
Classification:
Point-based global threshold segmentation method
Region-based global threshold segmentation method
Local threshold segmentation method
... ...
Analysis:
Applicable situations:
The target gray level is evenly distributed and changes little
The difference in grayscale between the target and the background is obvious
Advantages:
Simple and easy to implement
Efficient
Disadvantages:
Only the gray value of the pixel itself is considered; image semantics, spatial information and other features are ignored.
Susceptible to noise
Not ideal for complex images
Practical applications:
Preprocessing method
Use in conjunction with other segmentation methods
Edge-based
Theoretical basis: The gray value of a boundary pixel differs greatly from the gray values of its adjacent pixels.
Process: Connect the points whose gray values differ greatly from those of adjacent pixels (edge points) to form a boundary contour
Classification:
Serial edge detection method: first detect an edge starting point, then search from it and connect adjacent edge points according to a similarity criterion
Parallel edge detection method: convolve spatial differential operator templates with the image
Roberts
Sobel
Prewitt
LoG
Canny
... ...
Summary: In practical applications, the parallel edge detection method is simple and fast, has relatively good performance, and is the most commonly used method.
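As an illustration of the parallel approach, the sketch below applies the two Sobel templates to every interior pixel and thresholds the gradient magnitude. It is a simplified example, not a full detector: borders are skipped, the templates are applied as a sliding weighted sum (no kernel flip, which only changes the sign of the response), and the image and threshold are hypothetical.

```python
import math

# Sobel templates for horizontal (GX) and vertical (GY) gray-level changes.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_template(image, template, i, j):
    """Weighted sum of the 3x3 neighbourhood centred on (i, j)."""
    return sum(
        template[di + 1][dj + 1] * image[i + di][j + dj]
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
    )


def sobel_edges(image, threshold):
    """Mark interior pixels whose gradient magnitude reaches the threshold."""
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            gx = apply_template(image, GX, i, j)
            gy = apply_template(image, GY, i, j)
            if math.hypot(gx, gy) >= threshold:
                edges[i][j] = 1
    return edges


# Hypothetical 4x5 image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 100, 100],
    [0, 0, 0, 100, 100],
    [0, 0, 0, 100, 100],
    [0, 0, 0, 100, 100],
]
edges = sobel_edges(image, threshold=200)
```

Each pixel is processed independently of the others, which is what makes this family of methods easy to parallelize.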
Region-based
Algorithm principle: Segment according to the spatial information of the image, classifying pixels and forming regions by the similarity of pixel features
Classification
Region growing method
Principle: Gather pixels with similar properties to form an independent region
process:
1. Select a group of seed points as the starting points for growth (each either a single pixel or a small region)
2. According to the growth criterion, merge nearby pixels with features similar to the seed point into the region where the seed point is located.
3. Use the newly added pixels as seed points and repeat until no further pixels satisfy the criterion and growth stops.
Key points
Seed point
Selection method
Manual selection
Automatic selection by the algorithm
Growth criteria (image feature information)
color
texture
space
... ...
Analysis
Advantages: simple calculation
Disadvantages:
1. Sensitive to noise
2. Prone to leaving holes inside regions
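A minimal sketch of region growing from a single seed is shown below. The growth criterion used here (gray value within `max_diff` of the seed's value, 4-connected neighbours) is an assumption made for the example; color, texture or other criteria could be substituted.

```python
from collections import deque


def region_grow(image, seed, max_diff):
    """Grow a region from `seed` over 4-connected pixels whose gray value
    differs from the seed's gray value by at most `max_diff`."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    frontier = deque([seed])
    while frontier:
        i, j = frontier.popleft()
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if not (0 <= ni < rows and 0 <= nj < cols):
                continue
            if (ni, nj) in region:
                continue
            if abs(image[ni][nj] - seed_value) <= max_diff:
                region.add((ni, nj))
                frontier.append((ni, nj))  # the new pixel acts as a seed in turn
    return region


# Hypothetical 3x3 image with a dark block in the top-left corner.
image = [
    [10, 12, 90],
    [11, 13, 95],
    [80, 85, 92],
]
region = region_grow(image, seed=(0, 0), max_diff=5)
```

Growth stops once the frontier is empty, that is, when no neighbouring pixel satisfies the criterion any more.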
Split-merge method
The essence of the algorithm: repeatedly split and merge to obtain each sub-region of the image
Process:
1. Divide the image into regular regions
2. According to the similarity criterion, split regions with inconsistent features and merge adjacent regions with the same features, until no further splits or merges occur.
Key points/difficulties
initial partition
Split-Merge Similarity Criterion
Analysis
Advantages: better segmentation effect on complex images
Disadvantages:
1. Computationally complex
2. Region boundaries may be damaged during splitting
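The splitting half of the method can be sketched as a quadtree decomposition. The example below is deliberately partial: it assumes a square image whose side is a power of two, uses "max minus min gray value at most `tol`" as a hypothetical similarity criterion, and leaves out the merge pass over adjacent blocks.

```python
def is_uniform(image, top, left, size, tol):
    """True if the gray values in the block span at most `tol`."""
    values = [
        image[i][j]
        for i in range(top, top + size)
        for j in range(left, left + size)
    ]
    return max(values) - min(values) <= tol


def split(image, top, left, size, tol):
    """Recursively split a square block into quadrants until each block is
    uniform. Returns a list of (top, left, size) blocks."""
    if size == 1 or is_uniform(image, top, left, size, tol):
        return [(top, left, size)]
    half = size // 2
    blocks = []
    for dt in (0, half):
        for dl in (0, half):
            blocks += split(image, top + dt, left + dl, half, tol)
    return blocks


# Hypothetical 4x4 image: three uniform quadrants and one mixed quadrant.
image = [
    [10, 10, 50, 50],
    [10, 10, 50, 50],
    [10, 10, 10, 90],
    [10, 10, 10, 10],
]
blocks = split(image, 0, 0, 4, tol=5)
```

Here the three uniform quadrants are kept whole and the mixed one is split down to single pixels. A merge pass would then rejoin adjacent blocks with similar values, for example the 10-valued blocks.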
Clustering-based
Algorithm principle: gather pixels with similar features into the same region and iterate on the clustering result until convergence, so that all pixels end up in several distinct categories; this partition of the image into regions is the segmentation
Example analysis of a typical algorithm
Simple linear iterative clustering, SLIC (superpixel segmentation): image segmentation is transformed into a pixel clustering problem
Algorithm idea: Based on clustering, the pixels in the image are grouped into superpixel blocks
Algorithm steps:
1. Convert the RGB color image into a Lab image (Lab space covers a wider color gamut and provides richer color features)
L: brightness
a: range from magenta to green
b: range from yellow to blue
2. Combine the color features (L, a, b) and coordinates (x, y) of each pixel into a vector (L, a, b, x, y) for distance measurement
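Color distance between pixels i and j: d_c = sqrt((L_j - L_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2)
Spatial distance between pixels i and j: d_s = sqrt((x_j - x_i)^2 + (y_j - y_i)^2)
Final distance measure: D = sqrt((d_c / N_c)^2 + (d_s / N_s)^2)
N_c: maximum color distance, a constant taken as an integer in [1,40]
N_s: maximum spatial distance within a class, N_s = S
S: superpixel block size, i.e. the distance between adjacent seed points, S = sqrt(N / K)
N: the total number of pixels in the image
K: the number of pre-segmented superpixel blocks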
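The distance measure of step 2 can be sketched as follows, assuming the commonly published SLIC form D = sqrt((d_c/N_c)^2 + (d_s/N_s)^2) with N_s = S = sqrt(N/K). The function names and the sample pixel vectors are hypothetical.

```python
import math


def slic_distance(p, q, n_c, n_s):
    """Distance between two pixels given as (L, a, b, x, y) vectors."""
    d_c = math.sqrt(sum((p[k] - q[k]) ** 2 for k in range(3)))     # color part
    d_s = math.sqrt(sum((p[k] - q[k]) ** 2 for k in range(3, 5)))  # spatial part
    return math.sqrt((d_c / n_c) ** 2 + (d_s / n_s) ** 2)


def seed_spacing(num_pixels, num_superpixels):
    """Superpixel block size S = sqrt(N / K), the spacing between seed points."""
    return math.sqrt(num_pixels / num_superpixels)


# Hypothetical setting: a 100x100 image split into 100 superpixels.
s = seed_spacing(num_pixels=10000, num_superpixels=100)
# Two hypothetical pixels with color distance 5 and spatial distance 10.
d = slic_distance((50, 0, 0, 0, 0), (53, 4, 0, 6, 8), n_c=10, n_s=s)
```

A larger `n_c` gives the spatial term more weight and so produces more compact superpixels; a smaller one lets superpixels follow color boundaries more closely.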
Advantages
Stable performance
Good robustness
Applications: image segmentation, pose estimation, target tracking and recognition, etc.
Graph theory-based
Algorithm idea: convert the segmentation problem into graph partitioning, and complete the segmentation by optimizing the solution of the objective function
Examples of classic algorithms
Graph Cut
Algorithm idea: The minimum cut problem is applied to the image segmentation problem to segment the image into foreground and background.
Algorithm introduction:
1. Map the image to an S-T graph
Undirected weighted graph G=(V,E)
V: vertex set; each ordinary vertex corresponds to a pixel of the original image
E: edge set; the weight of an edge between two pixels is their similarity
Each ordinary node is also connected to the terminal vertices S and T by dashed edges.
The weight of the dashed edge connecting a vertex to S is the probability that the pixel belongs to the foreground target.
The weight of the dashed edge connecting a vertex to T is the probability that the pixel belongs to the background.
Two kinds of edges: edges connecting ordinary nodes (pixels) to each other, and edges connecting a terminal vertex to an ordinary node
2. Pose the segmentation as minimizing an energy loss function
cut: a subset of edges whose removal separates the S-T graph into an S part and a T part
min cut: the cut whose edge weights have the smallest sum
3. Find the min cut, iterating continuously
Evaluate and find the minimum value of the energy loss function
Advantages: It uses both the grayscale information and the region boundary information of the image; by solving for the optimal solution it obtains the best segmentation effect.
Disadvantages
Large amount of calculation
Tends to favor segmentations in which the classes have the same intra-class similarity
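To make the S-T graph and min cut concrete, the sketch below builds the graph for a hypothetical strip of four pixels and finds the minimum cut with a small breadth-first augmenting-path (Edmonds-Karp) max-flow routine. The solver is a stand-in chosen for brevity, and all weights are invented integers rather than probabilities estimated from an image.

```python
from collections import deque


def add_edge(cap, u, v, w):
    """Store an undirected edge of weight w as two directed arcs."""
    cap.setdefault(u, {})[v] = w
    cap.setdefault(v, {})[u] = w


def residual_search(cap, source):
    """Breadth-first search along arcs that still have spare capacity."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, spare in cap[u].items():
            if spare > 0 and v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def min_cut(cap, source, sink):
    """Return (cut weight, vertices left on the source side)."""
    total = 0
    while True:
        parent = residual_search(cap, source)
        if sink not in parent:
            return total, set(parent)
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[a][b] for a, b in path)
        for a, b in path:
            cap[a][b] -= flow
            cap[b][a] += flow
        total += flow


# Hypothetical weights for pixels 0..3: links to S (foreground evidence),
# links to T (background evidence), and similarity between neighbours.
fg = [1, 2, 8, 9]
bg = [9, 8, 2, 1]
cap = {}
for p in range(4):
    add_edge(cap, "S", p, fg[p])
    add_edge(cap, p, "T", bg[p])
for p, w in zip(range(3), [5, 1, 5]):
    add_edge(cap, p, p + 1, w)

cut_weight, source_side = min_cut(cap, "S", "T")
foreground = sorted(p for p in source_side if p != "S")
```

The pixels that remain connected to S after the cut form the foreground. In this example the cut passes through the weak similarity edge between pixels 1 and 2.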
Grab Cut
One Cut
... ...
Based on specific theories
Mathematical morphology theory
Overcomes the influence of noise and yields clear edge images
Genetic algorithm
Simulates natural survival of the fittest to search for the optimal solution and achieve optimal segmentation
Wavelet transform
active contour model
fuzzy theory
rough set theory
... ...
Segmentation methods based on deep learning - segmentation network models
Fully convolutional network (FCN) - image semantic segmentation
Algorithm idea:
After 8 layers of convolution, the feature map is upsampled by deconvolution, classified by the SoftMax layer, and the segmentation result is output. After multiple convolution operations, however, the feature map is much smaller than the original input image and much low-level image information is lost, so classifying it directly hurts segmentation accuracy.
The upsampling process therefore adopts a skip strategy
Algorithm process
Combine deep features with shallow information, then restore the output to the original image size to obtain more accurate segmentation results.
According to the pooling layer used, it is divided into
FCN-32s model segmentation results
Feature maps at different levels
Convolution: 7 times
FCN-16s model segmentation results
Pooling: 4 times - Pool4 layer
Bilinear interpolation - Conv7 layer
Upsampling and classification after fusion
FCN-8s model segmentation results
Pooling: 3 times - Pool3 layer
Bilinear interpolation - Conv7 layer, Pool4 layer
Upsampling and classification after fusion
FCN-8s: fuses feature information from more layers, segments clearer contours, and gives a relatively good result.
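The fusion step can be illustrated on toy data: upsample the coarse Conv7 map by bilinear interpolation, add it to the finer Pool4 map, then upsample the sum again. This is only a sketch under simplifying assumptions: both inputs are taken to be single-class score maps already, the sizes and values are hypothetical, and each upsampling uses a factor of 2 to keep the example small.

```python
def bilinear_upsample(fmap, factor):
    """Bilinear upsampling of a 2D map; corner samples map to corners."""
    rows, cols = len(fmap), len(fmap[0])
    out_rows, out_cols = rows * factor, cols * factor
    out = []
    for i in range(out_rows):
        # Position of output row i expressed in input coordinates.
        y = i * (rows - 1) / (out_rows - 1) if out_rows > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, rows - 1)
        wy = y - y0
        row = []
        for j in range(out_cols):
            x = j * (cols - 1) / (out_cols - 1) if out_cols > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, cols - 1)
            wx = x - x0
            top = fmap[y0][x0] * (1 - wx) + fmap[y0][x1] * wx
            bottom = fmap[y1][x0] * (1 - wx) + fmap[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def fuse(a, b):
    """Element-wise sum of two score maps of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical single-class score maps.
conv7 = [[1.0]]                       # deepest, coarsest map
pool4 = [[0.0, 2.0], [2.0, 4.0]]      # shallower, finer map

fused = fuse(bilinear_upsample(conv7, 2), pool4)  # skip fusion, FCN-16s style
restored = bilinear_upsample(fused, 2)            # upsample toward input size
```

An FCN-8s style variant would repeat the same pattern once more, fusing the upsampled result with a Pool3 map before the final upsampling.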
Algorithm evaluation
It can classify images at the pixel level and effectively solve the problem of image semantic segmentation.
Images of any size can be input
The first end-to-end segmentation network model
Disadvantages
The network is relatively large - not sensitive enough to the detailed information of the image
The correlation between pixels is low - the target boundary is blurred
Pyramid scene parsing network (PSPNet) - image semantic segmentation
Algorithm idea
Integrate contextual information, make full use of prior knowledge of global features, analyze different scenes, and achieve semantic segmentation of scene targets.
algorithm process
1. Given an input image
2. CNN: obtain the convolutional feature map
3. Pyramid pooling module: collect features of different sub-regions
4. Upsampling
5. Concatenate and fuse the features of each sub-region
6. Form feature representations containing local and global context information
7. Convolution and SoftMax classification of feature representations
8. Prediction results for each pixel
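Steps 3 to 5 can be sketched on a single-channel toy feature map: average-pool over grids of several sizes, upsample each pooled map back to the original size, and stack the results with the original as extra channels. The bin sizes (1, 2), the nearest-neighbour upsampling and all names are simplifying assumptions for the example, and the map side is assumed divisible by each bin size.

```python
def adaptive_avg_pool(fmap, bins):
    """Average a square 2D map over a bins x bins grid of sub-regions."""
    size = len(fmap)
    step = size // bins  # assumes size is divisible by bins
    pooled = []
    for bi in range(bins):
        row = []
        for bj in range(bins):
            cells = [
                fmap[i][j]
                for i in range(bi * step, (bi + 1) * step)
                for j in range(bj * step, (bj + 1) * step)
            ]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled, size):
    """Repeat each pooled value over its sub-region to restore the size."""
    step = size // len(pooled)
    return [[pooled[i // step][j // step] for j in range(size)] for i in range(size)]


def pyramid_pool(fmap, bin_sizes):
    """Return the original map plus one upsampled map per pyramid level."""
    size = len(fmap)
    channels = [fmap]
    for bins in bin_sizes:
        channels.append(upsample_nearest(adaptive_avg_pool(fmap, bins), size))
    return channels


# Hypothetical 4x4 feature map.
fmap = [
    [1, 1, 3, 3],
    [1, 1, 3, 3],
    [5, 5, 7, 7],
    [5, 5, 7, 7],
]
channels = pyramid_pool(fmap, bin_sizes=(1, 2))
```

The 1x1 level carries the global average (4.0 everywhere here), while finer levels keep progressively more local detail. Concatenating them gives every pixel both global and local context.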
Algorithm evaluation
For scene parsing and semantic segmentation tasks - able to extract appropriate global features
Uses the pyramid pooling module to fuse local and global information
Proposes an optimization strategy based on a deeply supervised loss
Disadvantages: Occlusion between targets is not handled well.
DeepLab series models - deep neural network models for image semantic segmentation
The core of the algorithm: atrous convolution (inserting holes into the convolution kernel)
Explicitly controls the resolution of the feature responses when they are computed
Expands the receptive field of the convolution kernel
Integrates more feature information without increasing the number of parameters or the amount of computation
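A one-dimensional sketch shows how holes enlarge the receptive field while the number of weights stays fixed. The function names are hypothetical, and the convolution is written in its sliding weighted-sum form without padding.

```python
def effective_kernel_size(k, rate):
    """Span covered by a k-tap kernel with (rate - 1) holes between taps."""
    return k + (k - 1) * (rate - 1)


def atrous_conv1d(signal, kernel, rate):
    """1D atrous convolution without padding: taps are `rate` samples apart."""
    span = effective_kernel_size(len(kernel), rate)
    return [
        sum(w * signal[start + t * rate] for t, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]                               # three weights in both cases

dense = atrous_conv1d(signal, kernel, rate=1)     # covers 3 samples per output
dilated = atrous_conv1d(signal, kernel, rate=2)   # covers 5 samples per output
```

With the same three weights, rate 2 spans five input samples instead of three. For a 3-tap kernel, rates 6, 12 and 18 span 13, 25 and 37 samples respectively.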
development path
The earliest DeepLab model
Algorithm Description
input image
Processed by a deep convolutional neural network (DCNN) with atrous convolutional layers - coarse score map
Bilinear interpolation upsampling
Introduce a fully connected conditional random field (CRF)
output image
Algorithm evaluation
Fully consider global information to more accurately classify target edge pixels
Eliminate noise interference and improve segmentation accuracy
DeepLab-v2 model
Extends atrous convolution into an atrous spatial pyramid pooling (ASPP) module
Cascaded multi-scale atrous convolution layers with feature map fusion
Keep fully connected CRF as post-processing
DeepLab-v3 model
Convolution and pooling: image size reduced by a factor of 4
3 Block module convolutions: image reduced by a factor of 8
Linear rectification function (ReLU): image reduced by a factor of 16
Pooling: image reduced by a factor of 16
Block4 processing
ASPP module: fuses atrous convolutions with different dilation rates (rate=6, 12, 18)
Integration of a 1*1 convolution layer and a global pooling layer: feature map reduced by a factor of 16
Classification prediction: segmentation map
DeepLab-v3+ model - encoder-decoder structure
Algorithm Description
Encoder: DeepLab-v3 model
Decoder input
Shallow feature map in DCNN
ASPP-fused feature map after convolution
Decoder module
Convolution: applied to the input shallow feature map
Fusion: with the upsampled ASPP feature map
Output: segmentation map at the original size after convolution and upsampling
Algorithm evaluation
Clearly distinguish foreground targets and background
Target edges are clearly defined
This model enables fine-grained segmentation
Mask R-CNN--image instance segmentation
Origin: Based on Faster R-CNN
Algorithm Description
Algorithm framework
The first stage:
Region proposal network (RPN) - proposes candidate object bounding boxes
The content of each bounding box (RoI) is processed by RoIAlign - the RoI is divided into m*m sub-regions
Second stage:
In parallel with the class prediction and bounding-box regression tasks, a branch is added that outputs a binary mask for each RoI. That is, each RoI is segmented with an FCN and the segmentation mask is predicted in a pixel-to-pixel manner.
Training phase: constrained by a multi-task loss L
L = target classification loss + detection task loss + instance segmentation loss
Algorithm evaluation
On the basis of semantic segmentation, instance segmentation is realized - accurate detection and positioning of foreground targets, distinguishing different individuals of similar targets.
Semantic segmentation: identifying the content and location present in the image
Instance segmentation: distinguishing different individuals under the same category based on semantic segmentation
Higher segmentation accuracy
Models are more flexible
Can be used for a variety of computer vision tasks
Target classification
Target Detection
Instance segmentation
Human pose estimation
... ...
Performance analysis comparison and summary
Performance analysis
Deep learning segmentation data set:
PASCAL VOC
Microsoft COCO
Cityscapes
Qualitative analysis
Quantitative analysis
Semantic segmentation: mean intersection over union (mIoU). IoU is the ratio of the intersection to the union of two sets; in semantic segmentation the two sets are the ground-truth pixels and the predicted pixels of a class, and mIoU averages the IoU over classes.
Instance segmentation: pixel accuracy (PA), the proportion of correctly classified pixels among all pixels
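Both metrics can be computed directly from a ground-truth label map and a predicted label map. The sketch below uses hypothetical 2x3 label maps with two classes; classes that appear in neither map are skipped when averaging, which is one common convention rather than the only one.

```python
def label_pairs(truth, pred):
    """Flatten two label maps into a list of (true, predicted) pairs."""
    return [(t, p) for rt, rp in zip(truth, pred) for t, p in zip(rt, rp)]


def pixel_accuracy(truth, pred):
    """Fraction of pixels whose predicted label equals the true label."""
    pairs = label_pairs(truth, pred)
    return sum(t == p for t, p in pairs) / len(pairs)


def mean_iou(truth, pred, num_classes):
    """Average over classes of |intersection| / |union|."""
    pairs = label_pairs(truth, pred)
    ious = []
    for c in range(num_classes):
        inter = sum(t == c and p == c for t, p in pairs)
        union = sum(t == c or p == c for t, p in pairs)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Hypothetical label maps: one pixel of class 0 is predicted as class 1.
truth = [[0, 0, 1], [0, 1, 1]]
pred = [[0, 1, 1], [0, 1, 1]]

pa = pixel_accuracy(truth, pred)            # 5 of 6 pixels correct
miou = mean_iou(truth, pred, num_classes=2)  # mean of 2/3 and 3/4
```

The single wrong pixel lowers PA to 5/6 but pulls mIoU down to 17/24, because the error counts against both classes.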
Summary
Current status:
Image segmentation is increasingly used in computer vision tasks
Accuracy and speed have been significantly improved
Problems:
Segmentation data sets are scarce and annotation is laborious
Segmentation of small targets is not accurate enough
Segmentation algorithms are computationally complex
Real-time interactive segmentation is not yet achievable, which hinders the deployment, application and adoption of segmentation technology