Pattern Recognition
Also known as machine learning or data mining. It mainly covers an introduction, data preprocessing, cluster analysis, Bayesian classification, the nearest neighbor method, etc.
pattern recognition
introduction
Basic concepts of pattern recognition
pattern recognition
A technology that uses computers to reproduce the human ability to recognize patterns: to analyze, describe, judge, and identify various things or phenomena, and to assign each object to be recognized to a pattern category.
Pattern recognition can be viewed as a mapping from patterns to categories
Pattern
The information obtained from a thing or phenomenon
Broadly speaking, any observable object that exists in time and space can be called a pattern if it can be judged to be the same as, or similar to, other objects.
A pattern is a description of an object formed through information collection. This description should be standardized, understandable, and identifiable.
Notes
A pattern is not the thing itself, but the information obtained from the thing; for example, a person's photo and personal information
It must be possible to judge whether patterns are similar (relative to the problem at hand)
Patterns are generally represented by vectors, and subscripts can reflect time characteristics, spatial characteristics, or other identifiers.
pattern vector
Information with a temporal and spatial distribution obtained by observing a specific individual thing (called a sample or sample vector)
Pattern class
The category to which a pattern belongs or the population of patterns in the same category (category for short)
pattern recognition system
Consists of two processes: design and implementation
Design (training, learning)
Refers to using a certain number of samples (called training set or learning set) to design a classifier
Implementation (decision, classification, judgment)
Refers to using the designed classifier to make classification decisions for the samples to be identified.
System composition
Data collection (data acquisition)
Method
Through various sensors, information such as light or sound is converted into electrical information, or text information is input into the computer
Classification
One-dimensional waveforms: sound waves, electrocardiogram, electroencephalogram, etc.
Two-dimensional images: text, images, etc.
3D images: faces, etc.
Physical quantities: person’s height, weight, product weight, quality level, etc.
Logical quantity (0/1): presence or absence, male and female, etc.
preprocessing
Purpose
Remove noise and enhance useful information
Commonly used techniques
One-dimensional signal filtering and denoising, image smoothing, enhancement, restoration, filtering, etc.
Feature extraction and selection
Purpose
From the original data, obtain the features that best reflect the nature of the classification
Feature formation
Several features reflecting classification problems are obtained from the original data through various means (sometimes data standardization is required)
Feature selection
Select several features that are most beneficial to classification from the features
Feature extraction
Reduce the number of features through certain mathematical transformations
Classification decision or model matching
Use decision rules in the feature space to assign the recognized object to a certain category
illustrate
This system structure is suitable for statistical pattern recognition, fuzzy pattern recognition, and supervised methods in artificial neural networks.
For structural pattern recognition methods, only primitive extraction is used to replace feature extraction and selection.
For cluster analysis, classifier design and decision-making are integrated into one step.
Image features
color
texture
shape
Spatial Relations
four spaces
Three major tasks
Pattern collection
Feature extraction and feature selection
Type discrimination
Related questions
Performance evaluation
Test error rate or error rate
computational complexity
Types of pattern recognition
Classification basis
Nature of the problem or the samples
Supervised pattern recognition
Start with a batch of samples with known category labels, design a classifier from this sample set, and then use it to determine the category of new samples
Unsupervised pattern recognition
Only a batch of unlabeled samples is available; the sample set is divided directly into several classes based on the similarity between samples.
Main methods
statistical pattern recognition
Classification
unsupervised classification
Cluster analysis
Supervised classification
Collection classification
Probabilistic classification
Description method
Feature vector
Pattern determination
Each class is described by a class-conditional probability distribution P(x|ωi); with m categories there are m such distributions, and classification determines which distribution an unknown pattern belongs to.
Theoretical basis
probability theory
mathematical statistics
advantage
More mature
Able to consider the impact of interfering noise
Strong ability to recognize pattern primitives
shortcoming
It is difficult to extract features from patterns with complex structures
It cannot reflect the structural characteristics of the pattern, and it is difficult to describe the nature of the pattern.
Difficulty considering identification issues from a holistic perspective
Structural pattern recognition
fuzzy pattern recognition
neural network method
Theoretical basis
Neurophysiology
psychology
Pattern description method
Patterns are represented by the activity levels of a set of input nodes
Mode determination
nonlinear dynamic system
Main methods
BP model, Hopfield model
advantage
Effectively solve complex nonlinear problems
Allow samples to have larger defects and distortions
shortcoming
Lack of effective learning theory
Long training time
Application areas
Images, faces, text, numbers, fingerprints, voices...
fundamental issue
Pattern (sample) representation method
n-dimensional column vector
x = (x1, x2, …, xn)^T
Compactness of pattern classes
critical point (sample)
In a multi-class sample set, some samples become samples of another class when their feature values change slightly; such samples are called critical samples (points)
Compact set
definition
The distribution of samples of the same pattern class is relatively concentrated, with no or very few critical samples. Such pattern classes are called compact sets.
Properties
Very few critical points
For any two points in the set, the points on the line segment connecting them belong to the same set.
Each point in the set has a large enough neighborhood, and the neighborhood only contains points from the same set.
Requirement
Pattern classes should satisfy compactness
similarity
Express similarity using various distances
Common distance
Minkowski distance: d_q(x, y) = (Σi |xi − yi|^q)^(1/q)
Absolute-value distance, also called city-block or Manhattan distance (q = 1)
Euclidean distance (q=2)
Checkerboard distance or Chebyshev distance (q=∞)
Mahalanobis distance
Defined as D²(x) = (x − μ)^T Σ^(−1) (x − μ), where Σ is the covariance matrix and μ is the mean vector of the sample set
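To make the distance definitions above concrete, here is a minimal NumPy sketch; the sample points and the data used to estimate the Mahalanobis covariance matrix are made up for illustration.

```python
import numpy as np

def minkowski(x, y, q):
    """Minkowski distance: q=1 Manhattan, q=2 Euclidean."""
    return np.sum(np.abs(x - y) ** q) ** (1.0 / q)

def chebyshev(x, y):
    """Chebyshev (checkerboard) distance: the q -> infinity limit."""
    return np.max(np.abs(x - y))

def mahalanobis(x, mu, cov):
    """Mahalanobis distance: sqrt((x - mu)^T Sigma^-1 (x - mu))."""
    d = x - mu
    return np.sqrt(d @ np.linalg.inv(cov) @ d)

x, y = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(minkowski(x, y, 1))   # 7.0, Manhattan
print(minkowski(x, y, 2))   # 5.0, Euclidean
print(chebyshev(x, y))      # 4.0, Chebyshev

samples = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 5.0]])  # toy sample set
mu = samples.mean(axis=0)                # mean vector
cov = np.cov(samples, rowvar=False)      # covariance matrix
print(mahalanobis(x, mu, cov))
```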
Standardization of data
Purpose
Eliminate the impact of the numerical range between each component on the algorithm
method
Standardize to [0,1] or [-1, 1], variance standardization
Formulas
Feature (min-max) normalization: x' = (x − x_min) / (x_max − x_min)
Variance (z-score) normalization: x' = (x − μ) / σ
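A minimal sketch of both standardizations, assuming a one-dimensional feature with illustrative values:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# Min-max normalization to [0, 1]: x' = (x - min) / (max - min)
x_minmax = (x - x.min()) / (x.max() - x.min())

# Variance (z-score) standardization: x' = (x - mean) / std
x_zscore = (x - x.mean()) / x.std()

print(x_minmax)   # [0.  0.333...  0.666...  1.]
print(x_zscore)   # zero mean, unit variance
```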
Data preprocessing
Why do data preprocessing?
Bad data
incomplete
Lack of appropriate values during data collection
Different considerations during data collection and data analysis
Human/hardware/software issues
noisy
Problems with data collection tools
Human/computer error during data entry
Errors in data transmission
inconsistent
different data sources
functional dependency violated
Good data
Correctness: whether the data is correct and accurate
Completeness: whether any data is missing or unobtainable
Consistency: whether some data has been modified while related data has not
Reliability: Describes the degree of confidence that the data is correct
Tasks
Data cleaning
Fill in missing values, smooth noisy data, identify and remove outliers, and resolve inconsistencies
data integration
Integrate multiple databases, data cubes or files
Data transformation and discretization
Normalization
Concept hierarchy generation
data reduction
Dimension reduction
Quantity reduction
data compression
Feature extraction and feature selection
Data cleaning
Fill in missing values
reason
Equipment malfunction
Records deleted because they were inconsistent with other stored data
Data not entered due to misunderstanding
Data not entered because it was not considered important at input time
Changes to the data were not logged
deal with
Ignore the tuple: usually done when the class label is missing (assuming the mining task is classification or description); this works poorly when the percentage of missing values varies greatly from attribute to attribute.
"Class Label" (Class Label or Target Label) usually refers to "the label used to represent the class or group to which the sample belongs" in the data set.
Fill in missing values manually: heavy workload and low feasibility
Fill in missing values automatically
Use a global constant, such as "unknown" or −∞
Use the attribute mean
Use the mean or median of all samples belonging to the same class as the given tuple
Fill in the most likely value, using inference-based methods such as the Bayesian formula or decision trees
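A minimal pandas sketch of the automatic filling strategies above; the DataFrame, its column names, and the sentinel value are assumptions for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "class":  ["A", "A", "B", "B"],
    "income": [3000.0, np.nan, 5000.0, np.nan],
})

# Global constant, e.g. a sentinel such as -inf (or the string "unknown")
df["const"] = df["income"].fillna(-np.inf)

# Attribute mean over all samples
df["mean"] = df["income"].fillna(df["income"].mean())

# Mean of the samples belonging to the same class as the tuple
df["class_mean"] = df.groupby("class")["income"].transform(
    lambda s: s.fillna(s.mean()))
print(df)
```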
Smooth noisy data
reason
Problems with data collection tools
Data entry errors
Data transmission errors
Technical limitations
Inconsistent naming conventions
deal with
binning
First sort the data and divide them into equal-depth bins. Then you can smooth by the mean of the bin, smooth by the median of the bin, smooth by the boundary of the bin, etc.
Operations
Equal depth binning
Boundary smoothing: replace each value in a bin with the nearest bin boundary (the bin's minimum or maximum)
Equal width binning
[110,155), left closed and right open
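A minimal sketch of equal-depth binning with mean and boundary smoothing; the nine data values are a textbook-style example chosen for illustration:

```python
import numpy as np

data = np.sort(np.array([4, 8, 15, 21, 21, 24, 25, 28, 34]))
bins = np.split(data, 3)   # equal-depth: three bins of three values each

# Smooth by bin means: bins become ~[9, 9, 9], [22, 22, 22], [29, 29, 29]
by_mean = [np.full(len(b), b.mean()) for b in bins]

def by_boundary(b):
    """Replace each value with the nearer bin boundary (min or max)."""
    lo, hi = b.min(), b.max()
    return np.where(b - lo <= hi - b, lo, hi)

# Smooth by bin boundaries: [4, 4, 15], [21, 21, 24], [25, 25, 34]
by_bound = [by_boundary(b) for b in bins]
```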
clustering
Detect and remove outliers through clustering
Regression
Smooth data by fitting it to a regression function
Identify or remove outliers
Resolve inconsistencies in the data
data integration
Data integration:
Combine data from multiple data sources into a consistent store
Schema integration:
Integrate metadata from different data sources
e.g. A.cust_id = B.customer_no
Entity identification problem:
Match real-world entities from different data sources
e.g. Bill Clinton = William Clinton
Detect and resolve data value conflicts
For the same real-world entity, attribute values from different sources may differ
Possible reasons: different representations, different measurement scales, etc.
data reduction
Purpose
Complex analysis of the contents of a large-scale database often takes a long time, making analysis on the raw data impractical;
Data reduction: reducing the size of the mined data set without affecting the final mining result;
Data reduction techniques yield a reduced representation of the data set that is much smaller yet closely preserves the integrity of the original data;
Mining the reduced data set is more efficient and produces the same (or almost the same) results.
standard
The time spent on data reduction should not exceed or "offset" the time saved by mining the reduced data set.
The reduced data is much smaller than the original data, yet produces the same or almost the same analysis results.
method
Data cube aggregation
Aggregate n-dimensional data cubes into n-1-dimensional data cubes.
Dimension reduction (attribute reduction)
Find the minimum set of attributes to ensure that the probability distribution of the new data set is as close as possible to the probability distribution of the original data set.
PCA
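A minimal scikit-learn sketch of PCA-based dimension reduction; the random 10-feature data set is an assumption for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))        # 100 samples, 10 features

pca = PCA(n_components=3)             # keep the 3 strongest components
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                   # (100, 3)
print(pca.explained_variance_ratio_)     # variance retained per component
```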
Data compression
lossless compression
Lossy compression
Numerosity reduction
Reduce data volume by choosing alternative, smaller data representations.
type
Histogram
clustering
sampling
Discretization and concept hierarchy generation
Normalization
min-max normalization: v' = (v − min) / (max − min) × (new_max − new_min) + new_min
Results are guaranteed to fall in the specified target range (e.g. [0, 1])
z-score normalization (zero-mean normalization): v' = (v − μ) / σ
Results may be negative
discretization
Purpose
Data discretization is the process of dividing the values of continuous data into several intervals to simplify the complexity of the original data set.
type
Values in an unordered set; e.g. color, occupation
Values in an ordered set; e.g. military rank, professional title
Continuous values; e.g. real numbers
Concept hierarchy
Cluster analysis
concept
Idea
Classify each pattern to be recognized according to some similarity measure
Group similar patterns into the same class
algorithm
Simple clustering method based on similarity threshold and minimum distance principle
A method of continuously merging two categories according to the minimum distance principle
Dynamic clustering method based on criterion function
application
Cluster analysis can be used as a preprocessing step for other algorithms
Can be used as an independent tool to obtain the distribution of data
Cluster analysis can be used for outlier (isolated point) mining
Partition-based clustering methods
The partitioning method is to divide data objects into non-overlapping subsets (clusters) so that each data object is in exactly one subset.
Classification
distance type
Euclidean distance
manhattan distance
Minkowski distance
Minkowski distance is not a single distance but a family of distances parameterized by q.
Algorithm type
k-means (K-means) algorithm
Input: the number of clusters k and the database D containing n objects
Output: k clusters that minimize the squared error criterion.
Algorithm steps
1. Choose an initial cluster center for each cluster, giving K initial centers.
2. Assign each sample in the set to its nearest cluster according to the minimum-distance principle.
3. Take the sample mean of each cluster as the new cluster center.
4. Repeat steps 2 and 3 until the cluster centers no longer change.
5. Stop; K clusters are obtained.
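A minimal NumPy sketch of these steps (toy two-cluster data; in practice a library implementation such as sklearn.cluster.KMeans would be used):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # step 1: initial centers
    for _ in range(max_iter):
        # step 2: assign each sample to the nearest center (minimum distance)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # step 3: the mean of each cluster becomes the new center
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):                  # step 4: converged
            break
        centers = new
    return labels, centers                             # step 5: k clusters

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels, centers = kmeans(X, k=2)
print(centers)   # roughly (0, 0) and (5, 5)
```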
Features
advantage
Simple and fast
Scalable and efficient
Works well when clusters are dense
shortcoming
Can only be used if the cluster mean is defined
k must be given in advance
Very sensitive to the initial centers, which directly affects the result and the number of iterations
Not suitable for finding clusters with non-convex shapes or clusters with widely varying sizes.
Is sensitive to "noise" and outlier data
Improvements
k-modes algorithm: enables fast clustering of discrete data, retaining the efficiency of k-means while extending its applicability to discrete data.
k-prototypes algorithm: clusters data with a mixture of discrete and numerical attributes; it defines a dissimilarity measure that accounts for both kinds of attribute.
k-medoids algorithm: k-means is sensitive to isolated points. To address this, instead of using the cluster mean as the reference point, choose the most centrally located object in the cluster (the medoid) as the reference point; the partitioning is still based on minimizing the sum of dissimilarities between all objects and their reference points.
k-medoids (K-center point) algorithm
Input: the number of clusters k and a database containing n objects.
Output: k clusters
Algorithm steps
1. Choose an initial medoid for each cluster, giving k initial medoids.
2. Compute the distance from every other point to the k medoids and assign each point to the cluster of its nearest medoid.
3. Within each cluster, compute for each point the sum of its distances to all other points in the cluster; the point with the smallest sum becomes the new medoid.
4. Repeat steps 2 and 3 until the medoids no longer change.
5. Stop; k clusters are obtained.
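A minimal NumPy sketch of these steps; medoids are always actual data points, and the toy data is illustrative:

```python
import numpy as np

def kmedoids(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise
    medoids = rng.choice(len(X), k, replace=False)        # step 1
    for _ in range(max_iter):
        labels = dist[:, medoids].argmin(axis=1)          # step 2: nearest medoid
        new = medoids.copy()
        for j in range(k):                  # step 3: in each cluster, the point
            members = np.where(labels == j)[0]   # with the smallest distance sum
            if len(members) == 0:                # to the rest becomes the medoid
                continue
            cost = dist[np.ix_(members, members)].sum(axis=1)
            new[j] = members[cost.argmin()]
        if np.array_equal(np.sort(new), np.sort(medoids)):   # step 4
            break
        medoids = new
    return labels, X[medoids]                             # step 5

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
labels, centers = kmedoids(X, k=2)
```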
Features
advantage
The k-medoids algorithm selects, as each cluster's reference point, the point with the smallest sum of distances to all other points in the cluster. Computing this minimum distance sum reduces the influence of isolated points on the clustering process, making the final result closer to the true partition.
shortcoming
Compared with the k-means algorithm it increases the amount of computation by roughly O(n), so in general the k-medoids algorithm is better suited to small-scale data.
Hierarchical based clustering algorithm
definition
Creates a clustering tree of the data objects. Depending on whether the hierarchical decomposition is formed bottom-up or top-down, it is further divided into agglomerative and divisive hierarchical clustering.
core
How to measure the distance between two clusters, where each cluster is generally a set of objects.
Classification
Distance type (inter-cluster distance measurement method)
Algorithm type
AGNES (agglomerative hierarchical clustering)
definition
AGNES (agglomerative hierarchical clustering) is a bottom-up strategy that first treats each object as a cluster and then merges these atomic clusters into larger and larger clusters until a certain terminal condition is met.
Similarity
The similarity between two clusters is determined by the similarity of the closest pairs of data points in the two different clusters.
step
1. Treat each object as an initial cluster;
2. REPEAT;
3. Find the two nearest clusters according to the nearest data points between clusters;
4. Merge the two clusters to produce a new cluster set;
5. UNTIL the defined number of clusters is reached.
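A minimal SciPy sketch of this bottom-up procedure; single linkage corresponds to the "closest pair of points" similarity above, and the two-blob toy data is illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])

# Repeatedly merge the two clusters whose closest points are nearest
Z = linkage(X, method="single")

# UNTIL: cut the merge tree once the defined number of clusters is reached
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)   # cluster id (1 or 2) per sample
```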
DIANA (split hierarchical clustering)
BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)
density clustering method
core
As long as the density of points in a region exceeds a certain threshold, the region is added to the nearby cluster.
Classification
DBSCAN
core
Unlike partitioning and hierarchical methods, it defines a cluster as the largest set of density-connected points; it can divide regions of sufficiently high density into clusters and can discover clusters of arbitrary shape in spatial databases with "noise".
definition
ε-neighborhood of an object: The area within a radius ε of a given object.
Core object (core point): If the ε-neighborhood of an object contains at least the minimum number of MinPts objects, the object is called a core object.
Direct density reachability: Given an object set D, if p is within the ε-neighborhood of q, and q is a core object, we say that object p is directly density reachable starting from object q.
Density-reachable: if there are core points P2, P3, …, Pn such that P2 is directly density-reachable from P1, P3 from P2, …, Pn from P(n−1), and Q from Pn, then Q is density-reachable from P1. Density-reachability is likewise not symmetric.
Density-connected: if there is a core point S from which both P and Q are density-reachable, then P and Q are density-connected. Density-connectedness is symmetric: if P and Q are density-connected, then so are Q and P. Two density-connected points belong to the same cluster.
Noise: A density-based cluster is the largest set of density-connected objects based on density reachability. Objects that are not included in any cluster are considered "noise".
step
1) If a point's ε-neighborhood contains more than MinPts points, it is a core point; otherwise the point is provisionally recorded as a noise point. 2) Starting from each core point, find all objects density-reachable from it; together they form a cluster.
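A minimal scikit-learn sketch; eps and min_samples correspond to ε and MinPts above, and the parameter values and toy data are assumptions:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0, 0.3, (50, 2)),   # dense cluster
    rng.normal(4, 0.3, (50, 2)),   # dense cluster
    rng.uniform(-2, 6, (10, 2)),   # scattered points, mostly noise
])

db = DBSCAN(eps=0.5, min_samples=5).fit(X)   # eps = ε, min_samples = MinPts
print(db.labels_)   # cluster ids; -1 marks noise points
```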
Features
advantage
Clustering is fast and can effectively handle noise points and discover spatial clusters of arbitrary shapes.
shortcoming
(1) As the amount of data grows, more memory is needed to support the I/O consumption; (2) when cluster densities are uneven and inter-cluster distances differ greatly, clustering quality is poor; (3) the two initial parameters ε (neighborhood radius) and MinPts (minimum number of points in an ε-neighborhood) must be set manually by the user, and the clustering result is very sensitive to their values: different values produce different clusterings.
OPTICS
DENCLUE
Bayesian classification
Naive Bayes
The Bayes method is a pattern classification method used when the prior probabilities and class-conditional probabilities are known; the classification of a sample depends on the samples of each class domain as a whole.
Naive Bayes assumes that all feature attributes are mutually independent; this is where the word "naive" in the algorithm's name comes from.
In reality there are often dependencies among attributes, yet interestingly, even when the independence assumption of naive Bayes clearly does not hold, it can still achieve very good classification results.
Bayesian formula
minimum error rate
Features are the given information
The category is what we ultimately want to determine
When there are multiple feature attributes
meaning
Posterior probability P(cj|x)
The probability that class cj holds given a data sample x; this is what we want to compute
Each P(xk|cj) can be obtained from prior knowledge or estimated from the sample set
Prior probability P(cj)
The prior probability P(cj) can be obtained from prior knowledge or estimated from the sample set
P(x) is the same for every class, so it can be dropped or treated as a normalizing factor
Simplification
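A minimal sketch of this decision rule for categorical features: estimate P(cj) and each P(xk|cj) by counting over the sample set, multiply them, and pick the class with the largest product (P(x) is dropped as a common factor). The toy weather data is an assumption for illustration:

```python
X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "hot")]
y = ["no", "no", "yes", "yes"]
classes = set(y)

# Prior P(c): relative frequency of each class in the sample set
prior = {c: y.count(c) / len(y) for c in classes}

def likelihood(value, k, c):
    """P(x_k = value | c), estimated by counting within class c."""
    members = [x for x, label in zip(X, y) if label == c]
    return sum(x[k] == value for x in members) / len(members)

def classify(x):
    # Maximize P(c) * prod_k P(x_k | c); P(x) is omitted
    def score(c):
        p = prior[c]
        for k, value in enumerate(x):
            p *= likelihood(value, k, c)
        return p
    return max(classes, key=score)

print(classify(("rain", "mild")))   # 'yes'
```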
minimal risk
decision table
Calculation method
For each decision α, calculate separately
Take the decision with the least conditional risk
nearest neighbor method
Nearest neighbor method/K nearest neighbor method
Purpose
Determine the classification of a point
Ideas
Find the k training instances closest to the new instance in the training set; the class that occurs most often among these k instances is the class of the new instance.
process
Calculate the distance between each sample point in the training sample and the test sample (common distance measures include Euclidean distance, Mahalanobis distance, etc.)
Sort all distance values above
Select the first k samples with the smallest distance
Vote based on the labels of these k samples to get the final classification category
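A minimal NumPy sketch of this four-step process with Euclidean distance and majority vote; the data and k are illustrative:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    d = np.linalg.norm(X_train - x, axis=1)       # 1. distance to every sample
    nearest = np.argsort(d)[:k]                   # 2-3. k smallest distances
    votes = Counter(y_train[i] for i in nearest)  # 4. vote on their labels
    return votes.most_common(1)[0][0]

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array(["a", "a", "a", "b", "b", "b"])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # 'a'
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # 'b'
```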
Choice of k value
The smaller k is, the more complex the model and the easier it overfits; the larger k is, the simpler the model. If k = N, every point is simply assigned to the most frequent class in the training set. Therefore k usually takes a fairly small value, chosen by cross-validation: set aside part of the samples for validation (e.g. 95% training, 5% validation), try k = 1, 2, 3, 4, 5, and so on, compute the classification error for each, and choose the k with the smallest error.
Differences between K-Means and KNN
K-Means
The purpose is to divide a series of point sets into k categories
K-Means is a clustering algorithm
Unsupervised learning: groups similar data together to obtain a classification, without external class labels
The training data set has no labels and is initially unordered; after clustering it becomes relatively ordered: disorder first, order afterwards
Nearest neighbor method/K nearest neighbor method
The purpose is to determine the classification of a point
KNN is a classification algorithm
Supervised learning, the classification target is known in advance
The training data set is labeled, and the labels are known to be correct.
Association rules
definition
basic concept
Item: For example, cola, potato chips, bread, beer, and diapers are all called items.
Let I={i1, i2,…,im} be the set of all items (Item).
Transaction T is a purchase record, and each transaction T has a unique identifier, recorded as Tid.
D is the set of all transactions.
Itemset is the set we want to study
The number of items in an itemset is called the length of the itemset, and an itemset containing k items is called a K-itemset.
Association rules
A logical implication of the form A->B, where neither A nor B is empty, A ⊆ I, B ⊆ I, and A ∩ B = ∅.
Support
Describes the probability that itemsets A and B appear together in the transaction set D
S(A->B)=P(AB)=|AB|/|D|
Support is a measure of the importance of association rules
Confidence
Among the transactions T in which itemset A appears, the probability that itemset B also appears.
C(A->B)=P(B|A)=|AB|/|A|
Confidence is a measure of the accuracy of association rules
Strong association rules
Association rules on I over D that satisfy both minimum support and minimum confidence are called strong association rules.
Lift
The degree of lift indicates how much influence the appearance of item set A has on the appearance of item set B.
L(A->B)=P(AB)/(P(A)*P(B))
Greater than 1
Positive correlation
equal to 1
Independent
less than 1
negative correlation
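A minimal sketch computing support, confidence, and lift straight from these formulas; the toy transaction database reuses the items mentioned above:

```python
def support(S, D):
    """P(S): fraction of transactions containing itemset S."""
    return sum(S <= t for t in D) / len(D)

def confidence(A, B, D):
    """C(A->B) = P(AB) / P(A)."""
    return support(A | B, D) / support(A, D)

def lift(A, B, D):
    """L(A->B) = P(AB) / (P(A) * P(B))."""
    return support(A | B, D) / (support(A, D) * support(B, D))

D = [{"beer", "diapers"}, {"beer", "bread"},
     {"beer", "diapers", "cola"}, {"bread", "cola"}]
A, B = {"beer"}, {"diapers"}
print(support(A | B, D))    # 0.5
print(confidence(A, B, D))  # 0.666...
print(lift(A, B, D))        # 1.333... > 1: positive correlation
```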
frequent itemsets
Item sets that satisfy minimum support are called frequent itemsets. The set of frequent k-itemsets is usually denoted Lk
Purpose
Find strong association rules based on user-specified minimum support and minimum confidence
step
Find all frequent itemsets (or maximal frequent itemsets) using the user-specified minimum support
Derive association rules from the frequent itemsets using the user-specified minimum confidence
algorithm
Apriori algorithm
The first step is to retrieve all frequent itemsets in the transaction database through iteration, that is, itemsets whose support is not lower than the threshold set by the user;
Frequent itemsets: count support S
The second step uses the frequent itemsets to construct rules that satisfy the user's minimum confidence.
Association rules: count confidence C
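A minimal sketch of both Apriori steps (level-wise frequent-itemset counting, then rule generation); the thresholds and transactions are assumptions, and the candidate join omits the full subset-pruning of the real algorithm:

```python
from itertools import combinations

def apriori_frequent(D, min_support):
    """Step 1: find all itemsets with support >= min_support, level by level."""
    n = len(D)
    candidates = [frozenset([i]) for t in D for i in t]
    freq = {}
    while candidates:
        counts = {c: sum(c <= t for t in D) / n for c in set(candidates)}
        level = {c: s for c, s in counts.items() if s >= min_support}
        freq.update(level)
        keys = list(level)
        # join surviving k-itemsets into (k+1)-itemset candidates
        candidates = [a | b for a in keys for b in keys
                      if len(a | b) == len(a) + 1]
    return freq

D = [{"beer", "diapers"}, {"beer", "bread"},
     {"beer", "diapers", "cola"}, {"bread", "cola"}]
freq = apriori_frequent(D, min_support=0.5)

# Step 2: from each frequent itemset S, test rules A -> (S - A)
for S, s in freq.items():
    for r in range(1, len(S)):
        for A in map(frozenset, combinations(S, r)):
            if A in freq and s / freq[A] >= 0.6:   # min confidence 0.6
                print(set(A), "->", set(S - A), "conf =", s / freq[A])
```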
FP-Growth