Pattern Recognition
Also known as machine learning or data mining. It mainly covers an introduction, data preprocessing, cluster analysis, Bayesian classification, the nearest neighbor method, etc.
pattern recognition
introduction
Basic concepts of pattern recognition
pattern recognition
A technology that uses computers to reproduce the human ability to recognize patterns: to analyze, describe, judge, and identify various things or phenomena, and to assign each object to be recognized to a pattern category.
Pattern recognition can be viewed as a mapping from patterns to categories
Pattern
The information obtained from a thing or phenomenon
Broadly speaking, any observable object that exists in time and space can be called a pattern if it can be judged to be the same as, or similar to, other objects.
A pattern is a description of an object formed through information collection. This description should be standardized, understandable, and identifiable.
Notes
A pattern is not the thing itself, but the information obtained from the thing; for example, a person's photo and personal information
It must be possible to judge whether patterns are similar (relative to the problem at hand)
Patterns are generally represented by vectors, and subscripts can reflect time characteristics, spatial characteristics, or other identifiers.
pattern vector
Information with a temporal and spatial distribution obtained by observing a specific individual thing (called a sample or sample vector)
Pattern class
The category to which a pattern belongs or the population of patterns in the same category (category for short)
pattern recognition system
Consists of two processes: design and implementation
Design (training, learning)
Refers to using a certain number of samples (called training set or learning set) to design a classifier
Implementation (decision, classification, judgment)
Refers to using the designed classifier to make classification decisions for the samples to be identified.
System composition
Data collection (data acquisition)
Method
Through various sensors, information such as light or sound is converted into electrical information, or text information is input into the computer
Classification
One-dimensional waveforms: sound waves, electrocardiogram, electroencephalogram, etc.
Two-dimensional images: text, images, etc.
3D images: faces, etc.
Physical quantities: person’s height, weight, product weight, quality level, etc.
Logical quantity (0/1): presence or absence, male and female, etc.
preprocessing
Purpose
Remove noise and enhance useful information
Commonly used techniques
One-dimensional signal filtering and denoising, image smoothing, enhancement, restoration, filtering, etc.
Feature extraction and selection
Purpose
From the original data, obtain the features that best reflect the nature of the classification
Feature formation
Several features reflecting classification problems are obtained from the original data through various means (sometimes data standardization is required)
Feature selection
Select several features that are most beneficial to classification from the features
Feature extraction
Reduce the number of features through certain mathematical transformations
Classification decision or model matching
Use decision rules in the feature space to assign the recognized object to a certain category
illustrate
This system structure is suitable for statistical pattern recognition, fuzzy pattern recognition, and supervised methods in artificial neural networks.
For structural pattern recognition methods, only primitive extraction is used to replace feature extraction and selection.
For cluster analysis, classifier design and decision-making are integrated into one step.
Image features
color
texture
shape
Spatial Relations
four spaces
Three major tasks
Pattern collection
Feature extraction and feature selection
Type discrimination
Related questions
Performance evaluation
Test error rate or error rate
computational complexity
Types of pattern recognition
Classification basis
Nature of the problem or the samples
Supervised pattern recognition
Start with a batch of samples with known category labels, design a classifier from this sample set, and then use it to determine the category of new samples
Unsupervised pattern recognition
Only a batch of unlabeled samples is available; the sample set is divided directly into several classes based on the similarity between samples.
Main methods
statistical pattern recognition
Classification
unsupervised classification
Cluster analysis
Supervised classification
Collection classification
Probabilistic classification
Description method
Feature vector
Pattern determination
Each class is described by a class-conditional probability distribution P(x|ωi); with m categories there are m such distributions, and classification determines which distribution an unknown pattern belongs to.
Theoretical basis
probability theory
mathematical statistics
advantage
More mature
Able to consider the impact of interfering noise
Strong ability to recognize pattern primitives
shortcoming
It is difficult to extract features from patterns with complex structures
It cannot reflect the structural characteristics of the pattern, and it is difficult to describe the nature of the pattern.
Difficulty considering identification issues from a holistic perspective
Structural pattern recognition
fuzzy pattern recognition
neural network method
Theoretical basis
Neurophysiology
psychology
Pattern description method
Patterns are represented by the activity levels of a set of input nodes
Mode determination
nonlinear dynamic system
Main methods
BP model, Hopfield model
advantage
Effectively solve complex nonlinear problems
Allow samples to have larger defects and distortions
shortcoming
Lack of effective learning theory
Long training time
Application areas
Images, faces, text, numbers, fingerprints, voices...
fundamental issue
Pattern (sample) representation method
n-dimensional column vector
x = (x1, x2, …, xn)^T
Compactness of pattern classes
critical point (sample)
In a multi-class sample set, some samples become samples of another class when their feature values change slightly; such samples are called critical samples (points)
Compact set
definition
The distribution of samples of the same pattern class is relatively concentrated, with no or very few critical samples. Such pattern classes are called compact sets.
Properties
Very few critical points
For any two points in the set, the points on the line segment connecting them belong to the same set.
Each point in the set has a large enough neighborhood, and the neighborhood only contains points from the same set.
Requirement
Pattern classes should satisfy compactness
similarity
Express similarity using various distances
Common distance
Minkowski distance: d_q(x, y) = (Σi |xi − yi|^q)^(1/q)
Absolute-value distance, also called city-block or Manhattan distance (q = 1)
Euclidean distance (q=2)
Checkerboard distance or Chebyshev distance (q=∞)
Mahalanobis distance
Defined as D²(x) = (x − μ)^T Σ^(−1) (x − μ), where Σ is the covariance matrix and μ is the mean vector of the sample set
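To make the distance definitions above concrete, here is a minimal NumPy sketch; the sample points and the data used to estimate the Mahalanobis covariance matrix are made up for illustration.

```python
import numpy as np

def minkowski(x, y, q):
    """Minkowski distance: q=1 Manhattan, q=2 Euclidean."""
    return np.sum(np.abs(x - y) ** q) ** (1.0 / q)

def chebyshev(x, y):
    """Chebyshev (checkerboard) distance: the q -> infinity limit."""
    return np.max(np.abs(x - y))

def mahalanobis(x, mu, cov):
    """Mahalanobis distance: sqrt((x - mu)^T Sigma^-1 (x - mu))."""
    d = x - mu
    return np.sqrt(d @ np.linalg.inv(cov) @ d)

x, y = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(minkowski(x, y, 1))   # 7.0, Manhattan
print(minkowski(x, y, 2))   # 5.0, Euclidean
print(chebyshev(x, y))      # 4.0, Chebyshev

samples = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 5.0]])  # toy sample set
mu = samples.mean(axis=0)                # mean vector
cov = np.cov(samples, rowvar=False)      # covariance matrix
print(mahalanobis(x, mu, cov))
```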
Standardization of data
Purpose
Eliminate the impact of the numerical range between each component on the algorithm
method
Standardize to [0,1] or [-1, 1], variance standardization
Formulas
Feature (min-max) normalization: x' = (x − x_min) / (x_max − x_min)
Variance (z-score) normalization: x' = (x − μ) / σ
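A minimal sketch of both standardizations, assuming a one-dimensional feature with illustrative values:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# Min-max normalization to [0, 1]: x' = (x - min) / (max - min)
x_minmax = (x - x.min()) / (x.max() - x.min())

# Variance (z-score) standardization: x' = (x - mean) / std
x_zscore = (x - x.mean()) / x.std()

print(x_minmax)   # [0.  0.333...  0.666...  1.]
print(x_zscore)   # zero mean, unit variance
```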
Data preprocessing
Why do data preprocessing?
Bad data
incomplete
Lack of appropriate values during data collection
Different considerations during data collection and data analysis
Human/hardware/software issues
noisy
Problems with data collection tools
Human/computer error during data entry
Errors in data transmission
inconsistent
different data sources
functional dependency violated
Good data
Correctness: whether the data is correct and accurate
Completeness: whether any data is missing or unobtainable
Consistency: whether some data has been modified while related data has not
Reliability: Describes the degree of confidence that the data is correct
Tasks
Data cleaning
Fill in missing values, smooth noisy data, identify and remove outliers, and resolve inconsistencies
data integration
Integrate multiple databases, data cubes or files
Data transformation and discretization
Normalization
Concept hierarchy generation
data reduction
Dimension reduction
Quantity reduction
data compression
Feature extraction and feature selection
Data cleaning
Fill in missing values
reason
Equipment malfunction
Records deleted because they were inconsistent with other stored data
Data not entered due to misunderstanding
Data not entered because it was not considered important at input time
Changes to the data were not logged
deal with
Ignore the tuple: usually done when the class label is missing (assuming the mining task is classification or description); this works poorly when the percentage of missing values varies greatly from attribute to attribute.
"Class Label" (Class Label or Target Label) usually refers to "the label used to represent the class or group to which the sample belongs" in the data set.
Fill in missing values manually: heavy workload and low feasibility
Fill in missing values automatically
Use a global constant, such as "unknown" or −∞
Use the attribute mean
Use the mean or median of all samples belonging to the same class as the given tuple
Fill in the most likely value, using inference-based methods such as the Bayesian formula or decision trees
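A minimal pandas sketch of the automatic filling strategies above; the DataFrame, its column names, and the sentinel value are assumptions for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "class":  ["A", "A", "B", "B"],
    "income": [3000.0, np.nan, 5000.0, np.nan],
})

# Global constant, e.g. a sentinel such as -inf (or the string "unknown")
df["const"] = df["income"].fillna(-np.inf)

# Attribute mean over all samples
df["mean"] = df["income"].fillna(df["income"].mean())

# Mean of the samples belonging to the same class as the tuple
df["class_mean"] = df.groupby("class")["income"].transform(
    lambda s: s.fillna(s.mean()))
print(df)
```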
Smooth noisy data
reason
Problems with data collection tools
Data entry errors
Data transmission errors
Technical limitations
Inconsistent naming conventions
deal with
binning
First sort the data and divide them into equal-depth bins. Then you can smooth by the mean of the bin, smooth by the median of the bin, smooth by the boundary of the bin, etc.
Operations
Equal depth binning
Boundary smoothing: replace each value in a bin with the nearest bin boundary (the bin's minimum or maximum)
Equal width binning
[110,155), left closed and right open
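A minimal sketch of equal-depth binning with mean and boundary smoothing; the nine data values are a textbook-style example chosen for illustration:

```python
import numpy as np

data = np.sort(np.array([4, 8, 15, 21, 21, 24, 25, 28, 34]))
bins = np.split(data, 3)   # equal-depth: three bins of three values each

# Smooth by bin means: bins become ~[9, 9, 9], [22, 22, 22], [29, 29, 29]
by_mean = [np.full(len(b), b.mean()) for b in bins]

def by_boundary(b):
    """Replace each value with the nearer bin boundary (min or max)."""
    lo, hi = b.min(), b.max()
    return np.where(b - lo <= hi - b, lo, hi)

# Smooth by bin boundaries: [4, 4, 15], [21, 21, 24], [25, 25, 34]
by_bound = [by_boundary(b) for b in bins]
```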
clustering
Detect and remove outliers through clustering
Regression
Smooth data by fitting it to a regression function
Identify or remove outliers
Resolve inconsistencies in the data
data integration
Data integration:
Combine data from multiple data sources into a consistent store
Schema integration:
Integrate metadata from different data sources
e.g. A.cust_id = B.customer_no
Entity identification problem:
Match real-world entities from different data sources
e.g. Bill Clinton = William Clinton
Detect and resolve data value conflicts
For the same real-world entity, attribute values from different sources may differ
Possible reasons: different representations, different measurement scales, etc.
data reduction
Purpose
Complex analysis of the contents of a large-scale database often takes a long time, making analysis on the raw data impractical;
Data reduction: reducing the size of the mined data set without affecting the final mining result;
Data reduction techniques yield a reduced representation of the data set that is much smaller yet closely preserves the integrity of the original data;
Mining the reduced data set is more efficient and produces the same (or almost the same) results.
standard
The time spent on data reduction should not exceed or "offset" the time saved by mining the reduced data set.
The reduced data is much smaller than the original data, yet produces the same or almost the same analysis results.
method
Data cube aggregation
Aggregate n-dimensional data cubes into n-1-dimensional data cubes.
Dimension reduction (attribute reduction)
Find the minimum set of attributes to ensure that the probability distribution of the new data set is as close as possible to the probability distribution of the original data set.
PCA
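A minimal scikit-learn sketch of PCA-based dimension reduction; the random 10-feature data set is an assumption for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))        # 100 samples, 10 features

pca = PCA(n_components=3)             # keep the 3 strongest components
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                   # (100, 3)
print(pca.explained_variance_ratio_)     # variance retained per component
```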
Data compression
lossless compression
Lossy compression
Numerosity reduction
Reduce data volume by choosing alternative, smaller data representations.
type
Histogram
clustering
sampling
Discretization and concept hierarchy generation
Normalization
min-max normalization: v' = (v − min) / (max − min) × (new_max − new_min) + new_min
Results are guaranteed to fall in the specified target range (e.g. [0, 1])
z-score normalization (zero-mean normalization): v' = (v − μ) / σ
Results may be negative
discretization
Purpose
Data discretization is the process of dividing the values of continuous data into several intervals to simplify the complexity of the original data set.
type
Values in an unordered set; e.g. color, occupation
Values in an ordered set; e.g. military rank, professional title
Continuous values; e.g. real numbers
Concept hierarchy
Cluster analysis
concept
Idea
Classify each pattern to be recognized according to some similarity measure
Group similar patterns into the same class
algorithm
Simple clustering method based on similarity threshold and minimum distance principle
A method of continuously merging two categories according to the minimum distance principle
Dynamic clustering method based on criterion function
application
Cluster analysis can be used as a preprocessing step for other algorithms
Can be used as an independent tool to obtain the distribution of data
Cluster analysis can be used for outlier (isolated point) mining
Partition-based clustering methods
The partitioning method is to divide data objects into non-overlapping subsets (clusters) so that each data object is in exactly one subset.
Classification
distance type
Euclidean distance
manhattan distance
Minkowski distance
Minkowski distance is not a single distance but a family of distances parameterized by q.
Algorithm type
k-means (K-means) algorithm
Input: the number of clusters k and the database D containing n objects
Output: k clusters that minimize the squared error criterion.
Algorithm steps
1. Choose an initial cluster center for each cluster, giving K initial centers.
2. Assign each sample in the set to its nearest cluster according to the minimum-distance principle.
3. Take the sample mean of each cluster as the new cluster center.
4. Repeat steps 2 and 3 until the cluster centers no longer change.
5. Stop; K clusters are obtained.
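A minimal NumPy sketch of these steps (toy two-cluster data; in practice a library implementation such as sklearn.cluster.KMeans would be used):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # step 1: initial centers
    for _ in range(max_iter):
        # step 2: assign each sample to the nearest center (minimum distance)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # step 3: the mean of each cluster becomes the new center
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):                  # step 4: converged
            break
        centers = new
    return labels, centers                             # step 5: k clusters

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels, centers = kmeans(X, k=2)
print(centers)   # roughly (0, 0) and (5, 5)
```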
Features
advantage
Simple and fast
Scalable and efficient
Works well when clusters are dense
shortcoming
Can only be used if the cluster mean is defined
k must be given in advance
Very sensitive to the initial centers, which directly affects the result and the number of iterations
Not suitable for finding clusters with non-convex shapes or clusters with widely varying sizes.
Is sensitive to "noise" and outlier data
Improvements
k-modes algorithm: enables fast clustering of discrete data, retaining the efficiency of k-means while extending its applicability to discrete data.
k-prototypes algorithm: clusters data with a mixture of discrete and numerical attributes; it defines a dissimilarity measure that accounts for both kinds of attribute.
k-medoids algorithm: k-means is sensitive to isolated points. To address this, instead of using the cluster mean as the reference point, choose the most centrally located object in the cluster (the medoid) as the reference point; the partitioning is still based on minimizing the sum of dissimilarities between all objects and their reference points.
k-medoids (K-center point) algorithm
Input: the number of clusters k and a database containing n objects.
Output: k clusters
Algorithm steps
1. Choose an initial medoid for each cluster, giving k initial medoids.
2. Compute the distance from every other point to the k medoids and assign each point to the cluster of its nearest medoid.
3. Within each cluster, compute for each point the sum of its distances to all other points in the cluster; the point with the smallest sum becomes the new medoid.
4. Repeat steps 2 and 3 until the medoids no longer change.
5. Stop; k clusters are obtained.
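A minimal NumPy sketch of these steps; medoids are always actual data points, and the toy data is illustrative:

```python
import numpy as np

def kmedoids(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise
    medoids = rng.choice(len(X), k, replace=False)        # step 1
    for _ in range(max_iter):
        labels = dist[:, medoids].argmin(axis=1)          # step 2: nearest medoid
        new = medoids.copy()
        for j in range(k):                  # step 3: in each cluster, the point
            members = np.where(labels == j)[0]   # with the smallest distance sum
            if len(members) == 0:                # to the rest becomes the medoid
                continue
            cost = dist[np.ix_(members, members)].sum(axis=1)
            new[j] = members[cost.argmin()]
        if np.array_equal(np.sort(new), np.sort(medoids)):   # step 4
            break
        medoids = new
    return labels, X[medoids]                             # step 5

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
labels, centers = kmedoids(X, k=2)
```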
Features
advantage
The k-medoids algorithm selects, as each cluster's reference point, the point with the smallest sum of distances to all other points in the cluster. Computing this minimum distance sum reduces the influence of isolated points on the clustering process, making the final result closer to the true partition.
shortcoming
Compared with the k-means algorithm it increases the amount of computation by roughly O(n), so in general the k-medoids algorithm is better suited to small-scale data.
Hierarchical based clustering algorithm
definition
Creates a clustering tree of the data objects. Depending on whether the hierarchical decomposition is formed bottom-up or top-down, it is further divided into agglomerative and divisive hierarchical clustering.
core
How to measure the distance between two clusters, where each cluster is generally a set of objects.
Classification
Distance type (inter-cluster distance measurement method)
Algorithm type
AGNES (agglomerative hierarchical clustering)
definition
AGNES (agglomerative hierarchical clustering) is a bottom-up strategy that first treats each object as a cluster and then merges these atomic clusters into larger and larger clusters until a certain terminal condition is met.
Similarity
The similarity between two clusters is determined by the similarity of the closest pairs of data points in the two different clusters.
step
1. Treat each object as an initial cluster;
2. REPEAT;
3. Find the two nearest clusters according to the nearest data points between clusters;
4. Merge the two clusters to produce a new cluster set;
5. UNTIL the defined number of clusters is reached.
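A minimal SciPy sketch of this bottom-up procedure; single linkage corresponds to the "closest pair of points" similarity above, and the two-blob toy data is illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])

# Repeatedly merge the two clusters whose closest points are nearest
Z = linkage(X, method="single")

# UNTIL: cut the merge tree once the defined number of clusters is reached
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)   # cluster id (1 or 2) per sample
```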
DIANA (split hierarchical clustering)
BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)
density clustering method
core
As long as the density of points in a region exceeds a certain threshold, the region is added to the nearby cluster.
Classification
DBSCAN
core
Unlike partitioning and hierarchical methods, it defines a cluster as the largest set of density-connected points; it can divide regions of sufficiently high density into clusters and can discover clusters of arbitrary shape in spatial databases with "noise".
definition
ε-neighborhood of an object: The area within a radius ε of a given object.
Core object (core point): If the ε-neighborhood of an object contains at least the minimum number of MinPts objects, the object is called a core object.
Direct density reachability: Given an object set D, if p is within the ε-neighborhood of q, and q is a core object, we say that object p is directly density reachable starting from object q.
Density-reachable: if there are core points P2, P3, …, Pn such that P2 is directly density-reachable from P1, P3 from P2, …, Pn from P(n−1), and Q from Pn, then Q is density-reachable from P1. Density-reachability is likewise not symmetric.
Density-connected: if there is a core point S from which both P and Q are density-reachable, then P and Q are density-connected. Density-connectedness is symmetric: if P and Q are density-connected, then so are Q and P. Two density-connected points belong to the same cluster.
Noise: A density-based cluster is the largest set of density-connected objects based on density reachability. Objects that are not included in any cluster are considered "noise".
step
1) If a point's ε-neighborhood contains more than MinPts points, it is a core point; otherwise the point is provisionally recorded as a noise point. 2) Starting from each core point, find all objects density-reachable from it; together they form a cluster.
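A minimal scikit-learn sketch; eps and min_samples correspond to ε and MinPts above, and the parameter values and toy data are assumptions:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0, 0.3, (50, 2)),   # dense cluster
    rng.normal(4, 0.3, (50, 2)),   # dense cluster
    rng.uniform(-2, 6, (10, 2)),   # scattered points, mostly noise
])

db = DBSCAN(eps=0.5, min_samples=5).fit(X)   # eps = ε, min_samples = MinPts
print(db.labels_)   # cluster ids; -1 marks noise points
```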
Features
advantage
Clustering is fast and can effectively handle noise points and discover spatial clusters of arbitrary shapes.
shortcoming
(1) As the amount of data grows, more memory is needed to support the I/O consumption; (2) when cluster densities are uneven and inter-cluster distances differ greatly, clustering quality is poor; (3) the two initial parameters ε (neighborhood radius) and MinPts (minimum number of points in an ε-neighborhood) must be set manually by the user, and the clustering result is very sensitive to their values: different values produce different clusterings.
OPTICS
DENCLUE
Bayesian classification
Naive Bayes
The Bayes method is a pattern classification method used when the prior probabilities and class-conditional probabilities are known; the classification of a sample depends on the samples of each class domain as a whole.
Naive Bayes assumes that all feature attributes are mutually independent; this is where the word "naive" in the algorithm's name comes from.
In reality there are often dependencies among attributes, yet interestingly, even when the independence assumption of naive Bayes clearly does not hold, it can still achieve very good classification results.
Bayesian formula
minimum error rate
Features are the given information
The category is what we ultimately want to determine
When there are multiple feature attributes
meaning
Posterior probability P(cj|x)
The probability that class cj holds given a data sample x; this is what we want to compute
Each P(xk|cj) can be obtained from prior knowledge or estimated from the sample set
Prior probability P(cj)
The prior probability P(cj) can be obtained from prior knowledge or estimated from the sample set
P(x) is the same for every class, so it can be dropped or treated as a normalizing factor
Simplification
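A minimal sketch of this decision rule for categorical features: estimate P(cj) and each P(xk|cj) by counting over the sample set, multiply them, and pick the class with the largest product (P(x) is dropped as a common factor). The toy weather data is an assumption for illustration:

```python
X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "hot")]
y = ["no", "no", "yes", "yes"]
classes = set(y)

# Prior P(c): relative frequency of each class in the sample set
prior = {c: y.count(c) / len(y) for c in classes}

def likelihood(value, k, c):
    """P(x_k = value | c), estimated by counting within class c."""
    members = [x for x, label in zip(X, y) if label == c]
    return sum(x[k] == value for x in members) / len(members)

def classify(x):
    # Maximize P(c) * prod_k P(x_k | c); P(x) is omitted
    def score(c):
        p = prior[c]
        for k, value in enumerate(x):
            p *= likelihood(value, k, c)
        return p
    return max(classes, key=score)

print(classify(("rain", "mild")))   # 'yes'
```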
minimal risk
decision table
Calculation method
For each decision α, calculate separately
Take the decision with the least conditional risk
nearest neighbor method
Nearest neighbor method/K nearest neighbor method
Purpose
Determine the classification of a point
Ideas
Find the k training instances closest to the new instance in the training set; the class that occurs most often among these k instances is the class of the new instance.
process
Calculate the distance between each sample point in the training sample and the test sample (common distance measures include Euclidean distance, Mahalanobis distance, etc.)
Sort all distance values above
Select the first k samples with the smallest distance
Vote based on the labels of these k samples to get the final classification category
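A minimal NumPy sketch of this four-step process with Euclidean distance and majority vote; the data and k are illustrative:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    d = np.linalg.norm(X_train - x, axis=1)       # 1. distance to every sample
    nearest = np.argsort(d)[:k]                   # 2-3. k smallest distances
    votes = Counter(y_train[i] for i in nearest)  # 4. vote on their labels
    return votes.most_common(1)[0][0]

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array(["a", "a", "a", "b", "b", "b"])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # 'a'
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # 'b'
```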
Choice of k value
The smaller k is, the more complex the model and the easier it overfits; the larger k is, the simpler the model. If k = N, every point is simply assigned to the most frequent class in the training set. Therefore k usually takes a fairly small value, chosen by cross-validation: set aside part of the samples for validation (e.g. 95% training, 5% validation), try k = 1, 2, 3, 4, 5, and so on, compute the classification error for each, and choose the k with the smallest error.
Differences between K-Means and KNN
K-Means
The purpose is to divide a series of point sets into k categories
K-Means is a clustering algorithm
Unsupervised learning: groups similar data together to obtain a classification, without external class labels
The training data set has no labels and is initially unordered; after clustering it becomes relatively ordered: disorder first, order afterwards
Nearest neighbor method/K nearest neighbor method
The purpose is to determine the classification of a point
KNN is a classification algorithm
Supervised learning, the classification target is known in advance
The training data set is labeled, and the labels are known to be correct.
Association rules
definition
basic concept
Item: For example, cola, potato chips, bread, beer, and diapers are all called items.
Let I={i1, i2,…,im} be the set of all items (Item).
Transaction T is a purchase record, and each transaction T has a unique identifier, recorded as Tid.
D is the set of all transactions.
Itemset is the set we want to study
The number of items in an itemset is called the length of the itemset, and an itemset containing k items is called a K-itemset.
Association rules
A logical implication of the form A->B, where neither A nor B is empty, A ⊆ I, B ⊆ I, and A ∩ B = ∅.
Support
Describes the probability that itemsets A and B appear together in the transaction set D
S(A->B)=P(AB)=|AB|/|D|
Support is a measure of the importance of association rules
Confidence
Among the transactions T in which itemset A appears, the probability that itemset B also appears.
C(A->B)=P(B|A)=|AB|/|A|
Confidence is a measure of the accuracy of association rules
Strong association rules
Association rules on I over D that satisfy both minimum support and minimum confidence are called strong association rules.
Lift
The degree of lift indicates how much influence the appearance of item set A has on the appearance of item set B.
L(A->B)=P(AB)/(P(A)*P(B))
Greater than 1
Positive correlation
equal to 1
Independent
less than 1
negative correlation
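A minimal sketch computing support, confidence, and lift straight from these formulas; the toy transaction database reuses the items mentioned above:

```python
def support(S, D):
    """P(S): fraction of transactions containing itemset S."""
    return sum(S <= t for t in D) / len(D)

def confidence(A, B, D):
    """C(A->B) = P(AB) / P(A)."""
    return support(A | B, D) / support(A, D)

def lift(A, B, D):
    """L(A->B) = P(AB) / (P(A) * P(B))."""
    return support(A | B, D) / (support(A, D) * support(B, D))

D = [{"beer", "diapers"}, {"beer", "bread"},
     {"beer", "diapers", "cola"}, {"bread", "cola"}]
A, B = {"beer"}, {"diapers"}
print(support(A | B, D))    # 0.5
print(confidence(A, B, D))  # 0.666...
print(lift(A, B, D))        # 1.333... > 1: positive correlation
```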
frequent itemsets
Item sets that satisfy minimum support are called frequent itemsets. The set of frequent k-itemsets is usually denoted Lk
Purpose
Find strong association rules based on user-specified minimum support and minimum confidence
step
Find all frequent itemsets (or maximal frequent itemsets) using the user-specified minimum support
Derive association rules from the frequent itemsets using the user-specified minimum confidence
algorithm
Apriori algorithm
The first step is to retrieve all frequent itemsets in the transaction database through iteration, that is, itemsets whose support is not lower than the threshold set by the user;
Frequent itemsets: count support S
The second step uses the frequent itemsets to construct rules that satisfy the user's minimum confidence.
Association rules: count confidence C
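A minimal sketch of both Apriori steps (level-wise frequent-itemset counting, then rule generation); the thresholds and transactions are assumptions, and the candidate join omits the full subset-pruning of the real algorithm:

```python
from itertools import combinations

def apriori_frequent(D, min_support):
    """Step 1: find all itemsets with support >= min_support, level by level."""
    n = len(D)
    candidates = [frozenset([i]) for t in D for i in t]
    freq = {}
    while candidates:
        counts = {c: sum(c <= t for t in D) / n for c in set(candidates)}
        level = {c: s for c, s in counts.items() if s >= min_support}
        freq.update(level)
        keys = list(level)
        # join surviving k-itemsets into (k+1)-itemset candidates
        candidates = [a | b for a in keys for b in keys
                      if len(a | b) == len(a) + 1]
    return freq

D = [{"beer", "diapers"}, {"beer", "bread"},
     {"beer", "diapers", "cola"}, {"bread", "cola"}]
freq = apriori_frequent(D, min_support=0.5)

# Step 2: from each frequent itemset S, test rules A -> (S - A)
for S, s in freq.items():
    for r in range(1, len(S)):
        for A in map(frozenset, combinations(S, r)):
            if A in freq and s / freq[A] >= 0.6:   # min confidence 0.6
                print(set(A), "->", set(S - A), "conf =", s / freq[A])
```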
FP-Growth