The foundation of neural networks and deep learning
This map summarizes the most basic neural network structures, the multi-layer perceptron (MLP) and the feedforward neural network (FNN). On this basis, it covers the objective functions and optimization techniques of neural networks, the back-propagation algorithm that computes the gradient of the objective function with respect to the network weight coefficients, and auxiliary techniques for neural network optimization such as initialization and regularization.
Edited at 2023-02-23 17:40:31
Foundations of Neural Networks and Deep Learning
Basic structure of neural network
neuron structure
weighted sum
stimulus signal
synaptic weights
activation value
activation function
discontinuous function
sign function
perceptron
threshold function
McCulloch-Pitts neurons
continuously differentiable function
Logistic Sigmoid function
Hyperbolic tangent function tanh()
shortcoming
When the activation value a is large, the function enters its saturation region and the derivative is close to 0; in gradient-based learning this makes convergence very slow or even stalls it. The ReLU function converges faster.
ReLU function
Classic ReLU
Leaky ReLU
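These activation functions take only a few lines to write down. Below is a minimal sketch in NumPy (the leak factor 0.01 for Leaky ReLU is an illustrative choice, not taken from the map); the last line shows the sigmoid's saturation: its derivative σ(a)(1 - σ(a)) is nearly 0 for large |a|, which is exactly what slows gradient-based learning.

    import numpy as np

    def sigmoid(a):
        # Logistic sigmoid: saturates for large |a|; derivative sigmoid(a) * (1 - sigmoid(a)) -> 0
        return 1.0 / (1.0 + np.exp(-a))

    def tanh(a):
        # Hyperbolic tangent: zero-centered, but also saturates for large |a|
        return np.tanh(a)

    def relu(a):
        # Classic ReLU: gradient 1 for a > 0, so no saturation on the positive side
        return np.maximum(0.0, a)

    def leaky_relu(a, alpha=0.01):
        # Leaky ReLU: small slope alpha for a < 0 avoids completely "dead" units
        return np.where(a > 0, a, alpha * a)

    a = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
    print(sigmoid(a) * (1 - sigmoid(a)))   # near 0 at both ends: the saturation region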
Summary
The computational structure of neurons
Linear weighted summation produces the activation value; a non-linear activation function produces the output.
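A single neuron is just these two steps; a minimal sketch (the function and variable names are mine, not from the map):

    import numpy as np

    def neuron(x, w, w0, phi=np.tanh):
        a = x @ w + w0        # linear weighted summation -> activation value a
        return phi(a)         # non-linear activation function -> output

    x = np.array([0.5, -1.0, 2.0])    # input feature vector (D = 3)
    w = np.array([0.1, 0.4, -0.2])    # synaptic weights
    print(neuron(x, w, w0=0.3))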
Multi-layer neural network solves XOR problem
perceptron
Linear combination + sign activation function
Does not converge when the data are linearly inseparable
e.g., the XOR operation
Solutions for linear inseparability
Replace the original feature vector with a vector of nonlinear basis functions
Use multiple neurons to form a multi-layer neural network
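As a concrete illustration of the second approach, XOR can be computed by a two-layer network of threshold neurons with hand-chosen weights (the particular weights below are one possible choice, not given in the map): the hidden layer computes OR and AND of the inputs, and the output neuron keeps OR while rejecting AND.

    def step(a):
        # threshold (McCulloch-Pitts style) activation: 1 if a > 0, else 0
        return 1.0 if a > 0 else 0.0

    def xor_net(x1, x2):
        h_or  = step(x1 + x2 - 0.5)       # hidden neuron 1: x1 OR x2
        h_and = step(x1 + x2 - 1.5)       # hidden neuron 2: x1 AND x2
        return step(h_or - h_and - 0.5)   # output neuron: OR but not AND  ->  XOR

    for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x1, x2, int(xor_net(x1, x2)))   # prints 0, 1, 1, 0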
How neurons are connected
As a basic building block, neurons are connected into a multi-layer network through parallel and cascade structures.
Parallel connection
Multiple neurons in the same layer receive the same input feature vector x and produce multiple outputs respectively.
Cascade connection
Multiple neurons connected in parallel each produce outputs, which are passed to the neurons in the next layer as input.
Multilayer perceptron (MLP) / feedforward neural network (FNN)
Multilayer perceptron structure
input layer
The number of units in the input layer is the dimension D of the input feature vector.
Input feature matrix N×D
Each row corresponds to a sample, and the number of rows is the number of samples N
The number of columns is the feature vector dimension D
Hidden layer
Layer 1
Input matrix N×D
is the original feature matrix
Weight coefficient matrix D×K1
The weight coefficient of each neuron corresponds to a D-dimensional column vector
A total of K1 neurons form a D×K1 matrix.
Bias vector N×K1
Each row corresponds to a sample bias, a total of N rows
The number of columns is the number of neurons K1
Output matrix N×K1
Z = φ(A) = φ(XW + W0)
Layer 2
Input matrix N×K1
Upper layer output matrix
Weight coefficient matrix K1×K2
The weight coefficient of each neuron corresponds to a K1-dimensional column vector
A total of K2 neurons form a matrix of K1×K2
Bias vector N×K2
Each row corresponds to a sample bias, a total of N rows
The number of columns is the number of neurons K2
Output matrix N×K2
Z = φ(A) = φ(XW + W0)
mth layer
Input matrix N×K(m-1)
Upper layer output matrix
Weight coefficient matrix K(m-1)×Km
The weight coefficient of each neuron corresponds to a K(m-1)-dimensional column vector
A total of Km neurons form a matrix of K(m-1)×Km
Bias vector N×Km
Each row corresponds to a sample bias, a total of N rows
The number of columns is the number of neurons Km
Output matrix N×Km
Z = φ(A) = φ(XW + W0)
output layer
Input matrix N×K(L-1)
Upper layer output matrix
Weight coefficient matrix K(L-1)×KL
The weight coefficient of each neuron corresponds to a K(L-1)-dimensional column vector
A total of KL neurons form a matrix of K(L-1)×KL
Bias vector N×KL
Each row corresponds to a sample bias, a total of N rows
The number of columns is the number of neurons KL
Output matrix N×KL
Z = φ(A) = φ(XW + W0)
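Every layer repeats the same computation Z = φ(A) = φ(XW + W0); only the shapes change. A minimal sketch of one hidden layer with the shapes listed above (the concrete sizes N, D, K1 and the tanh activation are illustrative):

    import numpy as np

    N, D, K1 = 4, 3, 5                 # samples, input dimension, neurons in layer 1
    X  = np.random.randn(N, D)         # input feature matrix, N x D
    W  = np.random.randn(D, K1)        # weight coefficient matrix, D x K1 (one column per neuron)
    w0 = np.random.randn(K1)           # bias, broadcast over the N rows (effectively N x K1)

    A = X @ W + w0                     # activation matrix, N x K1
    Z = np.tanh(A)                     # output matrix Z = phi(A), N x K1; input of the next layer
    print(Z.shape)                     # (4, 5)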
Operational relationships of the multi-layer perceptron (program structure)
Input
The output of the j-th neuron in the m-th layer
weighted sum
The output of the upper layer is used as the input of this layer
activation function
output
Neural network output representation
Note
Because the output layer can contain multiple neurons, the neural network can realize multiple output functions at the same time.
regression problem
The output of the output layer neuron is the regression function output.
Binary classification
The output-layer neuron outputs the posterior probability of the positive class; the Sigmoid function represents this posterior probability.
Multi-class classification
Each output-layer neuron outputs the posterior probability of one class, and the Softmax function represents the probabilities of all the classes.
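A minimal sketch of the three output-layer choices (identity for regression, Sigmoid for binary classification, row-wise Softmax for K classes); here A is the N x KL output-layer activation matrix and the numbers are only an example:

    import numpy as np

    def identity(A):
        # regression: the output-layer activation is the regression output itself
        return A

    def sigmoid(A):
        # binary classification: posterior probability of the positive class
        return 1.0 / (1.0 + np.exp(-A))

    def softmax(A):
        # K-class classification: row-wise posterior probabilities that sum to 1
        A = A - A.max(axis=1, keepdims=True)       # subtract the row maximum for numerical stability
        expA = np.exp(A)
        return expA / expA.sum(axis=1, keepdims=True)

    A = np.array([[2.0, 1.0, 0.1]])
    print(softmax(A))                              # approx. [[0.659 0.242 0.099]]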
Neural network nonlinear mapping
The difference from basis function regression
Determination of parameters
The basis functions for basis function regression are predetermined
The basis function parameters of the neural network are part of the system parameters and need to be determined through training.
non-linear relationship
In basis function regression, only the input vector has a nonlinear relationship with the output; the output is linear in the parameters.
In a neural network, both the input vector and the weight coefficients have a nonlinear relationship with the output.
Example
Two-layer neural network
three-layer neural network
Approximation theorem of neural network
Neural network essence
Mapping from D-dimensional Euclidean space to K-dimensional Euclidean space
The input feature vector x is a D-dimensional vector
The output y is a K-dimensional vector
content
An MLP with only one layer of hidden units can approximate any continuous function defined on a finite interval with arbitrary accuracy.
Objective functions and optimization of neural networks
Neural network objective function
General form
Multiple regression outputs
Sum-of-squares error
Multiple binary classification outputs
Cross entropy
A single K-class classification output
Cross entropy
The derivative of the sample loss function with respect to the output activation
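A minimal sketch of the three objective functions for a single sample (NumPy; the clipping constants are only for numerical safety). For these canonical pairings of output activation and loss, the derivative of the sample loss with respect to the output activation reduces to the simple error y_hat - y, which is where back-propagation starts.

    import numpy as np

    def sum_of_squares(y_hat, y):
        # regression outputs: error sum of squares
        return 0.5 * np.sum((y_hat - y) ** 2)

    def binary_cross_entropy(y_hat, y, eps=1e-12):
        # multiple binary (sigmoid) outputs: cross entropy
        y_hat = np.clip(y_hat, eps, 1 - eps)
        return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

    def categorical_cross_entropy(y_hat, y, eps=1e-12):
        # single K-class (softmax) output, y one-hot: cross entropy
        return -np.sum(y * np.log(np.clip(y_hat, eps, None)))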
Optimization of Neural Networks
loss function
A highly nonlinear, non-convex function
The solution that minimizes the loss function satisfies ∇E(w) = 0
The Hessian matrix H is positive definite
Neural network weight coefficient
Dimensions
Symmetry of weight coefficient space
Exchanging the positions of neurons within a layer leaves the input-output relationship unchanged; the networks before and after the exchange are equivalent.
Weight coefficient optimization
full gradient algorithm
stochastic gradient algorithm
mini-batch stochastic gradient algorithm
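The three schemes differ only in how many samples contribute to each gradient step. A minimal sketch (grad_fn, the learning rate and the batch size are placeholders, not part of the map):

    import numpy as np

    def train(w, X, Y, grad_fn, lr=0.1, batch_size=None, epochs=10):
        # batch_size = None   -> full gradient algorithm (all N samples per update)
        # batch_size = 1      -> stochastic gradient algorithm (one sample per update)
        # 1 < batch_size < N  -> mini-batch stochastic gradient algorithm
        N = X.shape[0]
        B = N if batch_size is None else batch_size
        for _ in range(epochs):
            idx = np.random.permutation(N)
            for start in range(0, N, B):
                batch = idx[start:start + B]
                w = w - lr * grad_fn(w, X[batch], Y[batch])
        return w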
Backpropagation BP algorithm calculates gradients or derivatives
The error back-propagation (BP) algorithm computes the gradient of the loss function with respect to the weight coefficients.
Idea
chain rule of derivatives
The derivative of the loss function with respect to the output activation is the error between the (regression) output and the label.
The derivative of the activation with respect to the weight coefficients is the input vector.
By the chain rule, the gradient (derivative) of the loss function with respect to the weight coefficients is the product of the two.
error back propagation
Hidden layers have no directly observable error; the effect of the output error must be propagated from the output layer back toward the input.
Derivation of backpropagation algorithm
forward propagation
initial value
Hidden layer
output layer
Output layer gradient
Output layer error
gradient component
Hidden layer backpropagation
Hidden layer gradient chain decomposition
Formula Derivation
Algorithmic thinking
forward propagation
The neuron output z of the previous layer is weighted and summed to obtain the neuron activation a of the next layer.
Backpropagation
The propagation error δ(l+1) of the later layer (the one closer to the output) is back-propagated to the previous layer to obtain its propagation error δ(l); this continues back to the first hidden layer (the one closest to the input).
Algorithm procedure (one iteration step of the weight coefficients)
initial value
forward propagation
Hidden layer
output layer
Backpropagation
output layer
Hidden layer
gradient component
mini-batch stochastic gradient algorithm
Vector form of backpropagation algorithm
initial value
forward propagation
Augmented weight coefficient for activation of j-th neuron in layer l
The weight coefficient matrix of the lth layer
weighted summation and activation
Output layer propagation error vector
Backpropagation
error back propagation
gradient component
The gradient of the weight vector matrix of the lth layer
The gradient of the bias vector of the lth layer
The gradient of the weight coefficient of a neuron in layer l
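A minimal vectorized sketch of one forward/backward pass, following the vector form above, for an MLP with tanh hidden layers, a linear (regression) output layer and a squared-error loss (the architecture, sizes and names are illustrative):

    import numpy as np

    def forward(X, Ws, bs):
        # forward propagation: Z[0] = X, then Z[l] = phi(Z[l-1] W[l] + b[l]); the last layer is linear
        Zs = [X]
        for l, (W, b) in enumerate(zip(Ws, bs)):
            A = Zs[-1] @ W + b
            Zs.append(A if l == len(Ws) - 1 else np.tanh(A))
        return Zs

    def backward(Zs, Y, Ws):
        # output-layer propagation error: delta_L = Y_hat - Y (squared error + linear output)
        deltas = [Zs[-1] - Y]
        # error back propagation: delta_l = (delta_{l+1} W_{l+1}^T) * phi'(A_l), with phi = tanh
        for l in range(len(Ws) - 1, 0, -1):
            deltas.insert(0, (deltas[0] @ Ws[l].T) * (1.0 - Zs[l] ** 2))
        # gradient components: dW_l = Z_{l-1}^T delta_l,  db_l = column sums of delta_l
        dWs = [Zs[l].T @ deltas[l] for l in range(len(Ws))]
        dbs = [deltas[l].sum(axis=0) for l in range(len(Ws))]
        return dWs, dbs

    # toy usage: D = 3 inputs, one hidden layer of 4 units, K = 2 regression outputs
    rng = np.random.default_rng(0)
    X, Y = rng.normal(size=(8, 3)), rng.normal(size=(8, 2))
    Ws = [rng.normal(size=(3, 4)), rng.normal(size=(4, 2))]
    bs = [np.zeros(4), np.zeros(2)]
    dWs, dbs = backward(forward(X, Ws, bs), Y, Ws)
    print([dW.shape for dW in dWs])    # [(3, 4), (4, 2)]: gradients match the weight shapes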
An extension of the backpropagation algorithm
Jacobian matrix of network
Jacobian matrix decomposition
Error back propagation equation
regression problem
Two classification problem
Multi-classification problem
Hessian matrix of the network
Some problems in neural network learning
fundamental issue
Objective function and gradient calculation
initialization
Weight coefficient initialization
The numbers of inputs and outputs of a layer are m and n, respectively.
Xavier initialization
Initialization of the weight coefficients when the activation function is ReLU
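A minimal sketch of both schemes for a layer with m inputs and n outputs, using the Gaussian variants (uniform variants with the same variances are equally common); the ReLU case is usually called He initialization:

    import numpy as np

    def xavier_init(m, n, rng=np.random.default_rng()):
        # Xavier/Glorot initialization: variance 2 / (m + n), suited to tanh or sigmoid layers
        return rng.normal(0.0, np.sqrt(2.0 / (m + n)), size=(m, n))

    def he_init(m, n, rng=np.random.default_rng()):
        # He initialization: variance 2 / m, the usual choice when the activation is ReLU
        return rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n))

    W1 = xavier_init(784, 256)    # e.g. a tanh hidden layer
    W2 = he_init(256, 128)        # e.g. a ReLU hidden layer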
Input vector normalization
Normalize each component (e.g., to zero mean and unit variance) so that inputs are represented in a unified space.
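A minimal sketch of per-feature normalization to zero mean and unit variance, with the statistics taken from the training set only (names are mine):

    import numpy as np

    def normalize(X_train, X_test, eps=1e-8):
        # per-feature zero mean / unit variance, using training-set statistics for both sets
        mu = X_train.mean(axis=0)
        sigma = X_train.std(axis=0) + eps
        return (X_train - mu) / sigma, (X_test - mu) / sigma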
Regularization
Regularized loss function for weight decay
iterative update
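A minimal sketch of the weight-decay (L2) regularized loss and the corresponding iterative update; lam and lr are placeholder hyperparameters:

    import numpy as np

    def regularized_loss(loss, w, lam):
        # E_reg(w) = E(w) + (lam / 2) * ||w||^2
        return loss + 0.5 * lam * np.sum(w ** 2)

    def weight_decay_step(w, grad, lr, lam):
        # the gradient of the regularized loss adds lam * w, so every step also shrinks ("decays") the weights
        return w - lr * (grad + lam * w)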
Several types of equivalent regularization techniques
augmented sample set
Rotate and translate samples in the sample set by several different small angles and offsets to form new samples.
Inject noise into input vector
Add low-power random noise to the input samples for adversarial training
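A minimal sketch of the two sample-set techniques for image-like inputs; small pixel shifts stand in for the small rotations and translations mentioned above, and the shift range and noise power are illustrative:

    import numpy as np

    def augment_with_shifts(X_images, max_shift=2, rng=np.random.default_rng()):
        # translate each image by a few pixels to create new samples
        shifts = rng.integers(-max_shift, max_shift + 1, size=(len(X_images), 2))
        return np.stack([np.roll(img, tuple(s), axis=(0, 1)) for img, s in zip(X_images, shifts)])

    def inject_noise(X, noise_std=0.01, rng=np.random.default_rng()):
        # add low-power Gaussian noise to the input vectors
        return X + rng.normal(0.0, noise_std, size=X.shape)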
early stopping technique
Detect the turning point of the validation error: stop the iteration when the validation error starts to increase, to prevent overfitting.
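A minimal sketch of the early-stopping loop; train_step and validation_error are placeholders for the actual training and evaluation code:

    def train_with_early_stopping(w, train_step, validation_error, patience=5, max_epochs=1000):
        # stop once the validation error has not improved for `patience` consecutive epochs
        best_err, best_w, bad_epochs = float("inf"), w, 0
        for _ in range(max_epochs):
            w = train_step(w)
            err = validation_error(w)
            if err < best_err:
                best_err, best_w, bad_epochs = err, w, 0
            else:
                bad_epochs += 1
                if bad_epochs >= patience:
                    break            # the validation error has started to increase: stop iterating
        return best_w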