The foundation of neural networks and deep learning
This map summarizes the most basic neural network structures, the multi-layer perceptron (MLP) and the feedforward neural network (FNN). On this basis, it covers the objective functions and optimization techniques of neural networks, the back-propagation algorithm that computes the gradient of the objective function with respect to the network weight coefficients, and auxiliary techniques for neural network optimization such as initialization and regularization.
Edited at 2023-02-23 17:40:31
Foundations of Neural Networks and Deep Learning
Basic structure of neural network
neuron structure
weighted sum
stimulus signal
synaptic weights
activation value
activation function
discontinuous function
sign function
perceptron
threshold function
McCulloch-Pitts neurons
continuously differentiable function
Logistic Sigmoid function
Hyperbolic tangent function tanh()
shortcoming
When the activation value a is large, the function enters its saturation region and the derivative is close to 0; in gradient-based learning this makes convergence very slow or even stalls it. The ReLU function converges faster.
ReLU function
Classic ReLU
Leaky ReLU
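These activation functions take only a few lines to write down. Below is a minimal sketch in NumPy (the leak factor 0.01 for Leaky ReLU is an illustrative choice, not taken from the map); the last line shows the sigmoid's saturation: its derivative σ(a)(1 - σ(a)) is nearly 0 for large |a|, which is exactly what slows gradient-based learning.

    import numpy as np

    def sigmoid(a):
        # Logistic sigmoid: saturates for large |a|; derivative sigmoid(a) * (1 - sigmoid(a)) -> 0
        return 1.0 / (1.0 + np.exp(-a))

    def tanh(a):
        # Hyperbolic tangent: zero-centered, but also saturates for large |a|
        return np.tanh(a)

    def relu(a):
        # Classic ReLU: gradient 1 for a > 0, so no saturation on the positive side
        return np.maximum(0.0, a)

    def leaky_relu(a, alpha=0.01):
        # Leaky ReLU: small slope alpha for a < 0 avoids completely "dead" units
        return np.where(a > 0, a, alpha * a)

    a = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
    print(sigmoid(a) * (1 - sigmoid(a)))   # near 0 at both ends: the saturation region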
Summary
The computational structure of neurons
Linear weighted summation produces the activation value; a non-linear activation function produces the output.
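A single neuron is just these two steps; a minimal sketch (the function and variable names are mine, not from the map):

    import numpy as np

    def neuron(x, w, w0, phi=np.tanh):
        a = x @ w + w0        # linear weighted summation -> activation value a
        return phi(a)         # non-linear activation function -> output

    x = np.array([0.5, -1.0, 2.0])    # input feature vector (D = 3)
    w = np.array([0.1, 0.4, -0.2])    # synaptic weights
    print(neuron(x, w, w0=0.3))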
Multi-layer neural network solves XOR problem
perceptron
Linear combination + sign activation function
Does not converge when the data are linearly inseparable
e.g., the XOR operation
Solutions for linear inseparability
Replace the original feature vector with a vector of nonlinear basis functions
Use multiple neurons to form a multi-layer neural network
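As a concrete illustration of the second approach, XOR can be computed by a two-layer network of threshold neurons with hand-chosen weights (the particular weights below are one possible choice, not given in the map): the hidden layer computes OR and AND of the inputs, and the output neuron keeps OR while rejecting AND.

    def step(a):
        # threshold (McCulloch-Pitts style) activation: 1 if a > 0, else 0
        return 1.0 if a > 0 else 0.0

    def xor_net(x1, x2):
        h_or  = step(x1 + x2 - 0.5)       # hidden neuron 1: x1 OR x2
        h_and = step(x1 + x2 - 1.5)       # hidden neuron 2: x1 AND x2
        return step(h_or - h_and - 0.5)   # output neuron: OR but not AND  ->  XOR

    for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x1, x2, int(xor_net(x1, x2)))   # prints 0, 1, 1, 0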
How neurons are connected
As a basic building block, neurons are connected into a multi-layer network through parallel and cascade structures.
Parallel connection
Multiple neurons in the same layer receive the same input feature vector x and produce multiple outputs respectively.
Cascade connection
Multiple neurons connected in parallel each produce outputs, which are passed to the neurons in the next layer as input.
Multilayer perceptron (MLP) / feedforward neural network (FNN)
Multilayer perceptron structure
input layer
The number of units in the input layer is the dimension D of the input feature vector.
Input feature matrix N×D
Each row corresponds to a sample, and the number of rows is the number of samples N
The number of columns is the feature vector dimension D
Hidden layer
Layer 1
Input matrix N×D
is the original feature matrix
Weight coefficient matrix D×K1
The weight coefficient of each neuron corresponds to a D-dimensional column vector
A total of K1 neurons form a D×K1 matrix.
Bias vector N×K1
Each row corresponds to a sample bias, a total of N rows
The number of columns is the number of neurons K1
Output matrix N×K1
Z = φ(A) = φ(XW + W0)
Layer 2
Input matrix N×K1
Upper layer output matrix
Weight coefficient matrix K1×K2
The weight coefficient of each neuron corresponds to a K1-dimensional column vector
A total of K2 neurons form a matrix of K1×K2
Bias vector N×K2
Each row corresponds to a sample bias, a total of N rows
The number of columns is the number of neurons K2
Output matrix N×K2
Z = φ(A) = φ(XW + W0)
mth layer
Input matrix N×K(m-1)
Upper layer output matrix
Weight coefficient matrix K(m-1)×Km
The weight coefficient of each neuron corresponds to a K(m-1)-dimensional column vector
A total of Km neurons form a matrix of K(m-1)×Km
Bias vector N×Km
Each row corresponds to a sample bias, a total of N rows
The number of columns is the number of neurons Km
Output matrix N×Km
Z = φ(A) = φ(XW + W0)
output layer
Input matrix N×K(L-1)
Upper layer output matrix
Weight coefficient matrix K(L-1)×KL
The weight coefficient of each neuron corresponds to a K(L-1)-dimensional column vector
A total of KL neurons form a matrix of K(L-1)×KL
Bias vector N×KL
Each row corresponds to a sample bias, a total of N rows
The number of columns is the number of neurons KL
Output matrix N×KL
Z = φ(A) = φ(XW + W0)
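Every layer repeats the same computation Z = φ(A) = φ(XW + W0); only the shapes change. A minimal sketch of one hidden layer with the shapes listed above (the concrete sizes N, D, K1 and the tanh activation are illustrative):

    import numpy as np

    N, D, K1 = 4, 3, 5                 # samples, input dimension, neurons in layer 1
    X  = np.random.randn(N, D)         # input feature matrix, N x D
    W  = np.random.randn(D, K1)        # weight coefficient matrix, D x K1 (one column per neuron)
    w0 = np.random.randn(K1)           # bias, broadcast over the N rows (effectively N x K1)

    A = X @ W + w0                     # activation matrix, N x K1
    Z = np.tanh(A)                     # output matrix Z = phi(A), N x K1; input of the next layer
    print(Z.shape)                     # (4, 5)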
Operational relationships of the multi-layer perceptron (program structure)
Input
The output of the j-th neuron in the m-th layer
weighted sum
The output of the upper layer is used as the input of this layer
activation function
output
Neural network output representation
Note
Because the output layer can contain multiple neurons, the neural network can realize multiple output functions at the same time.
regression problem
The output of the output layer neuron is the regression function output.
Binary classification
The output-layer neuron outputs the posterior probability of the positive class; the Sigmoid function represents this posterior probability.
Multi-class classification
Each output-layer neuron outputs the posterior probability of one class, and the Softmax function represents the probabilities of all the classes.
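A minimal sketch of the three output-layer choices (identity for regression, Sigmoid for binary classification, row-wise Softmax for K classes); here A is the N x KL output-layer activation matrix and the numbers are only an example:

    import numpy as np

    def identity(A):
        # regression: the output-layer activation is the regression output itself
        return A

    def sigmoid(A):
        # binary classification: posterior probability of the positive class
        return 1.0 / (1.0 + np.exp(-A))

    def softmax(A):
        # K-class classification: row-wise posterior probabilities that sum to 1
        A = A - A.max(axis=1, keepdims=True)       # subtract the row maximum for numerical stability
        expA = np.exp(A)
        return expA / expA.sum(axis=1, keepdims=True)

    A = np.array([[2.0, 1.0, 0.1]])
    print(softmax(A))                              # approx. [[0.659 0.242 0.099]]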
Neural network nonlinear mapping
The difference from basis function regression
Determination of parameters
The basis functions for basis function regression are predetermined
The basis function parameters of the neural network are part of the system parameters and need to be determined through training.
non-linear relationship
In basis function regression, only the input vector has a nonlinear relationship with the output; the output is linear in the parameters.
In a neural network, both the input vector and the weight coefficients have a nonlinear relationship with the output.
Example
Two-layer neural network
three-layer neural network
Approximation theorem of neural network
Neural network essence
Mapping from D-dimensional Euclidean space to K-dimensional Euclidean space
The input feature vector x is a D-dimensional vector
The output y is a K-dimensional vector
content
An MLP with only one layer of hidden units can approximate any continuous function defined on a finite interval with arbitrary accuracy.
Objective functions and optimization of neural networks
Neural network objective function
General form
Multiple regression outputs
Sum-of-squares error
Multiple binary classification outputs
Cross entropy
A single K-class classification output
Cross entropy
The derivative of the sample loss function with respect to the output activation
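A minimal sketch of the three objective functions for a single sample (NumPy; the clipping constants are only for numerical safety). For these canonical pairings of output activation and loss, the derivative of the sample loss with respect to the output activation reduces to the simple error y_hat - y, which is where back-propagation starts.

    import numpy as np

    def sum_of_squares(y_hat, y):
        # regression outputs: error sum of squares
        return 0.5 * np.sum((y_hat - y) ** 2)

    def binary_cross_entropy(y_hat, y, eps=1e-12):
        # multiple binary (sigmoid) outputs: cross entropy
        y_hat = np.clip(y_hat, eps, 1 - eps)
        return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

    def categorical_cross_entropy(y_hat, y, eps=1e-12):
        # single K-class (softmax) output, y one-hot: cross entropy
        return -np.sum(y * np.log(np.clip(y_hat, eps, None)))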
Optimization of Neural Networks
loss function
A highly nonlinear, non-convex function
The solution that minimizes the loss function satisfies ∇E(w) = 0
The Hessian matrix H is positive definite
Neural network weight coefficient
Dimensions
Symmetry of weight coefficient space
Exchanging the positions of neurons within a layer leaves the input-output relationship unchanged; the networks before and after the exchange are equivalent.
Weight coefficient optimization
full gradient algorithm
stochastic gradient algorithm
mini-batch stochastic gradient algorithm
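The three schemes differ only in how many samples contribute to each gradient step. A minimal sketch (grad_fn, the learning rate and the batch size are placeholders, not part of the map):

    import numpy as np

    def train(w, X, Y, grad_fn, lr=0.1, batch_size=None, epochs=10):
        # batch_size = None   -> full gradient algorithm (all N samples per update)
        # batch_size = 1      -> stochastic gradient algorithm (one sample per update)
        # 1 < batch_size < N  -> mini-batch stochastic gradient algorithm
        N = X.shape[0]
        B = N if batch_size is None else batch_size
        for _ in range(epochs):
            idx = np.random.permutation(N)
            for start in range(0, N, B):
                batch = idx[start:start + B]
                w = w - lr * grad_fn(w, X[batch], Y[batch])
        return w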
Backpropagation BP algorithm calculates gradients or derivatives
The error back-propagation (BP) algorithm computes the gradient of the loss function with respect to the weight coefficients.
Idea
chain rule of derivatives
The derivative of the loss function with respect to the output activation is the error between the (regression) output and the label.
The derivative of the activation with respect to the weight coefficients is the input vector.
By the chain rule, the gradient (derivative) of the loss function with respect to the weight coefficients is the product of the two.
error back propagation
Hidden layers have no directly observable error; the effect of the output error must be propagated from the output layer back toward the input.
Derivation of backpropagation algorithm
forward propagation
initial value
Hidden layer
output layer
Output layer gradient
Output layer error
gradient component
Hidden layer backpropagation
Hidden layer gradient chain decomposition
Formula Derivation
Algorithmic thinking
forward propagation
The neuron output z of the previous layer is weighted and summed to obtain the neuron activation a of the next layer.
Backpropagation
The propagation error δ(l+1) of the later layer (the one closer to the output) is back-propagated to the previous layer to obtain its propagation error δ(l); this continues back to the first hidden layer (the one closest to the input).
Algorithm procedure (one iteration step of the weight coefficients)
initial value
forward propagation
Hidden layer
output layer
Backpropagation
output layer
Hidden layer
gradient component
mini-batch stochastic gradient algorithm
Vector form of backpropagation algorithm
initial value
forward propagation
Augmented weight coefficient for activation of j-th neuron in layer l
The weight coefficient matrix of the lth layer
weighted summation and activation
Output layer propagation error vector
Backpropagation
error back propagation
gradient component
The gradient of the weight vector matrix of the lth layer
The gradient of the bias vector of the lth layer
The gradient of the weight coefficient of a neuron in layer l
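A minimal vectorized sketch of one forward/backward pass, following the vector form above, for an MLP with tanh hidden layers, a linear (regression) output layer and a squared-error loss (the architecture, sizes and names are illustrative):

    import numpy as np

    def forward(X, Ws, bs):
        # forward propagation: Z[0] = X, then Z[l] = phi(Z[l-1] W[l] + b[l]); the last layer is linear
        Zs = [X]
        for l, (W, b) in enumerate(zip(Ws, bs)):
            A = Zs[-1] @ W + b
            Zs.append(A if l == len(Ws) - 1 else np.tanh(A))
        return Zs

    def backward(Zs, Y, Ws):
        # output-layer propagation error: delta_L = Y_hat - Y (squared error + linear output)
        deltas = [Zs[-1] - Y]
        # error back propagation: delta_l = (delta_{l+1} W_{l+1}^T) * phi'(A_l), with phi = tanh
        for l in range(len(Ws) - 1, 0, -1):
            deltas.insert(0, (deltas[0] @ Ws[l].T) * (1.0 - Zs[l] ** 2))
        # gradient components: dW_l = Z_{l-1}^T delta_l,  db_l = column sums of delta_l
        dWs = [Zs[l].T @ deltas[l] for l in range(len(Ws))]
        dbs = [deltas[l].sum(axis=0) for l in range(len(Ws))]
        return dWs, dbs

    # toy usage: D = 3 inputs, one hidden layer of 4 units, K = 2 regression outputs
    rng = np.random.default_rng(0)
    X, Y = rng.normal(size=(8, 3)), rng.normal(size=(8, 2))
    Ws = [rng.normal(size=(3, 4)), rng.normal(size=(4, 2))]
    bs = [np.zeros(4), np.zeros(2)]
    dWs, dbs = backward(forward(X, Ws, bs), Y, Ws)
    print([dW.shape for dW in dWs])    # [(3, 4), (4, 2)]: gradients match the weight shapes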
An extension of the backpropagation algorithm
Jacobian matrix of network
Jacobian matrix decomposition
Error back propagation equation
regression problem
Two classification problem
Multi-classification problem
Hessian matrix of the network
Some problems in neural network learning
fundamental issue
Objective function and gradient calculation
initialization
Weight coefficient initialization
The numbers of inputs and outputs of a layer are m and n, respectively.
Xavier initialization
Initialization of the weight coefficients when the activation function is ReLU
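A minimal sketch of both schemes for a layer with m inputs and n outputs, using the Gaussian variants (uniform variants with the same variances are equally common); the ReLU case is usually called He initialization:

    import numpy as np

    def xavier_init(m, n, rng=np.random.default_rng()):
        # Xavier/Glorot initialization: variance 2 / (m + n), suited to tanh or sigmoid layers
        return rng.normal(0.0, np.sqrt(2.0 / (m + n)), size=(m, n))

    def he_init(m, n, rng=np.random.default_rng()):
        # He initialization: variance 2 / m, the usual choice when the activation is ReLU
        return rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n))

    W1 = xavier_init(784, 256)    # e.g. a tanh hidden layer
    W2 = he_init(256, 128)        # e.g. a ReLU hidden layer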
Input vector normalization
Normalize each component (e.g., to zero mean and unit variance) so that inputs are represented in a unified space.
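A minimal sketch of per-feature normalization to zero mean and unit variance, with the statistics taken from the training set only (names are mine):

    import numpy as np

    def normalize(X_train, X_test, eps=1e-8):
        # per-feature zero mean / unit variance, using training-set statistics for both sets
        mu = X_train.mean(axis=0)
        sigma = X_train.std(axis=0) + eps
        return (X_train - mu) / sigma, (X_test - mu) / sigma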
Regularization
Regularized loss function for weight decay
iterative update
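A minimal sketch of the weight-decay (L2) regularized loss and the corresponding iterative update; lam and lr are placeholder hyperparameters:

    import numpy as np

    def regularized_loss(loss, w, lam):
        # E_reg(w) = E(w) + (lam / 2) * ||w||^2
        return loss + 0.5 * lam * np.sum(w ** 2)

    def weight_decay_step(w, grad, lr, lam):
        # the gradient of the regularized loss adds lam * w, so every step also shrinks ("decays") the weights
        return w - lr * (grad + lam * w)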
Several types of equivalent regularization techniques
augmented sample set
Rotate and translate samples in the sample set by several different small angles and offsets to form new samples.
Inject noise into input vector
Add low-power random noise to the input samples for adversarial training
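A minimal sketch of the two sample-set techniques for image-like inputs; small pixel shifts stand in for the small rotations and translations mentioned above, and the shift range and noise power are illustrative:

    import numpy as np

    def augment_with_shifts(X_images, max_shift=2, rng=np.random.default_rng()):
        # translate each image by a few pixels to create new samples
        shifts = rng.integers(-max_shift, max_shift + 1, size=(len(X_images), 2))
        return np.stack([np.roll(img, tuple(s), axis=(0, 1)) for img, s in zip(X_images, shifts)])

    def inject_noise(X, noise_std=0.01, rng=np.random.default_rng()):
        # add low-power Gaussian noise to the input vectors
        return X + rng.normal(0.0, noise_std, size=X.shape)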
early stopping technique
Detect the turning point of the validation error: stop the iteration when the validation error starts to increase, to prevent overfitting.
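A minimal sketch of the early-stopping loop; train_step and validation_error are placeholders for the actual training and evaluation code:

    def train_with_early_stopping(w, train_step, validation_error, patience=5, max_epochs=1000):
        # stop once the validation error has not improved for `patience` consecutive epochs
        best_err, best_w, bad_epochs = float("inf"), w, 0
        for _ in range(max_epochs):
            w = train_step(w)
            err = validation_error(w)
            if err < best_err:
                best_err, best_w, bad_epochs = err, w, 0
            else:
                bad_epochs += 1
                if bad_epochs >= patience:
                    break            # the validation error has started to increase: stop iterating
        return best_w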