MindMap Gallery Ensemble learning
This map covers the two major categories of ensemble algorithms, bagging and boosting, introducing each in detail. I hope it is helpful to interested readers!
Edited at 2023-12-23 14:09:40
Ensemble Learning (Part 1)
Introduction
Idea: build and combine multiple weak learners to complete a learning task
Two issues to consider in ensemble learning
How to train a single weak learner?
Method 1: Change the weight of the training data set
Method 2: Change the probability distribution of the training data set
How to combine weak learners into strong learners?
Method 1: Parallel voting method
Method 2: Serial weighting method
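The two combination schemes can be sketched in code. This is a minimal illustration (function names are ours), assuming label predictions in {0, 1} for voting and in {-1, +1} for the weighted scheme:

```python
import numpy as np

def parallel_vote(predictions):
    """Parallel voting (bagging-style): the mode of all weak-learner outputs."""
    predictions = np.asarray(predictions)  # shape: (n_learners, n_samples)
    # For each sample, pick the most frequent label across learners.
    return np.array([np.bincount(col).argmax() for col in predictions.T])

def serial_weighted(predictions, alphas):
    """Serial weighting (boosting-style): sign of the alpha-weighted sum
    of {-1, +1} predictions."""
    predictions = np.asarray(predictions, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    return np.sign(alphas @ predictions)

preds = [[0, 1, 1], [1, 1, 0], [0, 1, 0]]
print(parallel_vote(preds))  # -> [0 1 0]
```

Note that voting treats all learners as equals, while the weighted scheme lets more accurate learners contribute more.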
Two major categories of ensemble learning
Bagging: base learners have no strong dependencies on one another; a parallel method whose learners can be generated simultaneously
representative algorithm
random forest
Algorithm idea: use decision trees as weak learners and combine them via bagging
How is random forest random? (by changing the probability distribution of the data set)
Method 1: Forest-RI
Each time a training set is built, randomly draw k samples (with replacement) from data set D, and randomly select n of the M features.
Method 2: Forest-RC
Each time a training set is constructed, randomly select n of the M features of data set D and combine them linearly to form F new features. (Weight coefficients are random numbers in [-1, 1].)
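Both randomization schemes can be sketched as follows; the sample count k, feature count n, and new-feature count F here are illustrative choices, not values fixed by the algorithms:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))  # data set D: 100 samples, M = 8 features

def forest_ri(X, k, n):
    """Forest-RI: draw k samples (with replacement) and n of the M features."""
    rows = rng.integers(0, X.shape[0], size=k)            # bootstrap sample
    cols = rng.choice(X.shape[1], size=n, replace=False)  # feature subset
    return X[np.ix_(rows, cols)]

def forest_rc(X, n, F):
    """Forest-RC: build F new features, each a random linear combination of
    n original features with weights drawn uniformly from [-1, 1]."""
    new_feats = []
    for _ in range(F):
        cols = rng.choice(X.shape[1], size=n, replace=False)
        w = rng.uniform(-1, 1, size=n)
        new_feats.append(X[:, cols] @ w)
    return np.stack(new_feats, axis=1)

print(forest_ri(X, k=50, n=3).shape)  # (50, 3)
print(forest_rc(X, n=2, F=5).shape)   # (100, 5)
```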
Algorithm steps
Step 1: Choose a weak learner (decision tree, KNN, logistic regression, etc.)
Step 2: Construct a training set based on randomness
Forest-RI
Forest-RC
Step 3: Train the current weak learner
Step 4: Determine whether the strong learner is qualified based on the voting mechanism
Voting mechanism: the mode of all weak learner results
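The four steps can be sketched end to end. As a minimal stand-in for a full decision tree, the weak learner here is a one-feature decision stump; the bootstrap sampling and majority vote follow Steps 2-4:

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_stump(X, y):
    """Pick the (feature, threshold, labels) with the lowest training error."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            for ll, rl in ((0, 1), (1, 0)):
                err = np.sum(left != ll) + np.sum(right != rl)
                if err < best_err:
                    best_err, best = err, (j, t, ll, rl)
    return best

def predict_stump(stump, X):
    j, t, ll, rl = stump
    return np.where(X[:, j] <= t, ll, rl)

# Toy data: label depends on the first two features.
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Steps 2-3: bootstrap a training set per round and fit each weak learner.
stumps = []
for _ in range(15):
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample
    stumps.append(fit_stump(X[idx], y[idx]))

# Step 4: combine by majority vote (the mode of all weak-learner outputs).
votes = np.array([predict_stump(s, X) for s in stumps])
forest_pred = (votes.mean(axis=0) > 0.5).astype(int)
print("training accuracy:", (forest_pred == y).mean())
```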
Advantages and Disadvantages
Advantages
Trees are trained independently of one another, so training is fast and parallelizable.
The out-of-bag (OOB) error gives an unbiased estimate of the generalization error, and the model generalizes well.
Bootstrap sampling leaves out-of-bag data, so no separate cross-validation set is needed.
Accuracy remains high even on imbalanced data sets or data with missing values.
Disadvantages
Random forests can overfit on noisy classification or regression problems.
Random forests have many hyperparameters, which makes tuning difficult.
Optimization
For the many hard-to-tune parameters:
Become familiar with each parameter first, then tune via grid search.
Illustration: influence of each parameter on the model
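Grid search itself is simple to sketch: enumerate parameter combinations and keep the best-scoring one. Here `evaluate` is a hypothetical placeholder for training and validating a forest with the given settings, and the parameter names and values are illustrative:

```python
from itertools import product

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [4, 8, None],
}

def evaluate(params):
    # Placeholder score; in practice, train a forest with `params`
    # and return its validation accuracy.
    return -abs(params["n_estimators"] - 100) / 100.0

best_params, best_score = None, float("-inf")
for combo in product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), combo))
    score = evaluate(params)
    if score > best_score:
        best_params, best_score = params, score

print(best_params)  # {'n_estimators': 100, 'max_depth': 4}
```

The grid grows multiplicatively with each parameter, which is why understanding the parameters first (to prune the grid) matters.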
Boosting: base learners have strong dependencies on one another; a sequential method whose learners must be generated serially
representative algorithm
AdaBoost
Algorithm idea: train one weak learner per round; adjust the training-sample weights based on the previous round's results and use the reweighted samples as the next round's training data; finally, combine all weak learners into an ensemble model through linear weighting.
Algorithm steps
Step 1: Choose a weak learner (decision tree, KNN, logistic regression, etc.)
Step 2: Initialize or update sample weights
Initialize sample weights, that is, each sample has the same weight
Update sample weights, that is, reduce the weight of correctly classified samples and increase the weight of incorrectly classified samples.
Step 3: Train the current weak learner
Step 4: Calculate the weight of the current weak learner
Step 1: Calculate the current weak learner's error rate (the weighted proportion of misclassified samples)
Step 2: Calculate the weight of the current weak learner based on the error rate
Step 5: Add the current weak learner to the linear model and determine whether it is qualified
linear model
How to judge qualification?
The strong learner's accuracy
The number of weak learners in the strong learner
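The steps above can be condensed into a compact sketch. The weak learner is a weighted decision stump standing in for any base classifier, and labels are assumed to be in {-1, +1}:

```python
import numpy as np

def fit_weighted_stump(X, y, w):
    """Best (feature, threshold, polarity) under the sample weights w."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= t, -sign, sign)
                err = np.sum(w[pred != y])
                if err < best_err:
                    best_err, best = err, (j, t, sign)
    return best, best_err

def stump_predict(stump, X):
    j, t, sign = stump
    return np.where(X[:, j] <= t, -sign, sign)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

w = np.full(len(X), 1 / len(X))  # Step 2: uniform initial weights
stumps, alphas = [], []
for _ in range(20):
    stump, err = fit_weighted_stump(X, y, w)  # Step 3: train weak learner
    err = max(err, 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)     # Step 4: learner weight
    pred = stump_predict(stump, X)
    w *= np.exp(-alpha * y * pred)            # Step 2 (next round): reweight
    w /= w.sum()                              # keep weights a distribution
    stumps.append(stump)
    alphas.append(alpha)

# Step 5: linear combination of all weak learners
F = sum(a * stump_predict(s, X) for a, s in zip(alphas, stumps))
print("training accuracy:", (np.sign(F) == y).mean())
```

The weight update shrinks correctly classified samples (y·pred = +1) and grows misclassified ones (y·pred = -1), exactly as Step 2 describes.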
Advantages and Disadvantages
Advantages
AdaBoost has high accuracy
AdaBoost can use different classification algorithms as weak classifiers and is not limited to decision trees.
Disadvantages
Parameter training is time-consuming
Imbalanced data can easily degrade accuracy
The number of weak classifiers is hard to determine
Optimization
For time-consuming training: use the forward stagewise algorithm to speed up parameter optimization
For the hard-to-determine number of classifiers: use cross-validation to help decide
GBDT (Gradient Boosting Decision Tree)
Boosting trees
Regression boosting tree: a simple additive combination of weak regressors
Classification boosting tree: a simple additive combination of weak classifiers
Gradient boosting tree: unifies classification and regression boosting trees
Algorithm idea: use CART regression trees as weak learners; construct each new weak learner from the loss (negative gradient) of the current ensemble; finally, add all weak learners linearly.
Algorithm steps
Step 1: Choose the weak learner (a CART regression tree)
Step 2: Construct the training set by computing the negative gradient of the loss function at the current model (i.e., fitting the residuals), together with random subsampling of the samples and features of data set D
Step 3: Train the current weak learner
Step 4: Add the current weak learner to the linear model and determine whether it is qualified
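The steps can be sketched for regression with squared loss, where the negative gradient is simply the residual y - F(x). The weak learner is a one-split regression stump standing in for a CART regression tree, and feature/sample subsampling is omitted for brevity:

```python
import numpy as np

def fit_reg_stump(X, r):
    """Best single split minimizing squared error against the residuals r."""
    best, best_sse = None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = r[X[:, j] <= t], r[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            sse = ((left - left.mean()) ** 2).sum() \
                + ((right - right.mean()) ** 2).sum()
            if sse < best_sse:
                best_sse, best = sse, (j, t, left.mean(), right.mean())
    return best

def stump_predict(stump, X):
    j, t, left_val, right_val = stump
    return np.where(X[:, j] <= t, left_val, right_val)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0])

lr = 0.1                              # shrinkage (learning rate)
F = np.zeros_like(y)                  # initial model
for _ in range(100):
    residual = y - F                  # Step 2: negative gradient of squared loss
    stump = fit_reg_stump(X, residual)  # Step 3: fit the weak learner
    F += lr * stump_predict(stump, X)   # Step 4: add to the linear model

print("training MSE:", np.mean((y - F) ** 2))
```

Each round fits whatever error the current ensemble still makes, which is why the weak learners depend on one another and must be trained serially.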
Advantages and Disadvantages
Advantages
Suitable for low-dimensional data and can handle nonlinear data
With robust loss functions, it is very robust to outliers
Combining the strengths of bagging and boosting, it generally achieves higher accuracy than random forest and AdaBoost
Disadvantages
Dependencies between the weak learners make parallel training difficult
Higher data dimensionality increases the algorithm's computational cost
Since the weak learner is a regressor, it cannot be used for classification directly
Optimization
Achieve partial parallelism through subsampling (stochastic gradient boosting, SGBT)
XGBoost: an efficient implementation of GBDT that adds a regularization term and fits the loss function with a second-order Taylor expansion
LightGBM: a further efficiency improvement in the XGBoost lineage; it discretizes continuous floating-point features into k discrete values and builds histograms of width k, speeding up computation and saving memory
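The histogram trick can be sketched in a few lines; equal-width binning is one simple choice (LightGBM's actual binning is more sophisticated). Split search then scans k bin boundaries instead of every unique feature value:

```python
import numpy as np

def build_histogram(x, k):
    """Map each value of a continuous feature to one of k equal-width bins
    and count bin occupancy."""
    edges = np.linspace(x.min(), x.max(), k + 1)
    # digitize against the k-1 interior edges yields bin ids in 0..k-1
    bins = np.clip(np.digitize(x, edges[1:-1]), 0, k - 1)
    counts = np.bincount(bins, minlength=k)
    return bins, counts, edges

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
bins, counts, edges = build_histogram(x, k=32)
print(len(counts), counts.sum())  # 32 10000
```

With 10,000 distinct values reduced to 32 bins, the split search per feature drops from thousands of candidate thresholds to 31, at the cost of a coarser threshold grid.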