MindMap Gallery linear regression
This mind map covers the classification and application of linear regression, a statistical method for quantifying how a dependent variable depends on one or more independent variables. It summarizes and organizes the core knowledge points.
Edited at 2024-10-14 10:54:12
linear regression
simple linear regression
Conditions to be met
Linear
Scatter plot
independence
Judge from subject-matter knowledge
normality
Only the residuals of Y need to be normally distributed; check with a residual plot
equal variance
Within the measured range of X, Y has the same variance no matter what value X takes; check with a residual plot.
Normality and equal variance mainly matter for estimating confidence intervals and prediction intervals. If you are only exploring the relationship between the independent variable and the dependent variable, without estimating these intervals, the two conditions can be relaxed somewhat.
Hypothesis testing of regression coefficients - analysis of variance
Total variation = regression variation + residual variation (SS total = SS regression + SS residual)
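The decomposition of total variation into a regression part and a residual part can be checked numerically. The following is a minimal sketch using NumPy; the data values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical example data (not from the source).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

# Least-squares fit of y = a + b*x.
X = np.column_stack([np.ones_like(x), x])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ coef

# Sums of squares used in the analysis of variance.
ss_total = np.sum((y - y.mean()) ** 2)
ss_regression = np.sum((y_hat - y.mean()) ** 2)
ss_residual = np.sum((y - y_hat) ** 2)

# F statistic with 1 and n - 2 degrees of freedom.
n = len(y)
f_stat = (ss_regression / 1) / (ss_residual / (n - 2))
print(ss_total, ss_regression + ss_residual, f_stat)
```

The two printed sums of squares should agree up to floating-point rounding; the F statistic is the ratio of the regression mean square to the residual mean square.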
multiple linear regression
Hypothesis testing of the overall regression model - analysis of variance
Hypothesis test of partial regression coefficient—t test
Independent variable screening
Only variables that contribute significantly to the dependent variable are included in the equation
Backward elimination (backward)
Tends to retain variables with strong joint effects
Forward selection (forward)
Tends to introduce independent variables that have a strong individual effect
Stepwise selection (stepwise)
A compromise between the two
When the independent variables are not linearly correlated with one another, the three methods give the same result.
The meaning of several coefficients
coefficient of determination
Multiple correlation coefficient
adjusted coefficient of determination
Presentation of results
P208 Table 13-9
Standardized partial regression coefficient
There is no unit of measurement, which eliminates the impact of different units of measurement and degree of variation on the partial regression coefficient.
The absolute value can be used to compare the degree of influence of its corresponding independent variable on the dependent variable.
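As a rough illustration of why standardized partial regression coefficients are unit-free, the sketch below rescales each raw coefficient by the ratio of the predictor's standard deviation to that of Y. The data, variable names, and units are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(170, 10, n)  # hypothetical predictor, e.g. measured in cm
x2 = rng.normal(60, 8, n)    # hypothetical predictor, e.g. measured in kg
y = 0.5 * x1 + 2.0 * x2 + rng.normal(0, 5, n)

# Raw partial regression coefficients (intercept first).
X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Standardized coefficient: b_j * s_xj / s_y.
sx = np.array([x1.std(ddof=1), x2.std(ddof=1)])
sy = y.std(ddof=1)
b_std = b[1:] * sx / sy
print(b_std)
```

The same values are obtained by regressing the z-scored Y on the z-scored predictors, which is why their absolute values can be compared across predictors.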
Coefficient of determination R²
The extent to which the regression equation containing all independent variables explains the variability of the dependent variable Y.
The larger the value, the better the fit.
R² = SS regression / SS total
Multiple correlation coefficient R
Indicates the degree of linear correlation between Y and the p independent variables
adjusted coefficient of determination
The index after deducting the influence of the number of independent variables on the coefficient of determination; that is, it does not automatically increase as more independent variables are added.
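The three coefficients can be computed side by side. This is a minimal sketch on simulated (hypothetical) data; the adjusted value uses the usual formula 1 - (1 - R²)(n - 1)/(n - p - 1), which is an assumption about the definition the outline intends.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 3                      # sample size and number of predictors
X = rng.normal(size=(n, p))       # hypothetical predictors
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

# Fit the multiple regression with an intercept.
A = np.column_stack([np.ones(n), X])
y_hat = A @ np.linalg.lstsq(A, y, rcond=None)[0]

ss_total = np.sum((y - y.mean()) ** 2)
ss_regression = np.sum((y_hat - y.mean()) ** 2)

r2 = ss_regression / ss_total                    # coefficient of determination
r = np.sqrt(r2)                                  # multiple correlation coefficient
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)    # adjusted coefficient
print(r2, r, adj_r2)
```

R equals the simple correlation between Y and the fitted values, and the adjusted coefficient is never larger than R².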
application
Analysis of influencing factors
Estimates and Forecasts
Predict y from x
Things to note
Sample size estimates
Rules of thumb
Generally, the sample size should be at least 10-20 times the number of independent variables.
Quantification of qualitative variables
Binary variable
Assign values directly (for example, 0 and 1)
Multiple unordered categories (nominal variables)
Dummy variables should be created
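Dummy coding of a nominal variable can be sketched as follows. The variable `blood_type` and its values are hypothetical; the first category in sorted order is taken as the reference, which is only one possible choice.

```python
import numpy as np

# Hypothetical nominal variable with three unordered categories.
blood_type = np.array(["A", "B", "O", "A", "O", "B"])

categories = np.unique(blood_type)   # sorted unique values: A, B, O
reference = categories[0]            # "A" serves as the reference category

# One 0/1 column per non-reference category (k - 1 dummies for k categories).
dummies = np.column_stack(
    [(blood_type == c).astype(int) for c in categories[1:]]
)
print(dummies)
```

A record in the reference category has 0 in every dummy column, so each dummy coefficient is interpreted relative to that category.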
Residuals (page 2)
Identification and processing of strong influence points
Judgment of strong influence points
When the absolute value of the standardized residual exceeds 3, the record can be identified as a strong influence point.
Treatment method
Check whether the record results from a recording or data-entry error. If so, correct it where possible and remove it where not.
Consider whether the record belongs to a different population from the other records in the data set (for example, another subgroup). If so, delete it.
If neither of the above applies, check whether the fitted model is appropriate and consider fitting another form of model.
Robust regression, nonparametric regression
If circumstances permit, the sample size can be increased. The increase in the amount of information can appropriately weaken the role of strong influence points.
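The rule of flagging records whose standardized residual exceeds 3 in absolute value can be sketched as below. The data are hypothetical with one planted aberrant record, and the standardized residual is taken here as the raw residual divided by the square root of the residual mean square, which is one common definition and an assumption on our part.

```python
import numpy as np

# Hypothetical data: a linear trend with small deterministic wiggle.
x = np.arange(30, dtype=float)
y = 2.0 + 0.5 * x + 0.1 * np.sin(x)
y[15] += 5.0                       # planted aberrant record at index 15

# Fit y = a + b*x and compute raw residuals.
A = np.column_stack([np.ones_like(x), x])
resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]

# Standardize by the residual standard deviation (n - 2 degrees of freedom).
s = np.sqrt(np.sum(resid ** 2) / (len(y) - 2))
z = resid / s

flagged = np.flatnonzero(np.abs(z) > 3)
print(flagged)
```

Flagged records are candidates for the checks listed above, not automatic deletions.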
multicollinearity
Refers to the linear correlation between multiple independent variables
Symptoms: the linear regression model, and especially its partial regression coefficients, cannot be interpreted in subject-matter terms
The test of the overall model gives P<α, but the partial regression coefficient of every variable has P>α
An independent variable that is considered important on subject-matter grounds turns out not to be statistically significant.
The values and even signs of the partial regression coefficients of the independent variables are contrary to the actual situation and are difficult to interpret.
The partial regression coefficient is unstable. When an independent variable or a record is added or deleted, the partial regression coefficient of the independent variable changes greatly.
Identification
Collinearity Diagnosis
Tolerance
The smaller the tolerance, the more severe the multicollinearity. When the tolerance is <0.1, it indicates serious collinearity.
Variance inflation factor VIF
The reciprocal of tolerance. The larger the variance inflation factor, the more likely it is that collinearity exists among the independent variables; as a general rule the VIF should stay below 5, or below 10 under a more relaxed criterion.
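Tolerance and VIF can be computed by regressing each independent variable on the others. The sketch below uses simulated (hypothetical) predictors in which `x3` is nearly a linear combination of `x1` and `x2`; the helper `tolerance` is illustrative and not taken from any library.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.1 * rng.normal(size=n)   # nearly collinear with x1 and x2
X = np.column_stack([x1, x2, x3])

def tolerance(X, j):
    """Return 1 - R^2 from regressing column j on the remaining columns."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(target)), others])
    fitted = A @ np.linalg.lstsq(A, target, rcond=None)[0]
    ss_res = np.sum((target - fitted) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return ss_res / ss_tot

tol = np.array([tolerance(X, j) for j in range(X.shape[1])])
vif = 1.0 / tol                           # VIF is the reciprocal of tolerance
print(tol, vif)
```

With these data every tolerance falls below 0.1 and every VIF exceeds 10, which is the pattern the criteria above describe as serious collinearity.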
Handling
delete variable
Among strongly correlated variables, delete those that have the largest measurement error and the most missing data and that matter least from a subject-matter perspective.
Use other regression methods
Interactions between explanatory variables
An interaction term can be introduced into the model to analyze the interaction
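Introducing an interaction term amounts to adding the product of two explanatory variables as an extra column of the design matrix. This is a minimal sketch on hypothetical noise-free data, so the fitted coefficients reproduce the ones used to generate Y.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Hypothetical model: the effect of x1 on y depends on the level of x2.
y = 1.0 + 2.0 * x1 + 3.0 * x2 + 4.0 * x1 * x2

# Design matrix: intercept, main effects, and the product (interaction) term.
A = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b = np.linalg.lstsq(A, y, rcond=None)[0]
print(b)
```

The last coefficient estimates the interaction: the slope of `x1` changes by that amount for each unit increase in `x2`.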
Setting the significance levels for entry and elimination
In general, entry level < elimination level
When the number of variables is small or the study is exploratory
Entry=0.10, Elimination=0.15
When the number of variables is large or the study is confirmatory
Entry=0.05, Elimination=0.10
Example 13-3 Analysis process
1. Draw a scatter plot (matrix scatter plot)
2. Independence
3. Normality and homogeneity of variances
4. Collinearity diagnosis
5. Linear regression analysis—establish regression equation—analyze influencing factors
6. Regression diagnosis
residual
Residual type
Unstandardized residuals (raw residuals)
Standardized residuals (Pearson residuals)
studentized residuals
Deleted residuals
Studentized deleted residuals
Residual analysis
Analyze > Regression > Linear > Save > Residuals (generally choose Standardized), then draw a scatter plot
Purpose: Test whether there is a linear relationship between the dependent variable and the independent variable
graphics
Regression model building steps
Draw scatter plots to observe trends among variables
Examine data distribution and make necessary judgments
Perform linear regression analysis
Residual analysis
Diagnosis of strong influence points and judgment of multicollinearity