MindMap Gallery Statistics mind map
This is an article about statistical mind mapping, including the basics of traditional Chinese medicine research design, probability distribution of variables, parameter estimation, description of statistical data, etc.
Edited at 2023-12-02 19:35:04
statistics
introduction
Statistics and Traditional Chinese Medicine
Why study Chinese medicine statistics
Several basic concepts
Population
Definition: The collection of all individuals (data) under study
Classified by whether the number of units it contains is countable
finite population
infinite population
sample
Definition: A collection of elements drawn from the population
Sample size: the number of elements that make up the sample
parameter
Definition: A summary numerical measure used to describe the characteristics of a population
Statistic
Definition: A summary numerical measure used to describe the characteristics of a sample
variable
Definition: A concept that describes certain characteristics of a phenomenon
Divide variables by data type
Categorical variable
Definition: A variable whose values name the category something belongs to
Ordinal variable
Definition: A variable whose values name ordered categories
Numeric variable
Definition: A variable whose values measure a numerical characteristic of something
Application of Statistics in Traditional Chinese Medicine
How to learn Chinese medicine statistics
Basics of Traditional Chinese Medicine Research Design
Research design overview
Characteristics of Experimental Research
Basic elements of experimental design
Treatment factors
Subjects
experimental effect
objectivity
accuracy
sensitivity
Basic principles of experimental design
Control principle
placebo control
Blank control
Experimental control
Standard control
Self-control
Historical control
Randomization principle
Replication principle
Experimental design type
Completely randomized design
Paired design
randomized block design
factorial design
Commonly used sampling methods
simple random sampling
stratified sampling
Systematic sampling (equal-interval sampling, mechanical sampling)
cluster sampling
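The four sampling methods listed above can be sketched with Python's standard library. This is a minimal illustration rather than a recommended survey procedure: the population of 100 subject IDs, the two strata, and the ten "wards" are all hypothetical, and the stratified version assumes proportional allocation.

```python
import random

def simple_random_sample(units, n, seed=0):
    """Draw n units without replacement; every unit is equally likely."""
    return random.Random(seed).sample(units, n)

def systematic_sample(units, n, seed=0):
    """Pick a random start, then take every k-th unit (k = N // n)."""
    k = len(units) // n
    start = random.Random(seed).randrange(k)
    return [units[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, seed=0):
    """Draw the same fraction from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

def cluster_sample(clusters, n_clusters, seed=0):
    """Draw whole clusters at random and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical population: subject IDs 1..100.
population = list(range(1, 101))
strata = {"male": population[:40], "female": population[40:]}
clusters = {f"ward_{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(population, 10)
sys_sample = systematic_sample(population, 10)
strat = stratified_sample(strata, 0.1)
clus = cluster_sample(clusters, 2)
```

Systematic sampling needs only one random number (the start), which is why it is also called mechanical sampling; cluster sampling randomizes groups rather than individuals.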
Description of statistical data
individual variation
frequency distribution
frequency distribution table
1. Find the range 2. Determine the class width and class intervals 3. Tabulate the frequency of each class 4. Calculate relative frequency and cumulative frequency
Frequency distribution plot
Uses
1. Reveal the distribution type and characteristics of the data 2. Help identify suspiciously large or small individual values 3. Judge normality 4. Support further calculation of indicators and statistical processing
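The four steps for building a frequency distribution table can be sketched as follows. The function name, the choice of equal-width classes, and the 20 measurements are assumptions made for illustration; the sketch also assumes the data are not all identical, so the class width is non-zero.

```python
def frequency_table(values, n_classes):
    """Build an equal-width frequency table as a list of row dicts."""
    lo, hi = min(values), max(values)
    data_range = hi - lo                 # step 1: range
    width = data_range / n_classes       # step 2: class width
    counts = [0] * n_classes
    for v in values:                     # step 3: tally each class
        # The maximum value is placed in the last class.
        idx = min(int((v - lo) / width), n_classes - 1)
        counts[idx] += 1
    total = len(values)
    rows, cumulative = [], 0
    for i, count in enumerate(counts):   # step 4: relative and cumulative frequency
        cumulative += count
        rows.append({
            "lower": lo + i * width,
            "upper": lo + (i + 1) * width,
            "count": count,
            "relative": count / total,
            "cumulative": cumulative / total,
        })
    return rows

# Hypothetical measurements.
measurements = [60, 62, 65, 66, 68, 70, 71, 72, 74, 75,
                76, 78, 80, 82, 84, 85, 88, 90, 95, 100]
table = frequency_table(measurements, 4)
for row in table:
    print(row)
```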
Statistical description of quantitative data
Statistical indicator that describes central tendency
arithmetic mean
sample mean
Population mean μ
Conditions: Unimodal symmetric distribution data, especially normal distribution data
Geometric mean (G)
Conditions: data that follow a geometric progression, especially lognormally distributed data
Note
Observed/measured values cannot include 0
Observed/measured values must have the same sign, either all positive or all negative
Median(M)
Sort a set of observations from smallest to largest; the observation in the central position is the median
The median is suitable for describing skewed distributions and distributions with undetermined values at one or both ends, so it is widely applicable.
percentile
Sort a set of observations from smallest to largest; the value at the x% position is the xth percentile.
Applicable conditions
Skewed distribution data
Irregularly distributed data or unclearly distributed data
Open-ended data
mode
The value that occurs most often
It is only meaningful when the data set is large
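The central-tendency indicators above are all available in Python's `statistics` module. The five values below are hypothetical, and the "inclusive" interpolation method is only one of several percentile conventions, so other software may give slightly different percentiles.

```python
import statistics

# Hypothetical positive measurements (the geometric mean needs same-sign, non-zero data).
values = [2, 4, 4, 8, 16]

mean = statistics.mean(values)
geometric = statistics.geometric_mean(values)   # expects positive values
median = statistics.median(values)
mode = statistics.mode(values)

# 99 cut points; percentiles[x - 1] is the x-th percentile.
percentiles = statistics.quantiles(values, n=100, method="inclusive")
p50 = percentiles[49]

print(mean, geometric, median, mode, p50)
```

For right-skewed data such as these, the geometric mean and the median sit below the arithmetic mean, which is why they are preferred when the distribution is not symmetric.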
Statistical indicators that describe the degree of variation
Range
Also called the full range and represented by R, it is the difference between the maximum value and the minimum value in a set of observed/measured values.
Reflects the spread of individual differences: a large range means a large degree of variation; a small range means a small degree of variation
Advantages
Simple calculation and clear meaning
Disadvantages
It only reflects the difference between the two extreme values and is unstable.
Interquartile range (Q)
Reflects the range of the middle half of the observations/measurements
Advantages: simple calculation, more stable than the range
Disadvantages: Still does not take into account the variation of all observed/measured values, and is still not stable enough
It is mainly used to describe the variation of clearly skewed data, and is often used in combination with statistical charts.
variance
The greater the variance, the greater the variation
Advantages: Considers all variations in observed/measured values, relatively stable
Disadvantages: Dimensions (i.e. units) are changed and sometimes cannot be explained
standard deviation
For data sets with the same unit and similar means, a larger standard deviation indicates a greater degree of variation and a less representative mean.
Advantages: Taking into account the variation of all observed/measured values, the unit is the same as the original indicator, relatively stable
coefficient of variation
Also called dispersion coefficient, represented by CV
Main advantage
Unitless, which makes comparison easy
Applicable conditions
Compare the degree of variation of multiple sets of data with different units
Compare the degree of variation of multiple sets of data with widely different means
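The variation indicators above can be computed side by side. This is a sketch with hypothetical height and weight data; it uses the sample variance and standard deviation (n - 1 denominator) and the "inclusive" quartile convention, both of which are assumptions the outline does not specify.

```python
import statistics

def describe_variation(values):
    """Return range, interquartile range, variance, SD and CV for a sample."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # sample SD (n - 1 denominator)
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "variance": statistics.variance(values),
        "sd": sd,
        "cv": sd / mean,                     # unitless, so comparable across units
    }

# Hypothetical data in different units.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 55, 60, 65, 70]

height_stats = describe_variation(heights_cm)
weight_stats = describe_variation(weights_kg)
print(height_stats["cv"], weight_stats["cv"])
```

The heights have the larger standard deviation, but the weights have the larger coefficient of variation, which shows why CV is the indicator to use when units or means differ.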
Statistical description of qualitative data and ordinal (ranked) data
absolute number
relative number
Commonly used relative numbers
rate (frequency indicator)
Rate
Indicates the ratio of the number of occurrences of a phenomenon to the total number of possible occurrences within a certain space or time range
A composition ratio is not a rate. For example, the question is not what proportion of lung cancer patients smoke, but what proportion of smokers develop lung cancer.
Note: define the numerator and denominator clearly; specify the observation unit; state the proportional base (for example, 100)
Relative indicators commonly used in medicine
Prevalence
Indicates the frequency of a certain disease among a certain group of people at a certain point in time. It is usually used to describe the occurrence or prevalence of chronic diseases with a long course.
Formula: Prevalence of a certain disease = (number of cases of a certain disease in a certain place during a certain period / average population in the same place during the same period) * proportional base
case fatality rate
Indicates the frequency of death due to a certain disease among patients during a certain period of time
Formula: Case fatality rate of a certain disease = (number of deaths due to a certain disease during a certain period/number of patients with the disease in the same period) * 100%
Incidence
Indicates the frequency of new cases of a certain disease among a certain population within a certain period of time.
Formula: Incidence rate of a certain disease = (number of new cases of a certain disease in a certain period / average population in the same period) * proportional base
mortality rate
Reflects the number of deaths per 1,000 people in a certain place in a given year
Formula: Death rate = (number of deaths in a certain place in a certain year/average population of the same place in the same year) * 1000
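The four formulas above share one pattern: a count of events divided by a denominator, multiplied by a proportional base. A minimal sketch, with every count invented for illustration:

```python
def rate(events, denominator, base=100):
    """Return events / denominator scaled by the proportional base."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return events / denominator * base

# Hypothetical counts for one place in one year.
average_population = 10_000
existing_cases = 150
new_cases = 30
deaths_from_disease = 12
deaths_all_causes = 80

prevalence = rate(existing_cases, average_population, base=100)        # per 100
incidence = rate(new_cases, average_population, base=100_000)          # per 100,000
case_fatality = rate(deaths_from_disease, existing_cases, base=100)    # percent of patients
mortality = rate(deaths_all_causes, average_population, base=1000)     # per 1,000

print(prevalence, incidence, case_fatality, mortality)
```

Only the case fatality rate uses patients as its denominator; the other three use the average population, which is the distinction the formulas above depend on.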
Ratio
Relative ratio
Describes the comparative level of two related indicators
Relative risk (RR)
Reflects how many times the risk of illness or death in the exposed group is that of the unexposed group, indicating the strength of the association between disease and exposure
Odds ratio (OR)
The ratio of the odds of exposure (exposed to unexposed) in the case group to the odds of exposure in the control group.
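Both measures come from a 2x2 table of exposure against disease. The sketch below uses hypothetical counts and the conventional cell labels a, b, c, d, which are assumptions and do not appear in the outline.

```python
def relative_risk(a, b, c, d):
    """Cohort layout: a, b = exposed with/without disease; c, d = unexposed with/without."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

def odds_ratio(a, b, c, d):
    """Case-control layout: a, c = exposed/unexposed cases; b, d = exposed/unexposed controls."""
    odds_cases = a / c        # odds of exposure among cases
    odds_controls = b / d     # odds of exposure among controls
    return odds_cases / odds_controls

# Hypothetical counts.
rr = relative_risk(a=30, b=70, c=10, d=90)
odds = odds_ratio(a=30, b=70, c=10, d=90)
print(rr, odds)
```

With these counts the risk in the exposed group is three times that in the unexposed group, while the odds ratio is somewhat larger; the two measures approach each other only when the disease is rare.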
Composition ratio (composition index)
Indicates the proportion or distribution of the internal components of an object or phenomenon
Features
The sum equals 100% or 1
The components cannot all increase or all decrease at the same time
Generally, the result is kept to two decimal places.
Application Notes
The denominator cannot be too small
Composition ratios and rates must not be confused
Calculation of a pooled rate (total rate)
Comparability
Sampling error: compare rates with hypothesis testing
Standardized rate
When comparing the prevalence, incidence, mortality and other data of two different groups, in order to eliminate the impact of their internal composition on the rates, standardized rates can be used
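One common way to obtain a standardized rate is direct standardization, which applies each group's stratum-specific rates to a shared standard population. The outline does not say which method it has in mind, so the choice of the direct method, the age strata, and all counts below are assumptions.

```python
def crude_rate(stratum_rates, group_population):
    """Overall rate of a group, weighted by its own composition."""
    total = sum(group_population.values())
    events = sum(stratum_rates[k] * group_population[k] for k in group_population)
    return events / total

def direct_standardized_rate(stratum_rates, standard_population):
    """Overall rate the group would have if it had the standard composition."""
    total = sum(standard_population.values())
    expected = sum(stratum_rates[k] * standard_population[k] for k in standard_population)
    return expected / total

# Hypothetical: both groups have identical age-specific rates
# but very different age compositions.
age_rates = {"young": 0.01, "old": 0.05}
group_a = {"young": 1000, "old": 200}
group_b = {"young": 200, "old": 1000}
standard = {"young": 1200, "old": 1200}   # combined population used as the standard

crude_a = crude_rate(age_rates, group_a)
crude_b = crude_rate(age_rates, group_b)
std_a = direct_standardized_rate(age_rates, standard)
std_b = direct_standardized_rate(age_rates, standard)
print(crude_a, crude_b, std_a, std_b)
```

The crude rates differ only because group B is older; after standardization the two rates coincide, which is the effect of internal composition that standardization is meant to remove.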
Statistical tables and charts
Statistical table
structure
Table number
title
Centered above the table
Lines
Generally a three-line table
Numbers
Notes
When an explanation is needed, mark the item with * in the table and write the explanation below the table.
Statistical chart
structure
title
Numbered and placed below the chart
Plot area
Axis labels
Horizontal axis label
Vertical axis label
legend
scale
unit
Commonly used
Histogram
Scatter plot
line graph
Pie chart
Percent bar chart
Hypothesis testing
Hypothesis testing
Purpose
Infer whether two populations are the same based on the difference between the two samples
Basic idea
Reasoning by contradiction
Basic steps
1. State the hypotheses and set the significance level 2. Choose a test method and calculate the test statistic 3. Make a statistical inference based on the P value
t-test
One-sample t-test
Purpose: to test whether the population mean μ represented by the sample mean differs from a known population mean μ0.
Paired-samples t-test
Classification: 1. Homologous pairing: two parts of the same subject or the same specimen are randomly assigned to receive two different treatments. 2. Heterogeneous pairing: to eliminate the influence of confounding factors, two similar subjects are matched as a pair and receive the two treatments.
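The test statistics for the two t-tests above can be computed by hand. The sketch below returns only t and the degrees of freedom, which would then be compared with a t table; the measurements and the reference mean are hypothetical. Where SciPy is available, `scipy.stats.ttest_1samp` and `scipy.stats.ttest_rel` also return the P value.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t = (sample mean - mu0) / standard error, with n - 1 degrees of freedom."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - mu0) / standard_error
    return t, n - 1

def paired_t(first, second):
    """A paired t-test is a one-sample t-test on the within-pair differences."""
    differences = [b - a for a, b in zip(first, second)]
    return one_sample_t(differences, 0.0)

# Hypothetical measurements compared with a known population mean of 5.0.
t_one, df_one = one_sample_t([5.1, 4.9, 5.3, 5.2, 5.0], mu0=5.0)

# Hypothetical before/after values on the same four subjects.
before = [140, 150, 160, 155]
after = [135, 148, 150, 149]
t_pair, df_pair = paired_t(before, after)

print(t_one, df_one, t_pair, df_pair)
```

Reducing the paired design to differences is what removes between-subject variation from the comparison.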
Two types of errors
Type I error
Type II error
Precautions
Population mean estimation and hypothesis testing
Sampling error and standard error
Commonly used methods and approaches
Methods
Parametric test
Non-parametric test
Approaches
Critical value method
P value method
confidence interval
Normality test and variable transformation
Normality test
simple judgment method
Graphical methods
P-P plot
Q-Q plot
Hypothesis testing method: P value
Test for homogeneity of variances
F test
Levene's test
Parameter Estimation
Sampling error and sampling distribution
concept
The difference between a sample statistic and a population parameter caused by sampling
Sampling distribution and standard error of sample mean
Characteristics of the sampling distribution of the sample mean
Sample means are distributed around the population mean
As n increases, the variation among sample means decreases
Sample means vary less than the individual observations
A sample mean is not necessarily equal to the population mean
Standard error:
Indicates the size of the sampling error of the sample mean and describes how reliable the sample mean is.
Standard error = standard deviation / √(sample size)
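The standard error formula shows directly why the sample mean becomes more reliable as n increases. A small sketch, using a hypothetical standard deviation of 10:

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: standard deviation / sqrt(sample size)."""
    return sd / math.sqrt(n)

# Hypothetical standard deviation with three sample sizes.
errors = {n: standard_error(10.0, n) for n in (25, 100, 400)}
for n, se in errors.items():
    print(n, se)
```

Because of the square root, halving the standard error requires four times the sample size.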
t distribution
is a unimodal distribution curve
The degrees of freedom ν is the only parameter
Sampling distribution and standard error of sample rate
Estimate of the population mean
point estimate
Use sample statistics to directly estimate population parameters
interval estimate
confidence interval
An interval that estimates the range of a population parameter at a pre-specified probability, or confidence level (1-α); this range is called the confidence interval of the parameter. The confidence level is usually set at 95% or 99%.
Accuracy
The closer the confidence level is to 1, the higher the accuracy
Precision
The narrower the confidence interval, the higher the precision.
Estimate of the population rate
Using the sample rate as a point estimate of the population rate
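Interval estimates for a mean and for a rate can be sketched as below. The sketch assumes a large sample and uses the normal critical value from the standard library; for small samples a t critical value with n - 1 degrees of freedom would be needed instead, which the standard library does not provide. The function names and the counts are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def z_critical(confidence):
    """Two-sided normal critical value, e.g. about 1.96 for 95%."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def mean_ci(sample, confidence=0.95):
    """Large-sample interval: mean +/- z * standard error."""
    m = mean(sample)
    se = stdev(sample) / math.sqrt(len(sample))
    z = z_critical(confidence)
    return m - z * se, m + z * se

def rate_ci(events, n, confidence=0.95):
    """Normal-approximation interval for a rate: p +/- z * sqrt(p(1-p)/n)."""
    p = events / n                       # point estimate of the population rate
    se = math.sqrt(p * (1 - p) / n)
    z = z_critical(confidence)
    return p - z * se, p + z * se

# Hypothetical: 40 events among 100 subjects.
low95, high95 = rate_ci(40, 100, confidence=0.95)
low99, high99 = rate_ci(40, 100, confidence=0.99)
print((low95, high95), (low99, high99))
```

The 99% interval is wider than the 95% interval: raising the confidence level costs precision unless the sample size also increases.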
Probability distribution of variables
Population characteristics of the variable
normal distribution
concept
feature
Unimodal distribution, the peak position is at the mean
Centrality, symmetry, and uniform variation on both sides of the mean
Depends on μ and σ
Area distribution law
The total area between the normal curve and the horizontal axis is always equal to 1
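The area distribution law can be checked numerically with `statistics.NormalDist`. The helper name `area_within` is hypothetical; the function returns the share of the total area lying within k standard deviations of the mean.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 1.96, 2.58):
    print(k, round(area_within(k), 4))
```

The result does not depend on the particular μ and σ, only on k, so the same multipliers (1.96 for about 95%, 2.58 for about 99%) apply to every normal distribution.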
Binomial distribution and Poisson distribution
Determination of medical reference value ranges
Definition: the range within which the values of most normal people fall
Principles
Use a sufficiently large sample of normal people
Choose a one-sided or two-sided range based on the characteristics of the indicator
Choose an appropriate percentile bound
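Two common ways to compute a reference range are the normal-distribution method and the percentile method. The outline names neither explicitly, so the sketch below is an assumption: both function names are hypothetical, 95% coverage is a conventional choice, and the data are placeholder values rather than real measurements.

```python
from statistics import NormalDist, mean, stdev, quantiles

def normal_reference_range(values, coverage=0.95, sides="two"):
    """Normal method: mean +/- z * SD; None marks an open side."""
    m, s = mean(values), stdev(values)
    if sides == "two":
        z = NormalDist().inv_cdf(1 - (1 - coverage) / 2)
        return m - z * s, m + z * s
    z = NormalDist().inv_cdf(coverage)
    if sides == "upper":                 # only high values are abnormal
        return None, m + z * s
    return m - z * s, None               # only low values are abnormal

def percentile_reference_range(values, lower_pct=2.5, upper_pct=97.5):
    """Percentile method for skewed data, using 0.1% cut points."""
    cuts = quantiles(values, n=1000, method="inclusive")
    return cuts[round(lower_pct * 10) - 1], cuts[round(upper_pct * 10) - 1]

# Placeholder values standing in for measurements from 200 normal people.
sample = list(range(1, 201))

two_sided = normal_reference_range(sample)
upper_only = normal_reference_range(sample, sides="upper")
by_percentile = percentile_reference_range(sample)
print(two_sided, upper_only, by_percentile)
```

The normal method suits approximately normal indicators, while the percentile method makes no distributional assumption but needs a larger sample to estimate the tail percentiles reliably.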