MindMap Gallery Chapter 23 - Statistics and Data Science
This is a mind map about statistics and data science. Statistics is a discipline about data. Broadly, it covers collecting, organizing, and analyzing data, and drawing conclusions from it.
Edited at 2023-11-01 18:48:52
Statistics and Data Science
statistics
definition
Statistics is a discipline about data. In short, it covers collecting, organizing, and analyzing data, and drawing conclusions from it.
two branches
Descriptive Statistics
Definition: Statistical methods for collecting, organizing, and describing research data.
(1) How to obtain the required data (collection)
(2) How to organize and display data using charts or mathematical methods (organization)
(3) How to describe the general characteristics of the data (description)
inferential statistics
Meaning: A statistical method that studies how to use sample data to infer population characteristics
1. Parameter estimation
Use sample information to estimate characteristics of the population
2. Hypothesis testing
Use sample information to judge whether a hypothesis about the population holds
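The two tasks of inferential statistics can be illustrated with a minimal sketch. The sample values and the hypothesized mean of 500 are made up for illustration, and the one-sample t statistic is only one common choice of test, not one named in this mind map.

```python
import math
import statistics

# Hypothetical sample: weights (in grams) of 10 packages drawn from a production line
sample = [498, 502, 501, 497, 503, 499, 500, 504, 496, 505]
n = len(sample)

# Parameter estimation: the sample mean is a point estimate of the population mean
mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)

# Hypothesis testing: t statistic for the hypothesis "population mean = 500"
mu0 = 500  # assumed hypothesized value
t_stat = (mean - mu0) / (sd / math.sqrt(n))

print("estimate of the mean:", mean)
print("t statistic:", round(t_stat, 3))
```

To finish the test, the t statistic would be compared with a critical value from the t distribution with n - 1 degrees of freedom.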
Variables and data
variable
Definition: A variable is an attribute or characteristic of the research object, which can have two or more possible values.
Quantitative variables
Also called a "numerical variable": its values are quantities. Examples: company sales, number of registered employees
Qualitative variables
Categorical variables
Its values are categories. Examples: the industry a company belongs to, the gender of employees
ordinal variable
Its values are categories that have a natural order. Examples: employee education level, satisfaction
data
Definition: Data is the result of measuring and observing variables. Data can be in the form of numerical values, text or images, etc.
Quantitative data
The observed result of a quantitative variable, expressed as a specific number. For example, a company's sales of 10 million yuan
Categorical data
The observed result of a categorical variable, expressed as a category. It is generally written in words but can also be coded numerically. For example, use 1 to represent "male" and 2 to represent "female"
Ordinal data
The observed result of an ordinal variable, expressed as an ordered category. It is generally written in words but can also be coded numerically. For example, 1 represents "master's degree and above", 2 represents "bachelor's degree", and 3 represents "college degree and below"
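The numeric coding described above can be sketched as follows. The employee records and the `salary` field are hypothetical. The codes follow the examples in the text. The point of the sketch is which operations make sense for each type of data: averaging for quantitative data, counting for categorical codes, and sorting for ordinal codes.

```python
from collections import Counter
from statistics import mean

# Codes taken from the examples in the text
GENDER_CODES = {"male": 1, "female": 2}
EDUCATION_CODES = {
    "master's degree and above": 1,
    "bachelor's degree": 2,
    "college degree and below": 3,
}

# Hypothetical employee records
employees = [
    {"salary": 8000, "gender": "female", "education": "bachelor's degree"},
    {"salary": 12000, "gender": "male", "education": "master's degree and above"},
    {"salary": 7000, "gender": "female", "education": "college degree and below"},
]

# Quantitative data: arithmetic such as the mean is meaningful
avg_salary = mean([e["salary"] for e in employees])

# Categorical data: codes are labels only, so count them rather than average them
gender_counts = Counter(GENDER_CODES[e["gender"]] for e in employees)

# Ordinal data: codes carry an order, so sorting by them is meaningful
by_education = sorted(employees, key=lambda e: EDUCATION_CODES[e["education"]])

print(avg_salary, dict(gender_counts), [e["education"] for e in by_education])
```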
Source of data
By collection method
Observational data
Data collected through direct survey or measurement. Almost all statistical data related to socioeconomic phenomena are observational data, such as GDP, CPI, housing prices, etc.
Experimental data
Data collected in an experiment by controlling the experimental subjects and the environment they are placed in. For example, data on the service life of a new product or on the efficacy of a new drug. Most data in the natural sciences are experimental data
By user perspective
primary data
Data obtained from direct surveys and scientific experiments. This is a direct source of data for users. Its main sources are surveys or observation, and experiments. [Tip] In the socio-economic field, statistical surveys are the main method of obtaining data and also an important way to obtain first-hand data.
Secondary data
Data derived from someone else's survey or experiment. This is an indirect source of data for users.
statistical survey
important features
First, a survey is an activity carried out with a plan, methods, and procedures. Second, the results of a survey take the form of the data collected.
Classification
By scope of the survey objects
Comprehensive survey
1. Comprehensive statistical reports
2. Census
(1) Population census: a registration survey of the entire population of the country without exception
(2) Economic census: covers all legal entities, industrial activity units, and self-employed households engaged in secondary and tertiary industry activities within the territory of the People's Republic of China.
Non-comprehensive survey
A survey of only some of the units in the survey population. Includes non-comprehensive statistical reports, sample surveys, key surveys, and typical surveys
By whether survey registration is continuous in time
Continuous survey
Observes quantitative changes in a phenomenon over a period of time in order to describe how it develops and to measure its total over that period. Examples: factory output, raw material input, energy consumption, births and deaths in the population.
The result is a "period figure". Adding values across periods is meaningful, and the data can be thought of as a video.
Discontinuous survey
A survey conducted at fairly long intervals (usually more than a year), generally to study the state of a phenomenon at a particular point in time. Examples: stock of production equipment, cultivated land area.
The result is a "point-in-time figure". Adding values across time points is not meaningful, and the data can be thought of as a photo.
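The difference between the two kinds of results, totals over a period versus states at a point in time, can be shown with a small sketch. The monthly figures for a single factory are invented for illustration.

```python
# Hypothetical figures for one factory over three months
monthly_output = [120, 135, 128]        # period data: units produced during each month
equipment_at_month_end = [40, 42, 42]   # point-in-time data: machines owned at each month end

# Period data can be accumulated: this is total production for the quarter
quarter_output = sum(monthly_output)

# Point-in-time data cannot: summing 40 + 42 + 42 would count the same machines repeatedly,
# so the usual summary is the state at a chosen time point, such as the latest one
latest_equipment = equipment_at_month_end[-1]

print(quarter_output, latest_equipment)
```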
Methods
Statistical reports
Meaning: A survey method in which requirements are issued uniformly from the top down and basic statistical data are reported level by level from the bottom up. Statistical reports must be based on original records and filled out according to unified table formats, unified indicators, and a unified submission time and procedure.
Types of statistical reports:
1. By scope of the survey objects: comprehensive statistical reports and non-comprehensive statistical reports. Most current statistical reports are comprehensive reports.
2. By length of the reporting period: daily, monthly, quarterly, and annual reports, etc.
3. By content and scope of implementation: national, departmental, and local statistical reports.
census
Meaning: A one-time comprehensive survey specially organized for a specific purpose, such as the population census, economic census, or agricultural census. It is mainly used to understand the basic overall picture of social and economic phenomena at a certain point in time and to provide a basis for the country to formulate relevant policies.
4 features
(1) Censuses are usually one-time or periodic
① The economic census is conducted twice every 10 years, in years ending in 3 or 8
② The population census is conducted once every 10 years, in years ending in 0
③ The agricultural census is conducted once every 10 years, in years ending in 6
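The schedule above amounts to a rule on the last digit of the year. A minimal sketch follows. The function name `censuses_in_year` is hypothetical, and the rule encodes only what this mind map states.

```python
def censuses_in_year(year):
    """Return the censuses scheduled for a year, using the last-digit rule stated above."""
    last_digit = year % 10
    scheduled = []
    if last_digit == 0:
        scheduled.append("population census")
    if last_digit in (3, 8):
        scheduled.append("economic census")
    if last_digit == 6:
        scheduled.append("agricultural census")
    return scheduled


for year in (2016, 2018, 2020, 2021, 2023):
    print(year, censuses_in_year(year))
```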
(2) A census generally requires a unified standard survey time (i.e., a reference time point) to avoid duplication or omission in the survey data and to ensure the accuracy of the results. [Tip] The standard time for the fifth, sixth, and seventh population censuses is 0:00 on November 1 of the census year. The standard time for the agricultural census and the economic census is 0:00 on January 1 of the census year. The standard time is generally set at a moment when the survey objects are relatively concentrated and change relatively little.
(3) Census data are generally more accurate and have a higher degree of standardization
(4) The scope of use is relatively narrow and can only investigate basic and specific phenomena.
sample survey
Meaning: A non-comprehensive survey that selects some units from the population of survey objects as samples for investigation, and infers the quantitative characteristics of the population based on the sample survey results.
Features:
(1) Economy: This is the most significant advantage. Because the sampled units are usually a small part of the population, the workload is small, which saves considerable manpower, material resources, money, and time.
(2) High timeliness: The required information can be obtained quickly.
(3) Wide adaptability: It can obtain a wider range of information and suits surveys in many fields and on many issues.
(4) High accuracy: The data quality of a sample survey is sometimes higher than that of a comprehensive survey. Because the workload is small, each step can be done more carefully, so errors are often smaller.
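The idea of inferring a population characteristic from a small sample can be sketched with simple random sampling. The population here is simulated, and the income figures, population size, and sample size of 200 are all assumptions made for illustration. In a real survey the population mean would be unknown.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is reproducible

# Simulated population: hypothetical monthly incomes of 10,000 households
population = [random.gauss(5000, 800) for _ in range(10_000)]

# Simple random sample of 200 units, drawn without replacement
sample = random.sample(population, 200)

# The sample mean serves as the estimate of the population mean
estimate = mean(sample)
true_mean = mean(population)  # available only because the population is simulated

print("sample estimate:", round(estimate, 1))
print("population mean:", round(true_mean, 1))
```

Only 2% of the units are surveyed, which is where the savings in workload come from.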
Key survey
Meaning: Select a small number of key units from the survey population for investigation. The selected key units account for the vast majority of the population total for the indicator being surveyed.
Features: Key surveys have a wide scope of application. With less investment and at a faster speed, they capture the basic situation or trend of the main indicators of a phenomenon. They suit cases where only the basic situation and development trend are needed and comprehensive data are not required.
Textbook examples:
(1) To track retail price trends in cities across the country in a timely way, it is enough to survey retail price changes in 35 large and medium-sized cities. This is a key survey.
(2) To track the added value and total assets of industrial enterprises across the country in a timely way, it is enough to conduct a key survey of large and medium-sized industrial enterprises.
(3) The National Bureau of Statistics' online direct reporting system for 5,000 industrial enterprises across the country is a key survey.
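One way to picture the selection of key units is to rank units by the surveyed indicator and keep the largest ones until they cover most of the total. This is only a sketch under assumptions: the function `select_key_units`, the 80% coverage threshold, and the enterprise figures are all hypothetical, and the mind map does not prescribe a specific cutoff.

```python
def select_key_units(values, share=0.8):
    """Pick the largest units until they cover at least `share` of the total (assumed rule)."""
    total = sum(values.values())
    selected = []
    covered = 0
    # Walk through units from largest to smallest indicator value
    for name, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        if covered / total >= share:
            break
        selected.append(name)
        covered += value
    return selected


# Hypothetical output values of six enterprises
output = {"A": 500, "B": 350, "C": 60, "D": 50, "E": 30, "F": 10}
key_units = select_key_units(output)
print(key_units)
```

Here two of six enterprises cover 85% of total output, so surveying them alone shows the basic situation and trend.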
typical survey
Meaning: According to the purpose and requirements of the investigation, on the basis of a comprehensive analysis of the objects under investigation, a number of typical or representative units are consciously selected for investigation.
Role
(1) Makes up for the shortcomings of comprehensive surveys
(2) Can verify the authenticity of comprehensive survey data under certain conditions. For example, after a major census, several typical units can be selected to check the accuracy of the statistical data.
Advantages
Typical surveys are flexible and can obtain in-depth, detailed statistical data from a few typical units. The typical survey is not unique to statistical work, but it is an indispensable part of the statistical process. Its purpose is not to reflect the overall quantitative characteristics of a phenomenon but to capture the concrete circumstances behind the statistics, that is, the social conditions related to the quantities and how they are connected, in support of in-depth statistical analysis
Disadvantages
Because the representative units are selected deliberately, the results depend heavily on the investigator's subjective judgment. Typical surveys must be used together with other surveys to avoid one-sidedness.
Statistical Quality Evaluation Criteria
Authenticity, accuracy, completeness, timeliness, applicability, economy, comparability, coordination, availability