Data mining and analysis technology mind map
A computing process that uses methods such as artificial intelligence, machine learning, and statistics to extract useful, previously unknown patterns or knowledge from massive amounts of data.
Data mining and analysis technology
Chapter 1 Overview of Data Mining
Pre-class overview
Summary
Machine learning
Workflow
Data import
Data preprocessing
Feature engineering
Train/test split
Model training
Model evaluation
Predict new data
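As a concrete illustration, here is a minimal sketch of this workflow with scikit-learn; the file name data.csv and the column name label are hypothetical placeholders.

```python
# Minimal sketch of the machine learning workflow above.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

df = pd.read_csv("data.csv")                       # data import (hypothetical file)
df = df.dropna()                                   # data preprocessing: drop missing rows
X = StandardScaler().fit_transform(df.drop(columns="label"))  # feature engineering
y = df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)  # split
model = LogisticRegression().fit(X_train, y_train)             # train model
print(accuracy_score(y_test, model.predict(X_test)))           # evaluate model
# predict new data: model.predict(new_X)
```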
AI
Characteristics of big data
Volume (massive amounts)
Variety (diverse forms)
Velocity (high speed)
Value
1.1 Introduction to Data Mining
Definition
A computing process that uses methods such as artificial intelligence, machine learning, and statistics to extract useful, previously unknown patterns or knowledge from massive amounts of data.
Background
The amount of data has expanded dramatically, giving rise to a new research direction: knowledge discovery in databases (KDD) and the corresponding data mining theories and technologies.
The next technology hotspot after the Internet
While a large amount of information brings convenience to people, it also brings a lot of problems.
Too much information and difficult to digest
It is difficult to distinguish the authenticity of information
Information security is difficult to guarantee
Information comes in different forms and is difficult to process uniformly
Explosive growth in data, but knowledge remains poor
The evolution from business data to business information
Data collection → data access → data warehouse, decision support → data mining (providing predictive information)
Stages
Data preprocessing
Cleaning, integration, selection, transformation
Data mining
Model evaluation
Process
Data → information → knowledge → wisdom
Data
"8000m", "10000m"
Data are produced by observing and measuring objective things; the objective things under study are called entities
Information
"8000 m is the maximum flight altitude of the aircraft", "the mountain is 10000 m high"
Knowledge
"Planes cannot climb over this mountain"
Wisdom
Main content
Association rule mining
beer and diapers
Supervised machine learning
Discrete label prediction: classification
Continuous label prediction: numerical prediction
Unsupervised machine learning: clustering (similarity algorithms)
Regression
Establishes quantitative relationships among multiple variables
Classification of algorithms
Supervised learning
Learns a function (model) from the given training data; when new data arrives, the result can be predicted with this function (model)
Training data carries explicit labels or known outcomes
Regression algorithms, neural networks, SVM (support vector machines)
Regression algorithms
Linear regression
Handles numerical problems; the final prediction is a number, e.g. a house price
Logistic regression
Actually a classification algorithm, e.g. determining whether an email is spam
Neural Networks
Applied to visual recognition and speech recognition
SVM (support vector machine) algorithm
An enhancement of the logistic regression algorithm
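A minimal sketch contrasting the two regression algorithms on toy data (all numbers are made up for illustration):

```python
# Linear regression predicts a number; logistic regression predicts a class.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Linear regression: numeric target (e.g. house price vs. area)
area = np.array([[50], [80], [100], [120]])
price = np.array([150, 240, 300, 360])
lin = LinearRegression().fit(area, price)
print(lin.predict([[90]]))          # predicted price: a number

# Logistic regression: categorical target (e.g. spam or not)
word_count = np.array([[1], [3], [10], [20]])
is_spam = np.array([0, 0, 1, 1])
log = LogisticRegression().fit(word_count, is_spam)
print(log.predict([[15]]))          # predicted class label
```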
Unsupervised learning
Training data is not labeled
Clustering algorithms, dimensionality reduction algorithms
Clustering algorithms
Compute distances within the population and divide the data into groups based on distance
Dimensionality reduction algorithms
Reduce data from high to low dimensionality. The dimension is the number of features of the data; for example, house-price data with the house's length, width, area, and number of rooms is 4-dimensional, but the information in length and width overlaps with that in area (area = length × width). Dimensionality reduction removes such redundancy.
Compress data and improve machine learning efficiency
Enterprise data applications
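A minimal sketch of the two unsupervised families on toy data, using scikit-learn's KMeans and PCA:

```python
# Clustering groups data by distance; PCA projects it to fewer dimensions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X = np.array([[1.0, 1.1], [0.9, 1.0], [8.0, 8.2], [8.1, 7.9]])

# Clustering: split the data into groups based on distance
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(labels)                        # e.g. [0 0 1 1]

# Dimensionality reduction: project 2-D data onto 1 dimension
X_low = PCA(n_components=1).fit_transform(X)
print(X_low.ravel())
```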
Semi-supervised learning
Uses a small number of labeled samples together with a large number of unlabeled samples for training and classification
Image recognition
Reinforcement learning
The learning agent makes decisions based on feedback from its observed environment
Robot control
1.2 Basic processes and methods of data mining
Basic methods
Predictive mining
Extrapolates from current data to make predictions
Descriptive mining
Characterizes the general properties of the data in the database (correlations, trends, clusters, anomalies, ...)
Data mining flow chart
The six main data mining methods in the book (P6)
Summarization of the data set
Data association rules
A way of describing potential connections between data, usually represented by the implication A → B
Classification and prediction
Clustering
Anomaly detection
Time series models
1.3 Application of data mining
Business
Healthcare and medicine
Banking and insurance
Social media
Tools
Weka, MATLAB, Java
Relevant information
Chapter 2 Data Description and Visualization
2.1 Overview
Analyze data attributes and data values → data description and visualization
2.2 Data objects and attribute types
Data set
Composed of data objects
Sales database: customers, store items, sales; medical database: patients, treatment information; university database: students, professors, course information
Data object
A data object represents an entity
Also known as: sample, example, instance, data point, object, tuple
Attributes
A characteristic of a data object
Terminology
Database: dimension
Machine learning: feature
Statistics: variable
Data mining, databases: attribute
Classification
Nominal attributes
Nominal attribute values are symbols or names of things, representing categories or names
Examples: hair color (black, white, brown); marital status (married, single, divorced, widowed)
Binary attributes (special nominal attributes)
Only two categories or states
Symmetric binary
Both states are equally important; example: gender (male, female)
Asymmetric binary
The two states are not equally important; example: medical test (negative, positive)
Ordinal attributes
Values have a meaningful order, but the magnitude between successive values is unknown; often used for ratings
Teacher title, military rank, customer satisfaction
Numeric attributes
Interval-scaled attributes
Measured on a scale of equal-size units; ordered
Ratio-scaled attributes
Have a fixed zero point; ordered, and multiples (ratios) are meaningful
Discrete and continuous attributes
2.3 Basic statistical description of data
Measures of central tendency
Mean, median, mode
Measures of data dispersion
Range, quartiles, interquartile range (IQR)
Five-number summary, box plots and outliers
Variance, standard deviation
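A short NumPy sketch of these statistics on a small made-up sample:

```python
# Basic statistical description of a data set with NumPy.
import numpy as np

x = np.array([2, 4, 4, 5, 7, 9, 30])                 # 30 looks like an outlier

vals, counts = np.unique(x, return_counts=True)
print(np.mean(x), np.median(x), vals[np.argmax(counts)])   # mean, median, mode
q1, q3 = np.percentile(x, [25, 75])
print(x.max() - x.min(), q1, q3, q3 - q1)             # range, quartiles, IQR
print(x.min(), q1, np.median(x), q3, x.max())         # five-number summary
print(np.var(x, ddof=1), np.std(x, ddof=1))           # sample variance, std dev
```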
Graphical depiction of basic statistics of data
Quantile plot
Quantile-quantile (Q-Q) plot
Histogram
Bar height: count or frequency
Scatter plot
Discover correlations between attributes
2.4 Data visualization
Definition
Express data effectively through graphics
Three visualization methods
Boxplot
Analyzes differences in dispersion across multiple attributes
Shows the distribution of the data and exposes outliers (which may need to be removed)
Histogram
Analyzes how a single attribute is distributed across intervals
Scatter plot
Displays the correlation between two sets of data
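A minimal matplotlib sketch of the three plot types on synthetic data:

```python
# Boxplot, histogram, and scatter plot side by side.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
a = rng.normal(0, 1, 200)
b = 0.8 * a + rng.normal(0, 0.5, 200)    # correlated with a

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].boxplot([a, b])                  # dispersion and outliers per attribute
axes[1].hist(a, bins=10)                 # distribution of one attribute over intervals
axes[2].scatter(a, b, s=8)               # correlation between two attributes
plt.show()
```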
2.4.1 Pixel-based visualization
A simple way to visualize one-dimensional values is to use pixels, whose color reflects the value of that dimension
Suitable for one-dimensional values, not suitable for distribution of multi-dimensional spatial data
2.4.2 Geometric projection visualization
Helps users discover projections of multidimensional data; the primary challenge of geometric projection techniques is how to visualize high-dimensional space in two dimensions.
For two-dimensional data points, a Cartesian coordinate system scatter plot is usually used. Different colors or shapes can be used in the scatter plot as the third dimension of the data.
Scatter plots (for three-dimensional data sets), scatter-plot matrices, and parallel-coordinates visualization (when the number of dimensions is large)
2.4.3 Icon-based visualization
Represent multidimensional data values with a small number of icons
Two commonly used icon methods
Chernoff faces (allow visualization of up to 36 dimensions)
Reveal trends in data
Facial elements such as eyes, mouth, and nose use different shapes, sizes, positions, and orientations to represent dimension values
Each face represents an n-dimensional data point (n ≤ 18); the meaning of the facial features is understood by spotting small differences between faces
Stick figures
2.4.4 Hierarchical visualization
Divide all dimensions into subsets (i.e. subspaces) and visualize these subspaces hierarchically
Two commonly used hierarchical visualization methods
X-axis/Y-axis subset hierarchy
Tree map
2.4.5 Visualizing complex objects and relationships
Tag Cloud
2.5 Data similarity and dissimilarity measurement
Concept
Similarity
Measures how similar two data objects are. The larger the value, the more similar they are. The usual value range is [0,1]
Dissimilarity
Measures the degree of difference between two data objects. The smaller the value, the more similar the data is. The minimum dissimilarity is usually 0.
Proximity
Refers to similarity or dissimilarity
Two common data structures
Data matrix (object-attribute matrix)
Stores n data objects in n rows and p attributes in p columns (an n × p matrix)
Dissimilarity matrix (object-object matrix)
Stores the dissimilarity values between data objects
Usually a triangular matrix
Proximity measures for nominal attributes
Proximity measures for binary attributes
Dissimilarity of numeric attributes
Common distance measures for computing the dissimilarity of numeric attribute objects
Euclidean distance
Manhattan distance
Euclidean and Manhattan distance both satisfy the metric properties (non-negativity, identity, symmetry, triangle inequality)
Minkowski distance
A generalization of Euclidean and Manhattan distance
Supremum distance
The maximum difference between the objects on any attribute
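A short NumPy sketch of the four measures; Euclidean, Manhattan, and supremum distance are the Minkowski distance with p = 2, p = 1, and p → ∞ respectively:

```python
# Distance measures between two numeric objects x and y.
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

def minkowski(x, y, p):
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

print(minkowski(x, y, 2))        # Euclidean distance (p = 2)
print(minkowski(x, y, 1))        # Manhattan distance (p = 1)
print(minkowski(x, y, 3))        # Minkowski distance with p = 3
print(np.max(np.abs(x - y)))     # supremum distance (p -> infinity)
```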
Proximity measures for ordinal attributes
Dissimilarity of mixed attributes
Group the attributes by type and run the mining analysis (e.g. cluster analysis) on each group separately. If all analyses yield the same result, the method works, but in practice it is difficult for every attribute-type group to yield the same result.
A better approach: do a single analysis, combining the different attributes in a single dissimilarity matrix and transforming them onto a common interval [0.0, 1.0]
Example
Cosine similarity (a conceptual understanding is enough)
Text retrieval, biological information mining
Document vector, word frequency vector
Frequency vectors are usually long and sparse (have many 0 values)
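A minimal sketch computing cosine similarity between two toy term-frequency vectors:

```python
# Cosine similarity of two sparse term-frequency vectors.
import numpy as np

d1 = np.array([3, 0, 1, 0, 0, 2])    # term frequencies of document 1
d2 = np.array([1, 0, 0, 0, 0, 1])    # term frequencies of document 2

cos = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))
print(cos)    # 1.0 means identical direction, 0.0 means no shared terms
```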
Chapter 7 Support Vector Machine
Classification of support vector machines
Linear binary classification problem
Find the optimal hyperplane
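A minimal sketch with scikit-learn's SVC (linear kernel) on toy data; the support vectors are the points that determine the optimal hyperplane:

```python
# Linear SVM for a binary classification problem.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [6, 5], [7, 6]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([[2, 2], [6, 6]]))   # classify new points via the hyperplane
print(clf.support_vectors_)            # points that define the margin
```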
Chapter 6 Classification and Prediction
6.1 Data classification
Continuous variables
height, weight
Categorical variables
Unordered categorical variables
Ordered categorical variables
General methods for data classification
Nominal (classification), ordinal (ordering), interval (distance), ratio scales
6.2 Decision tree model
Generate decision tree
Prune decision tree
6.2.1 How decision trees work
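A minimal scikit-learn sketch of the two steps above: grow a full tree, then prune it with cost-complexity pruning (the ccp_alpha value is an arbitrary illustration):

```python
# Generate a decision tree, then prune it.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier().fit(X, y)                   # generate full tree
pruned = DecisionTreeClassifier(ccp_alpha=0.02).fit(X, y)   # prune via complexity penalty

print(tree.get_depth(), pruned.get_depth())   # pruning yields a shallower tree
print(export_text(pruned))                    # human-readable split rules
```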
6.3 Bayesian classification model
Maximum a posteriori (MAP) hypothesis
Given data D, the learner selects the most probable hypothesis h from the candidate hypothesis set H; this h is called the maximum a posteriori hypothesis
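In symbols (the standard Bayesian formulation; the denominator P(D) is dropped because it does not depend on h):

$$h_{\mathrm{MAP}} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)} = \arg\max_{h \in H} P(D \mid h)\,P(h)$$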
Requires computing the joint probability
Each attribute is usually assumed to be conditionally independent of the others (the naive Bayes assumption)
Beforehand, correlation analysis and merging should be performed to minimize the correlation between attributes
Characteristics
Attributes can be discrete or continuous
Solid mathematical foundation and stable classification efficiency
Not sensitive to missing values, noisy data, or outliers
If the attributes are uncorrelated, classification performance is very good
6.4 Linear discriminant model
6.5 Logistic regression model
6.6 Model evaluation and selection
Chapter 5 Association Rule Mining
5.1 Overview
Concept
Association rule mining mines the correlations between itemsets in a transaction database, finding all association rules that satisfy the minimum support and minimum confidence thresholds
Association rules are used to find potentially useful dependencies between data items in large amounts of data.
Frequent itemsets
Itemsets that satisfy the minimum support threshold
Support
Confidence
Strong rules
Rules that meet or exceed minimum support and confidence
Main steps of association rule mining
Find all frequent itemsets, i.e. itemsets whose occurrence count is at least the minimum support count
From the frequent itemsets, derive association rules that satisfy the minimum support and confidence conditions
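A minimal pure-Python sketch of support and confidence for the classic beer-and-diapers rule on a toy transaction database (items and thresholds are made up):

```python
# Support and confidence of the rule {beer} -> {diapers}.
transactions = [
    {"beer", "diapers", "bread"},
    {"beer", "diapers"},
    {"beer", "cola"},
    {"bread", "milk"},
]

def support(itemset):
    # fraction of transactions containing every item of the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

sup = support({"beer", "diapers"})          # P(beer and diapers) = 0.5
conf = sup / support({"beer"})              # P(diapers | beer)  = 0.666...
print(sup, conf)

min_sup, min_conf = 0.4, 0.6
print(sup >= min_sup and conf >= min_conf)  # True: a "strong rule"
```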
5.2 Classification
5.3 Research steps
5.4 Apriori algorithm analysis
5.6 Generalization of Association Rules (GRI)
Depth-first search
5.7 In-depth exploration of association rules
Chapter 4 Data Reduction
4.1 Overview of data reduction
Streamline the data as much as possible while preserving its original character
4.2 Attribute selection and numerical reduction
Evaluation criteria for attributes (P58)
Consistency measure
The degree of consistency between two attributes
The degree of consistency between education level and VIP level
Correlation measure
The degree to which different attributes are related to each other
Correlation between education level and VIP level
The higher the correlation between two attributes, the higher the accuracy of inferring the value of one attribute from the value of the other attribute.
Discriminative ability measure
The ability of a certain attribute to distinguish records in the database
Information measure
The greater the amount of information an attribute contains, the more important it is
The amount of information is usually measured by "information entropy"
Attribute subset selection methods
Stepwise forward selection (see the sketch after this list)
Start with an empty target attribute set
Each iteration selects the best attribute from the remaining attributes in the original data set and adds it to the target attribute set.
Remove the attribute from the original dataset
Repeat this process until the target set meets the requirements
Stepwise backward selection
First take the entire original attribute set as the target attribute set
In each iteration, the attribute with the worst comprehensive score is eliminated from the target attribute set.
Repeat this process until the target attribute set meets the requirements
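A minimal sketch of stepwise forward selection, assuming cross-validated accuracy as the scoring criterion (the stopping size of 2 is an arbitrary illustration):

```python
# Greedy forward attribute selection: add the best remaining attribute each round.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
remaining, selected = list(range(X.shape[1])), []

while len(selected) < 2:                      # stop when the target set is big enough
    scores = [(cross_val_score(LogisticRegression(max_iter=500),
                               X[:, selected + [j]], y).mean(), j)
              for j in remaining]
    best_score, best_j = max(scores)          # best attribute this iteration
    selected.append(best_j)                   # add it to the target attribute set
    remaining.remove(best_j)                  # remove it from the original set
print(selected)
```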
Numerical reduction
Transform attribute values to reduce their dynamic range
Simple function transformation
Standardization of data
Discretize attributes and encode them with integers
Equal width discretization, equal depth discretization
Binarize the attribute so that it has only two values
If the attribute value is a signal or image, compression encoding can also be performed
4.3 Linear regression
Definition
The study of the relationship between a single dependent variable and one or more independent variables
Uses
Prediction: using the observed independent variables to predict the dependent variable
Causal analysis treats the independent variable as the cause of the dependent variable.
Linear regression
Multiple regression
Nonlinear regression
Model data that does not have linear dependencies
Use polynomial regression: transform the variables to convert the nonlinear model into a linear model, then solve with least squares (see the sketch below)
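A minimal sketch: a quadratic relationship becomes linear in the transformed features, so ordinary least squares applies (the data are synthetic):

```python
# Polynomial regression via variable transformation + least squares.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 2 * x.ravel() ** 2 + 1 + np.random.default_rng(0).normal(0, 0.5, 30)

X_poly = PolynomialFeatures(degree=2).fit_transform(x)  # adds the x^2 column
model = LinearRegression().fit(X_poly, y)               # ordinary least squares
print(model.coef_, model.intercept_)                    # x^2 coefficient near 2, intercept near 1
```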
4.4 Principal Component Analysis (PCA)
A commonly used method for dimensionality reduction of high-dimensional data
Form linear combinations of the original variables so that a few combined variables reflect all or most of the information in the originals
The combined variables are the principal components
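A minimal scikit-learn sketch, mirroring the earlier length × width = area example: three correlated attributes are combined into two principal components:

```python
# PCA: replace correlated attributes with a few composite components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
length = rng.uniform(5, 15, 100)
width = rng.uniform(4, 10, 100)
X = np.column_stack([length, width, length * width])   # area duplicates length/width info

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)   # first components carry most of the information
X_reduced = pca.transform(X)           # 3 correlated attributes -> 2 components
```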
Chapter 3 Data collection and preprocessing (cleaning, integration, reduction, transformation)
3.1 Overview
Characteristics of big data collection
The first step in the big data life cycle
Compared with traditional data, big data is massive, diverse, and heterogeneous
From collection to processing, big data must trade off consistency, availability, and partition tolerance
Big data collection methods (for general understanding)
Distributed system log collection
Network data collection
Web crawler, website public API (application programming interface)
DPI (Deep Packet Inspection)
DFI (Deep/Dynamic Flow Inspection)
Specific system interface data collection
3.2 Purpose and tasks of data preprocessing
Purpose
Improve data quality
Main tasks
Data cleaning
Remove noise from the data and correct inconsistencies
Data integration
Consolidate data from multiple data sources into a consistent data store, such as a data warehouse
Data transformation (such as normalization)
Scale data into a smaller interval
3.3 Data cleaning
The essence is a process of modifying the data model
Data cleaning paths (for general understanding)
1. Missing value cleaning
Remove missing values
Mean imputation
Hot-deck imputation
Nearest-neighbor imputation
Regression imputation
Multiple imputation
k-nearest neighbor method
Bayesian-based approach
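A minimal sketch of two of these paths with scikit-learn imputers (mean imputation and k-nearest-neighbor imputation) on toy data:

```python
# Fill missing values (NaN) by column mean or by nearest rows.
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0], [8.0, np.nan]])

print(SimpleImputer(strategy="mean").fit_transform(X))  # replace NaN with column mean
print(KNNImputer(n_neighbors=2).fit_transform(X))       # replace NaN using nearest rows
```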
2. Outlier cleaning (outliers, wild values)
Definition and identification of outliers
Handling outliers
3. Format content cleaning
4. Logic error cleaning
Remove duplicates
Remove unreasonable values
5. Non-required data cleaning
6. Relevance verification
3.4 Data integration
Concept
Data integration in the traditional sense
Combine data from multiple data stores and store it in a single data store, such as a data warehouse
Data integration in a general sense
ETL: extract, transform, load (into the destination); an important part of building a data warehouse
The user extracts the required data from the data source, cleans it, and finally loads it into the data warehouse according to a predefined data warehouse model
Importance of models
Standardize the definition of data to achieve unified coding, classification and organization
Data redundancy often occurs when integrating multiple databases
Detect redundant attributes
Correlation analysis
Discrete variables
Chi-square test
The larger the statistic, the more related the variables
Continuous variables
Correlation coefficient
Equal to 1 or -1: perfectly linearly correlated
Greater than 0: positive correlation
Equal to 0: no linear correlation
Less than 0: negative correlation
Covariance analysis
Greater than 0: positive correlation
Equal to 0: uncorrelated
Some data have covariance 0 but are not independent
Less than 0: negative correlation
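A short sketch of the three redundancy checks with NumPy/SciPy (the contingency table and series are made up):

```python
# Chi-square for discrete variables; correlation and covariance for continuous ones.
import numpy as np
from scipy.stats import chi2_contingency

# Discrete: contingency table of two categorical attributes
table = np.array([[250, 200], [50, 1000]])
chi2, p, dof, _ = chi2_contingency(table)
print(chi2, p)                      # large chi2 / small p => attributes are related

# Continuous: correlation coefficient in [-1, 1] and covariance
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.1, 3.9, 6.2, 8.1])
print(np.corrcoef(a, b)[0, 1])      # close to 1: strong positive linear relation
print(np.cov(a, b)[0, 1])           # > 0: positive correlation
```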
Data reduction strategies
Dimensionality reduction
Scenarios that call for dimensionality reduction
Data is sparse and high-dimensional
A rule-based classification method is applied to high-dimensional data
A complex model (e.g. deep learning) is used but the training set is small
Visualization is needed
Typical dimensionality reduction method: PCA (principal component analysis)
Motivation
There are some correlations between many attributes in the data.
Can multiple correlated attributes be combined into a single attribute?
Concept
Recombine several correlated original attributes (say p of them) into a set of uncorrelated composite attributes that replace the originals; the usual mathematical treatment is to form the composite attributes as linear combinations of the p original attributes
For example, student scores in language, math, foreign language, history, geography, etc. can be combined into two attributes: liberal arts and science
Data reduction: sampling
Data compression
Reduce the size of the data by reducing its quality, e.g. image pixels
3.5 Data transformation
Data transformation strategy
Smoothing, attribute construction, aggregation, normalization, discretization, concept hierarchy generation
Commonly used data transformation methods
Transform data through normalization
Discretization by binning
Discretization by histogram analysis
Discretization by clustering, decision trees, and correlation analysis
Concept hierarchies for nominal data
Discretization (see the sketch below)
Equal-width method
Equal-frequency method
Clustering method
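A minimal pandas sketch of equal-width and equal-frequency discretization, plus min-max normalization (the values are made up):

```python
# Equal-width vs. equal-frequency binning, and min-max normalization.
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5, 8, 20, 50])

print(pd.cut(x, bins=4, labels=False))      # equal width: bins of equal value range
print(pd.qcut(x, q=4, labels=False))        # equal frequency: ~2 values per bin
print((x - x.min()) / (x.max() - x.min()))  # min-max normalization onto [0, 1]
```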