MindMap Gallery Data structure knowledge framework
An outline of the key points of the data structure knowledge framework, including matrices, recursion, queues, an introduction to trees, algorithm complexity, and an introduction to data structures.
Edited at 2023-03-14 22:21:10
Data structure knowledge framework
B-tree
Balanced multiway search tree (of order M)
Properties
The root node has at least two children unless it is a leaf
Each non-root internal node has at least ceil(M/2) children and at most M children.
Each non-root node has at least ceil(M/2)-1 keys and at most M-1 keys, arranged in ascending order
The keys in the child subtree between key[i] and key[i+1] lie between key[i] and key[i+1]
All leaf nodes are on the same layer
B+ tree
Properties
Its definition is basically the same as that of the B-tree
A non-leaf node has as many subtree pointers as keys
The subtree pointer P[i] of a non-leaf node points to the subtree whose keys fall in [k[i], k[i+1])
Add a link pointer to all leaf nodes
All keywords appear in leaf nodes
search
Basically the same as B-tree
characteristic
All keywords appear in the linked list of leaf nodes (dense index), and the keywords in the linked list happen to be ordered
A search cannot hit at a non-leaf node; every successful search ends at a leaf node
Non-leaf nodes are equivalent to the index of leaf nodes (sparse index), and leaf nodes are equivalent to the data layer that stores (keyword) data.
More suitable for file indexing systems
B* tree
Adds sibling pointers to the non-root, non-leaf nodes of the B+ tree
Hash
Search (retrieval)
The process of finding whether there is a data element with a key equal to a given key in a collection of data elements.
result
success
fail
search structure
Data collection for searching
Search context
static environment
The search structure will not change before and after operations such as insertion and deletion are performed.
dynamic environment
In order to maintain high search efficiency, the search structure will be automatically adjusted before and after operations such as insertion and deletion, and the structure may change.
Find
static search
Sequence table
Traversing from front to back O(n)
ordered sequence list
Binary search O(log(n))
index sequence table
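The binary search over an ordered sequence list can be sketched as follows. This is a minimal illustration, assuming the list is already sorted in ascending order; the function name `binary_search` is hypothetical and not taken from the outline.

```python
def binary_search(items, key):
    """Return the index of key in the ascending list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == key:
            return mid
        if items[mid] < key:
            lo = mid + 1   # key can only be in the right half
        else:
            hi = mid - 1   # key can only be in the left half
    return -1
```

Each iteration halves the remaining range, which is where the O(log(n)) bound comes from.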
dynamic search
binary tree structure
Binary sorting tree
balanced binary tree
tree structure
B-tree
B+ tree
Hash lookup
The key of each element corresponds to a unique storage location in the structure
Hashing method
Insert and search data. Use hash function to find the storage location and then store or search it.
Hash collision
For two data Ki and Kj (i!=j), Ki!=Kj, but there is Hash(Ki)==Hash(Kj)
hash function
The domain of the hash function must include all keys that need to be stored, and if the hash table allows m addresses, its values must fall in the range 0 to m-1
The addresses computed by the hash function should be evenly distributed over the whole address space
The hash function should be relatively simple
Common hash functions
direct addressing method
Take a linear function of the keyword as the hash address
Advantages: simple and uniform
Disadvantages: Need to know the distribution of keywords in advance
Usage scenario: suitable when the keys span a relatively small, contiguous range
Division (remainder) method
Let m be the number of addresses the hash table allows. Take as the divisor the largest prime p that is no greater than m, and convert the key into a hash address with the hash function Hash(key) = key % p, p <= m
Mid-square method
Assume that the keyword is 1234, the square is 1522756, and then extract the three-digit number 227 in the middle as the hash address
folding method
Divide the keyword into several parts with equal digits from left to right, then superimpose and sum these parts, and take the last few digits as the hash address according to the length of the hash table.
random number method
Choose a random function and take the random function value of the keyword as its hash address
Digit analysis method
Given n keys of d digits each, where each digit can take r different symbols, the r symbols do not necessarily occur with the same frequency in every digit position: they may be evenly distributed in some positions and unevenly in others. Select several of the evenly distributed positions as the hash address, according to the size of the hash table.
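Three of the hash functions listed above can be sketched in a few lines. This is an illustrative sketch for non-negative integer keys only; the function names and parameters (`p`, `digits`, `part_len`, `table_digits`) are assumptions introduced for the example.

```python
def division_hash(key, p):
    """Division (remainder) method: Hash(key) = key % p, with p a prime <= m."""
    return key % p


def mid_square_hash(key, digits=3):
    """Mid-square method: square the key and take the middle digits."""
    squared = str(key * key)
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])


def folding_hash(key, part_len, table_digits):
    """Folding method: split into equal-width parts, add them, keep the last digits."""
    text = str(key)
    parts = [int(text[i:i + part_len]) for i in range(0, len(text), part_len)]
    return sum(parts) % (10 ** table_digits)
```

With the key 1234 from the outline, `mid_square_hash(1234)` squares it to 1522756 and returns the middle three digits, 227.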
Hash conflict handling method
Closed hashing (open addressing)
Once a collision occurs, look for the next empty hash table address. As long as the hash table is large enough, an empty address can always be found.
Linear probing
On insertion, if the element at the position equals the element being inserted, do not insert it again.
If a collision occurs, check the following buckets in turn and insert the data at the first empty position.
Quadratic probing
With quadratic probing, the formula for the next candidate position in the table is H(i) = (H0 + i^2) % m or H(i) = (H0 - i^2) % m
When the table length is a prime number and the load factor does not exceed 0.5, a new entry can always be inserted
double hashing
Two hash functions are required, and when one collides, the next hash function is used to calculate the offset.
load factor
a=number of elements filled in the table/length of the hash table
Keep it below 0.7-0.8. Above 0.8, CPU cache misses increase during table lookups.
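Closed hashing with linear probing, together with the load factor, can be sketched as below. This is a minimal sketch under stated assumptions: keys are non-negative integers, the hash function is `key % capacity`, deletion and resizing are left out, and the class name `LinearProbingTable` is hypothetical.

```python
class LinearProbingTable:
    """Closed hashing: collisions are resolved by scanning to the next empty slot."""

    def __init__(self, capacity=11):
        self.slots = [None] * capacity
        self.size = 0

    def load_factor(self):
        # a = number of elements stored / length of the table
        return self.size / len(self.slots)

    def insert(self, key):
        """Insert key; return False if it is already present or the table is full."""
        m = len(self.slots)
        start = key % m
        for step in range(m):
            idx = (start + step) % m
            if self.slots[idx] is None:
                self.slots[idx] = key
                self.size += 1
                return True
            if self.slots[idx] == key:
                return False  # equal data is not inserted twice
        return False

    def contains(self, key):
        m = len(self.slots)
        start = key % m
        for step in range(m):
            idx = (start + step) % m
            if self.slots[idx] is None:
                return False  # an empty slot ends the probe sequence
            if self.slots[idx] == key:
                return True
        return False
```

A real table would grow and rehash once `load_factor()` approaches the threshold mentioned above; that step is omitted here to keep the sketch short.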
Open hashing (separate chaining)
First, a hash function computes the hash address of every key in the key set. Keys with the same address belong to the same subset, and each subset is called a bucket. The elements in each bucket are linked together in a singly linked list, and the head nodes of the linked lists are stored in a vector with as many elements as there are possible buckets.
bloom filter
When an element is added to the set, k hash functions map the element to k positions in a bit array, and those bits are set to 1. To query, check whether all k bits are 1: if any bit is 0, the element is definitely not in the set; if all are 1, the element is probably in the set.
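A Bloom filter along these lines can be sketched as follows. The way the k positions are derived here (salting SHA-256 with the hash index) is an assumption chosen for illustration, not something the outline specifies; the class and parameter names are hypothetical as well.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, item):
        # Simulate k hash functions by prefixing the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True only means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))
```

Because different elements can set the same bits, `might_contain` may return True for an element that was never added, but it never returns False for one that was.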
sort
concept
Arranging a set of unordered data according to a certain rule (ascending or descending order)
Data table
A finite set of data elements to be sorted
Sort key
Data elements usually have several attribute fields, one of which distinguishes the elements and serves as the basis for sorting. This field is the sort key.
If the sort keys of all elements in the data table differ from each other, the sort key is called the primary sort key.
The stability of sorting algorithms
Given two elements R[i] and R[j] with equal sort keys K[i] == K[j], where R[i] precedes R[j] before sorting, the algorithm is stable if R[i] still precedes R[j] after sorting
Common sorting algorithms
insertion sort
direct insertion sort
Insert each element into the already sorted sequence: find its position first, then shift the elements after that position back by one
Stable
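Direct insertion sort, and why it is stable, can be seen in a short sketch. The function name and the optional `key` parameter are illustrative assumptions; the sketch returns a sorted copy rather than sorting in place.

```python
def insertion_sort(items, key=lambda x: x):
    """Return a sorted copy of items using direct insertion sort."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift strictly larger elements one slot right. Elements with an
        # equal key are not moved past, which keeps the sort stable.
        while j >= 0 and key(result[j]) > key(current):
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result
```

Sorting records by their first field shows the stability: records with equal keys keep their original relative order.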
Shell sort
Diminishing increment sort
selection sort
selection sort
Put the smallest element at the front each time
Tournament sorting
Compare elements pairwise to find the winner; then exclude the winner and repeat the pairwise comparisons among the rest to find the next winner, and so on
unstable
Heap sort
Build a max heap for ascending order, or a min heap for descending order
unstable
Exchange sort
Bubble sort
Stable
Compare adjacent elements in turn and swap them if they are out of order.
Quick sort
Pick a pivot value and place smaller elements on its left and larger ones on its right. Recursively pick a pivot in the left and right parts and keep partitioning.
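The partition-and-recurse idea of quick sort can be sketched as below. This version builds new lists instead of partitioning in place and takes the middle element as the pivot; both are simplifying assumptions made for readability, not the only way to implement it.

```python
def quick_sort(items):
    """Return a sorted copy of items (not in place)."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]   # goes to the left of the pivot
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]    # goes to the right of the pivot
    return quick_sort(smaller) + equal + quick_sort(larger)
```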
merge sort
merge sort
Split the sequence to be sorted into two subsequences of roughly equal length, sort each recursively, and then merge the two sorted subsequences into one
non-comparative sorting
counting sort
Radix sort
Graph
A data structure consisting of a set of vertices and the relationships between vertices
Vertices and edges
Vertex
The nodes in a graph are called vertices
Edge
A connection between two vertices is called an edge
Classification of graphs
directed graph
In a directed graph, the vertex pair <x, y> is ordered: <x, y> and <y, x> are two different edges
Undirected graph
The vertex pair (x, y) is unordered: (x, y) and (y, x) are the same edge
Complete graph
An undirected graph with n vertices is complete if it has n*(n-1)/2 edges, that is, there is exactly one edge between any two vertices.
A directed graph with n vertices is complete if it has n*(n-1) edges, that is, there is exactly one edge in each direction between any two vertices.
Adjacent vertices
In an undirected graph G, if (u, v) is an edge in E(G), then u and v are adjacent to each other, and the edge (u, v) is incident on vertices u and v.
In a directed graph G, if <u, v> is an edge in E(G), then vertex u is adjacent to v, vertex v is adjacent from u, and the edge <u, v> is incident on vertices u and v.
Degree of a vertex
The number of edges incident on it
In a directed graph, the degree of a vertex equals the sum of its in-degree and out-degree. The in-degree of vertex v is the number of directed edges ending at v, denoted indev(v); the out-degree of vertex v is the number of directed edges starting at v, denoted outdev(v).
In an undirected graph, the degree, in-degree, and out-degree of a vertex are equal: dev(v) = indev(v) = outdev(v)
path
In a graph G = (V, E), if vertex vj can be reached from vertex vi through a sequence of edges, then the vertex sequence from vi to vj is called a path from vi to vj.
Weight
Data information attached to the edge
Path length
For an unweighted graph, the length of a path is the number of edges on the path.
For a weighted graph, the length of a path is the sum of the weights of the edges on the path.
Simple paths and cycles
If no vertex on a path is repeated, the path is called a simple path. If the first vertex v1 and the last vertex vm of a path coincide, the path is called a cycle (loop or ring).
Subgraph
Given graphs G = {V, E} and G1 = {V1, E1}, if V1 is a subset of V and E1 is a subset of E, then G1 is a subgraph of G.
connected graph
In an undirected graph, two vertices are connected if there is a path between them. If every pair of vertices is connected, the graph is called a connected graph.
Strongly connected graph
A directed graph is strongly connected if, for every pair of vertices vi and vj, there is a path from vi to vj and also a path from vj to vi.
Spanning tree
A minimal connected subgraph of a connected graph is called a spanning tree of the graph. A spanning tree of a connected graph with n vertices has n vertices and n-1 edges.
Graph storage structure
adjacency matrix
adjacency list
Undirected graph
directed graph
Graph traversal
depth first
breadth first
connected components
When an undirected graph is not connected, a depth-first or breadth-first search starting from a given vertex cannot reach every vertex of the graph; it visits only the vertices of the maximal connected subgraph containing the start vertex. These vertices form a connected component.
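Finding connected components by repeated breadth-first search can be sketched as follows. The adjacency-list representation (a dict mapping each vertex to its neighbors) and the function name are assumptions made for the example.

```python
from collections import deque


def connected_components(graph):
    """graph: dict mapping each vertex to an iterable of its neighbors (undirected)."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        # A BFS from an unvisited vertex reaches exactly one connected component.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            v = queue.popleft()
            component.append(v)
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(component)
    return components
```

Restarting the search from every vertex that has not been visited yet is what lets the traversal cover a non-connected graph.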
minimum spanning tree
criteria
Only edges of the graph may be used to construct the minimum spanning tree
Exactly n-1 edges must be used to connect the n vertices of the graph
The n-1 selected edges must not form a cycle.
greedy algorithm
When solving problems, always make the choice that seems best at the moment, that is, the local optimal solution.
Kruskal algorithm
Each time, select the minimum-weight edge whose two endpoints are not in the same connected component and add it to the spanning tree.
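Kruskal's algorithm can be sketched with a small union-find structure that tracks which connected component each vertex belongs to. The edge format `(weight, u, v)`, the vertex numbering 0..n-1, and the function name are assumptions made for this illustration.

```python
def kruskal(num_vertices, edges):
    """edges: list of (weight, u, v). Returns (total_weight, chosen_edges)."""
    parent = list(range(num_vertices))

    def find(x):
        # Follow parent links to the component representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0
    chosen = []
    for weight, u, v in sorted(edges):       # cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                 # endpoints in different components
            parent[root_u] = root_v          # merge the two components
            chosen.append((u, v, weight))
            total += weight
            if len(chosen) == num_vertices - 1:
                break
    return total, chosen
```

Skipping an edge whose endpoints already share a component is exactly what prevents the selected edges from forming a cycle.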
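Prim's algorithm
Grow the tree from a start vertex, each time adding the minimum-weight edge that connects a vertex in the tree to an adjacent vertex outside it
Single-source shortest path
Starting from one vertex in a weighted graph, find the shortest path to another vertex. The shortest path is the one with the minimum sum of edge weights along the path.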
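The outline does not name an algorithm for this problem. One common choice, shown here as an assumption, is Dijkstra's algorithm, which requires non-negative edge weights. The graph format (a dict mapping a vertex to a list of `(neighbor, weight)` pairs) and the function name are hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Return a dict of shortest distances from source to every reachable vertex."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry; a shorter path to v was already found
        for w, weight in graph.get(v, []):
            candidate = d + weight
            if candidate < dist.get(w, float("inf")):
                dist[w] = candidate
                heapq.heappush(heap, (candidate, w))
    return dist
```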
Introduction to data structures
concept
data
Symbols that describe objective things. Data are the objects a computer can operate on: the collection of symbols that can be recognized by a computer and fed into it for processing.
Data element
The basic unit of data that carries a certain meaning. It is usually processed as a whole in the computer and is also called a record.
Data item
A data element can consist of several data items. A data item is the smallest indivisible unit of data
data structure form
data structure
is a collection of data elements that have one or more specific relationships with each other
Classification
logical structure
Set structure
linear structure
tree structure
Graph structure
physical structure
sequential storage structure
chain storage structure
specific concept
logical structure
Interrelationships between data elements in a data object
Set structure
The data elements in the set have no other relationship between them except that they belong to the same set.
linear structure
There is a one-to-one relationship between data elements
tree structure
A one-to-many hierarchical relationship between data elements
Graph structure
There is a many-to-many relationship between data elements
physical structure
The storage form of the logical structure of data in the computer
sequential storage structure
Data elements are stored in storage units with consecutive addresses, and the logical and physical relationships between the data are consistent.
chain storage structure
Data elements are stored in arbitrary storage units, which may or may not be contiguous.
The logical structure is problem-oriented, and the physical structure is computer-oriented. Its basic goal is to store data and its logical relationships in the computer's memory.
program
algorithm
data structure
algorithm
A description of the steps to solve a specific problem, represented in a computer as a finite sequence of instructions, with each instruction representing one or more operations
Algorithm characteristics
Input
The algorithm has zero or more inputs
Output
The algorithm has at least one output
Finiteness
The algorithm terminates by itself after a finite number of steps, without infinite loops, and each step completes within an acceptable time
Definiteness
Each step of the algorithm has a definite meaning, with no ambiguity.
Feasibility
Each step of the algorithm must be feasible, that is, each step can be completed by executing a finite number of operations
Design algorithm requirements
correctness
readability
Robustness
High time efficiency
Low space usage
simplicity
algorithm complexity
time complexity
space complexity
Classification of Algorithmic Analysis
Average case
Expected number of operations for a given input size
Worst case
Maximum number of operations for a given input size
Best case
Minimum number of operations for a given input size; the best case usually does not occur in practice
Time complexity: big-O asymptotic notation
Deriving big-O for a general algorithm
Replace all additive constants in the operation count with the constant 1
In the modified operation-count function, keep only the highest-order term
If the highest-order term has a coefficient other than 1, drop that coefficient
Time complexity of divide-and-conquer algorithms
The time complexity of binary search is O(log2 N)
The time complexity of M-way search is O(logM N)
Time complexity of recursive algorithms
Total number of recursive calls * number of operations per call
Space complexity of recursive algorithms
N * space used per call, where N is the recursion depth
recursion
recursive definition
An object is said to be recursive if it partially contains itself or is defined in terms of itself.
Recursive process
A procedure calls itself directly or indirectly
Recursive thinking
Break the problem into smaller problems that are solved the same way as the original problem
Conditions for recursion
Reduce the size of the problem so that the new problem is solved the same way as the original
Define a recursion exit (base case)
recursive classification
data structure recursion
problem solving recursion
Recursive call stack
tail recursion
The recursive call is the last operation, and its result is returned directly without further computation
The nature of tail recursion
Carry the result of each step along as an argument to the next call, which amounts to accumulating the result automatically
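A tail-recursive factorial illustrates the idea of carrying the partial result along. The function name and the accumulator parameter `acc` are assumptions for the example. Note that CPython does not eliminate tail calls, so this sketch shows the shape of tail recursion rather than a stack-space saving.

```python
def factorial_tail(n, acc=1):
    """Tail-recursive factorial: acc carries the product computed so far."""
    if n <= 1:
        return acc                          # recursion exit
    return factorial_tail(n - 1, acc * n)   # the call is the last operation
```

Nothing remains to be done after the recursive call returns, which is what lets compilers for some languages reuse the current stack frame.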
time complexity
Total number of recursive calls * number of operations per call
Backtracking
Basic idea
maze algorithm
Advantages and Disadvantages of Recursion
advantage
Recursion simplifies the way we think when solving certain problems, and the code is more concise and easy to read.
shortcoming
The essence of recursion is a function calling itself, and function calls are costly. The system must allocate storage for each call and push the call-site information onto the stack; after the call ends, the space must be released, the stack popped, and the breakpoint restored. If the complexity of the recursive solution
stack
Stack concept
A special type of linear table that only allows insertion and deletion of data from one end
Features: Last in, first out
sequence stack
The data members of the sequential stack are the same as those of the sequence list. The difference is that push and pop operations are only allowed at the current top of the stack.
shared stack
An array implements two stacks
principle
Two stacks share one array. The two ends of the array serve as the bottoms of the two stacks, and the two tops grow toward the middle.
Application scenarios
The space requirements of the two stacks are complementary, that is, one grows while the other shrinks.
Chain stack
Insert at the head and delete from the head
stack application
bracket matching
Reverse Polish expression
maze algorithm
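Bracket matching is the simplest of the stack applications listed above and can be sketched as follows. The function name is hypothetical; non-bracket characters are simply ignored.

```python
def brackets_balanced(text):
    """Return True if every bracket in text is closed in the right order."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)            # push every opening bracket
        elif ch in pairs:
            # A closing bracket must match the most recently opened one.
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                    # leftovers mean unclosed brackets
```

The last-in, first-out order of the stack mirrors the rule that the most recently opened bracket must be closed first.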
queue
A special linear table that only allows data to be inserted at one end and deleted at the other end
sequential queue
Implementation method one
The head of the queue stays fixed; on dequeue, all remaining elements move forward one position.
Implementation method two
On dequeue, the head of the queue moves back one position
False overflow
After repeated enqueue and dequeue operations the queue reports overflow: free storage still exists, but no enqueue can be performed.
True overflow
The maximum storage space is full and an enqueue is still attempted.
circular queue
A sequential storage queue connected end to end is a circular queue.
Determining whether the queue is full or empty
Leave one storage slot unused
The queue is full when the tail pointer plus one (modulo the capacity) equals the head pointer: (rear+1)%MaxSize==front
The queue is empty when the tail and head pointers are equal: rear==front
Use a flag
The flag is initially 0; it is set to 1 after a successful enqueue and to 0 after a successful dequeue.
Queue empty condition: rear==front&&flag==0
Queue full condition: rear==front&&flag==1
Use a counter
Initially count=0; a successful enqueue sets count+1, and a successful dequeue sets count-1.
Queue empty condition: count==0
Queue full condition: count>0&&rear==front, or count==MaxSize
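The first approach, leaving one slot unused, can be sketched as below. The class and attribute names are assumptions; `front` indexes the head element and `rear` indexes the slot just past the tail.

```python
class CircularQueue:
    """Fixed-capacity queue that keeps one slot unused to tell full from empty."""

    def __init__(self, max_size):
        self.buf = [None] * max_size
        self.front = 0   # index of the head element
        self.rear = 0    # index one past the tail element

    def is_empty(self):
        return self.rear == self.front

    def is_full(self):
        return (self.rear + 1) % len(self.buf) == self.front

    def enqueue(self, value):
        if self.is_full():
            return False
        self.buf[self.rear] = value
        self.rear = (self.rear + 1) % len(self.buf)
        return True

    def dequeue(self):
        if self.is_empty():
            raise IndexError("dequeue from empty queue")
        value = self.buf[self.front]
        self.front = (self.front + 1) % len(self.buf)
        return value
```

With `max_size` slots the queue holds at most `max_size - 1` elements; the wrap-around indices let it reuse freed slots, which avoids false overflow.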
chained queue
The linked storage of a queue is simply a singly linked list, restricted to insertion at the tail and deletion at the head.
priority queue
Priority queue
Elements with higher priority leave the queue first, and elements with the same priority follow the first-in-first-out rule.
Queue applications
producer consumer model
message queue
Waiting-line (queuing) scenarios
Network data transmission
matrix
special matrix
A matrix that has many elements with the same value or many zero elements, and the distribution of elements with the same value or zero elements has a certain pattern
Symmetric matrix
An N*N matrix in which A[i][j] = A[j][i] for all i and j
Symmetric matrix compression storage
Since the upper and lower triangles of the symmetric matrix are the same, only half of them need to be stored.
The relationship between a symmetric matrix and its compressed storage
Lower triangle
For i >= j: SymmetricMatrix[i][j] == Array[i*(i+1)/2 + j]
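The lower-triangle mapping can be checked with a short sketch. The function names are hypothetical; indices are 0-based and the lower triangle is stored row by row.

```python
def compress_symmetric(matrix):
    """Store only the lower triangle (including the diagonal), row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1)]


def get_symmetric(packed, i, j):
    """Read element (i, j) of the original matrix from the packed array."""
    if i < j:
        i, j = j, i  # use symmetry: A[i][j] == A[j][i]
    return packed[i * (i + 1) // 2 + j]
```

Row i of the lower triangle is preceded by 1 + 2 + ... + i = i*(i+1)/2 stored elements, which is where the index formula comes from.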
sparse matrix
An M*N matrix in which the number of valid values is far smaller than the number of invalid values, and their distribution is irregular.
Sparse matrix compression storage
Store only a small number of valid values
Store each valid value as a {row, col, value} triple, one after another in row-major order according to their positions in the matrix.
Matrix transposition
Swapping rows and columns
Fast transposition
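Triple storage and a fast row/column swap on the triples can be sketched as follows. The outline does not spell out the fast method; the counting approach below (count the entries in each column, then compute where each column starts in the result) is a commonly taught one and is offered as an assumption. Function names and the `invalid` parameter are hypothetical.

```python
def to_triples(matrix, invalid=0):
    """Collect (row, col, value) for every valid value, in row-major order."""
    return [(r, c, v)
            for r, row in enumerate(matrix)
            for c, v in enumerate(row)
            if v != invalid]


def fast_transpose(triples, num_cols):
    """Swap rows and columns of a triple list in one pass over the data."""
    counts = [0] * num_cols
    for _, c, _ in triples:
        counts[c] += 1                     # entries per original column
    starts = [0] * num_cols
    for c in range(1, num_cols):
        starts[c] = starts[c - 1] + counts[c - 1]
    result = [None] * len(triples)
    for r, c, v in triples:
        result[starts[c]] = (c, r, v)      # column c becomes row c
        starts[c] += 1
    return result
```

The result is again in row-major order, so it can be used directly as the triple list of the swapped matrix without a separate sort.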
Introduction to trees
Basic concepts of trees
A set consisting of N nodes (N>=0)
There is one special node called the root node, which has no predecessor.
Apart from the root node, the remaining nodes are divided into M (M>0) disjoint sets T1, T2, ..., TM, each of which is itself a tree, called a subtree of the root.
Trees are defined recursively
Glossary
Node
A node consists of a data element and several branches (pointers or indexes) pointing to its subtrees.
Degree of a node
The number of subtrees owned by the node
Node with degree 0
Also called a leaf node or terminal node
Branch node
A node with non-zero degree
Also called a non-terminal node
ancestor node
All nodes on the branches from the root node to this node
Descendants node
All nodes in the subtree with a node as the root node
parent node
If a node in the tree has a child node, the node is called the parent node of its child node.
child node
The root node of a subtree of a node in the tree is called the child node of that node.
Sibling node
Nodes with the same parent node
degree of tree
The maximum degree of all nodes in the tree
Node level
The number of branches along the path from the root node to a node in the tree
tree depth
The maximum value of the levels of all nodes in the tree
ordered tree
The subtrees T0, T1, ... of each node in the tree are ordered
unordered tree
The order between the subtrees of the nodes in the tree is not important, and they can exchange positions with each other.
forest
A collection of m (m>=0) disjoint trees
tree representation
Directory Structure
Collection Venn diagram
tree storage structure
parent representation
Use pointers to indicate the parent nodes of each node
advantage
It is very convenient to find the parent node of a node.
shortcoming
It is inconvenient to find the child nodes of a node.
child representation
Use pointers to point to the child nodes of each node
advantage
It is more convenient to find the child nodes of a node
shortcoming
It is very inconvenient to find the parent node of a node.
Parent-child representation
Use pointers to represent both the parent nodes of each node and the child nodes of each node.
Child-sibling representation
Records the first child node of each node as well as the next sibling node of each node.
Applications of trees
computer directory
binary tree
concept
A binary tree is a finite set of nodes. The set is either empty or consists of a root node plus two binary trees called left subtree and right subtree.
Features
Each node has at most two subtrees
The subtrees of a binary tree can be divided into left and right subtrees, and their order cannot be reversed.
full binary tree
All branch nodes have left subtrees and right subtrees, and all leaf nodes are on the same level.
complete binary tree
If the structure of a binary tree with N nodes is the same as the structure of the first N nodes of a full binary tree, it is called a complete binary tree.
Properties of binary trees
If the level of the root node is defined as 1, then the i-th level of a non-empty binary tree has at most 2^(i-1) nodes (i>=1).
If the depth of a binary tree with only a root node is defined as 1, then a binary tree of depth K has at most 2^K-1 nodes (K>=0)
For any binary tree, if the number of leaf nodes is n0 and the number of nodes with degree 2 is n2, then n0 = n2 + 1
For a complete binary tree with n nodes, if all nodes are numbered from 0 in top-to-bottom, left-to-right order, then for the node numbered i:
If i>0, the parent of node i is node (i-1)/2 (integer division). If i==0, node i is the root and has no parent.
If 2i+1<n, the left child of node i is node 2i+1. If 2i+1>=n, node i has no left child.
If 2i+2<n, the right child of node i is node 2i+2. If 2i+2>=n, node i has no right child.
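The index relations for a complete binary tree numbered from 0 translate directly into code. The helper names are hypothetical, and `None` is used here to mean "no such node".

```python
def parent_index(i):
    """Parent of node i, or None for the root (i == 0)."""
    return (i - 1) // 2 if i > 0 else None


def left_child_index(i, n):
    """Left child of node i in a tree of n nodes, or None if it has none."""
    left = 2 * i + 1
    return left if left < n else None


def right_child_index(i, n):
    """Right child of node i in a tree of n nodes, or None if it has none."""
    right = 2 * i + 2
    return right if right < n else None
```

These relations are what make sequential (array) storage of a complete binary tree work without any explicit pointers.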
Binary tree storage structure
sequential storage
Advantage
Store complete binary trees, simple and space-saving
shortcoming
For general binary trees, especially single-branch trees, storage space utilization is not ideal.
chain storage
Simulated pointers (static linked list)
Basic operations of binary trees
Creation of binary tree
Binary tree traversal
Pre-order
In-order
Post-order
Level-order
Initialize a queue
Put the pointer of the root node into the queue
Loop while the queue is not empty
Dequeue a node
If the left subtree of the node is not empty, put the left child pointer of the node into the queue.
If the right subtree of the node is not empty, put the right child pointer of the node into the queue.
Finish
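The queue-based traversal steps above can be sketched as follows. The `TreeNode` class and the function name are assumptions made for the example.

```python
from collections import deque


class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def level_order(root):
    """Visit nodes level by level, left to right, and return their values."""
    if root is None:
        return []
    visited = []
    queue = deque([root])               # initialize the queue with the root
    while queue:                        # loop while the queue is not empty
        node = queue.popleft()          # dequeue a node
        visited.append(node.value)
        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)
    return visited
```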
threaded binary tree
Threading concept
Threading turns the binary tree into a linear sequence according to a traversal order of the binary tree
Possible problems with ordinary binary trees
Recursive traversal may cause stack overflow
The non-recursive version may make the program less efficient
It is difficult to find the predecessor or successor of a node under a given traversal order.
The many null pointer fields in the tree are wasted
Threading process
When the left pointer of a node is null, make it point to the node's predecessor under the chosen traversal order.
When the right pointer of a node is null, make it point to the node's successor under the chosen traversal order.
Thread flag
Distinguishes whether a pointer refers to a child node or to a predecessor or successor.
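leftThread
0: the left pointer refers to the left child (leftChild)
1: the left pointer is a thread (leftThread)
rightThread
0: the right pointer refers to the right child (rightChild)
1: the right pointer is a thread (rightThread)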
Thread
A pointer in a node that refers to its predecessor or successor
Threaded binary tree
A binary tree whose nodes carry threads
Traversing a binary tree in some order (pre-order, in-order, post-order) to turn it into a threaded binary tree is called threading the binary tree.
heap
Heap concept
Store all elements in a one-dimensional array in the order of a complete binary tree. If they satisfy K[i] <= K[2*i+1] and K[i] <= K[2*i+2] (or K[i] >= K[2*i+1] and K[i] >= K[2*i+2]), the heap is called a min heap (or max heap)
Classification of heaps
Min heap
The key of every node is no greater than the keys of its left and right children. The key of the node at the top of the heap is the smallest.
Max heap
The key of every node is no less than the keys of its left and right children. The key of the node at the top of the heap is the largest.
Properties of the heap
If i=0, node i is the root and has no parent; otherwise the parent of node i is node (i-1)/2
If 2*i+1>n-1, node i has no left child; otherwise the left child of node i is node 2*i+1
If 2*i+2>n-1, node i has no right child; otherwise the right child of node i is node 2*i+2
Heap creation
Adjust from top to bottom (sift down)
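Building a min heap by adjusting each subtree from top to bottom can be sketched as below. The function names are assumptions; the sketch starts at the last non-leaf node and works back to the root.

```python
def sift_down(heap, i, n):
    """Move heap[i] down until both children (within the first n slots) are >= it."""
    while True:
        left = 2 * i + 1
        if left >= n:
            break                       # node i has no children
        smallest = left
        right = left + 1
        if right < n and heap[right] < heap[left]:
            smallest = right
        if heap[i] <= heap[smallest]:
            break                       # heap property already holds here
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest


def build_min_heap(items):
    """Return a new list arranged as a min heap."""
    heap = list(items)
    n = len(heap)
    for i in range(n // 2 - 1, -1, -1):  # last non-leaf node down to the root
        sift_down(heap, i, n)
    return heap
```

Deleting the top element uses the same `sift_down` step: move the last element to the root, shrink the heap by one, and adjust downward.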
Heap insertion
Heap deletion
Heap applications
priority queue
Huffman tree
concept
path
path length
The tree with the smallest weighted path length is called a Huffman tree
Constructing a Huffman tree
Build a forest F of n binary trees, each consisting of only a weighted root node.
Repeat the following steps until only one tree is left in F
Select the two trees in F whose roots have the smallest weights and make them the left and right subtrees of a new binary tree. The weight of the new root is the sum of the weights of the roots of its left and right subtrees.
Delete these two binary trees from F
Add the new binary tree to F
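The construction steps above can be sketched with a priority queue standing in for the forest. The representation is an assumption made for brevity: a leaf is just its weight, and an internal node is a `(left, right)` tuple. The function name is hypothetical.

```python
import heapq


def huffman_tree(weights):
    """Return (tree, weighted_path_length) for the given leaf weights."""
    if not weights:
        return None, 0
    # Forest: one single-node tree per weight. The unique integer in the middle
    # breaks ties, so the tree objects themselves are never compared.
    forest = [(w, i, w) for i, w in enumerate(weights)]
    heapq.heapify(forest)
    next_id = len(weights)
    wpl = 0
    while len(forest) > 1:
        w1, _, left = heapq.heappop(forest)    # the two smallest roots
        w2, _, right = heapq.heappop(forest)
        merged = w1 + w2
        wpl += merged   # each merge puts every leaf below it one level deeper
        heapq.heappush(forest, (merged, next_id, (left, right)))
        next_id += 1
    return forest[0][2], wpl
```

Summing the merged weights gives the weighted path length, because a leaf at depth d takes part in exactly d merges.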
Huffman coding
Encoding
In data communication, the transmitted text is often converted into a binary string made up of the characters 0 and 1. This is called encoding.
Fixed-length encoding
Variable-length encoding
File compression
binary search tree
Properties
If its left subtree is not empty, the values of all nodes in the left subtree are less than the value of the root node
If its right subtree is not empty, the values of all nodes in the right subtree are greater than the value of the root node
Its left and right subtrees are also binary search trees.
Operations
Search
If the root node is not empty
Root key == search key: return true
Root key > search key: search in its left subtree
Root key < search key: search in its right subtree
Otherwise return false
Insert
First check whether the key already exists. If it does, do not insert it.
Otherwise insert the element at the position where the search ended
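The search and insert steps can be sketched as follows. The class and function names are assumptions; the sketch uses loops rather than recursion and reports whether an insertion actually happened.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_search(root, key):
    """Return True if key is in the tree rooted at root."""
    node = root
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


def bst_insert(root, key):
    """Insert key; return (root, inserted). Existing keys are not inserted again."""
    if root is None:
        return BSTNode(key), True
    node = root
    while True:
        if key == node.key:
            return root, False
        if key < node.key:
            if node.left is None:
                node.left = BSTNode(key)
                return root, True
            node = node.left
        else:
            if node.right is None:
                node.right = BSTNode(key)
                return root, True
            node = node.right
```

Inserting keys in sorted order produces a single-branch tree, which is the case where the search length degrades to O(n).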
delete
First check whether the key is in the tree; if not, return directly
Cases
The node to be deleted has no child nodes
Delete the node directly
The node to be deleted has only a left child
Delete the node and make the parent of the deleted node point to the left child of the deleted node.
The node to be deleted has only a right child
Delete the node and make the parent of the deleted node point to the right child of the deleted node.
The node to be deleted has both left and right children
Find the first node in in-order traversal of its right subtree (the one with the smallest key), copy its value into the node to be deleted, and then delete that node instead.
Binary search tree performance analysis
In the worst case, the average search length is O(n). In general, the average search length is O(lgn).
AVL tree
AVL tree properties
Its left and right subtrees are both AVL trees
The absolute value of the difference between the heights of the left subtree and the right subtree (referred to as the balance factor) does not exceed 1 (-1, 0, 1)
If a tree is height-balanced, it is an AVL tree. With n nodes, its height stays at O(lgn), and the average search complexity is O(lgn)
Balancing rotations
Left single rotation
Right single rotation
Left-right double rotation
Right-left double rotation
Insertion of AVL tree
If the tree is empty, the inserted node becomes the root; return true directly.
If the tree is not empty, search for the insertion position. If the key is found during the search, the insertion fails; return false directly.
Insert node
Update the balance factor and adjust the tree
Deletion of AVL tree
The deleted node has only the left child
If the balance factor of parent becomes 1 or -1, the height of parent is unchanged, so the heights of all nodes from parent up to the root are unchanged and no adjustment is needed.
The deleted node has only the right child
If the balance factor of parent becomes 0, the subtree rooted at parent is still balanced but its height has decreased by 1, so the balance of parent's own parent must be checked.
The deleted node has both left and right children.
Switch to deleting the first node q under in-order traversal instead
If the balance factor of parent becomes 2 or -2, the shorter subtree has been shortened further, parent is unbalanced, and a balancing rotation is required.
Let q be the root of the higher subtree of parent
If the balance factor of q is 0, perform a single rotation to restore parent
If the balance factor of q has the same sign (positive or negative) as that of parent, perform a single rotation to restore parent.
If the balance factor of q has the opposite sign to that of parent, perform a double rotation to restore parent.
red black tree
concept
A red-black tree is a binary search tree that adds one storage bit to each node to represent the node's color. It ensures that the longest path is no more than twice the length of the shortest path, so the tree is approximately balanced.
Properties
Each node is either red or black
The root node of the tree is black
If a node is red, then its two child nodes are black (there are no two consecutive red nodes)
For each node, the simple paths from that node to all its descendant leaf nodes contain the same number of black nodes (the number of black nodes on each path is equal)
Each leaf node is black (leaf nodes here refer to empty nodes)
insert implementation
If the tree is empty, the new node becomes the root and must be recolored black after insertion.
If the parent of the inserted node is black, no property is violated and the node can be inserted directly.
Case three
Case four
Case five
delete
use
C++ STL: map, set, multimap, multiset
Java library
Linux kernel
some other libraries
Comparison of red-black trees and AVL trees