MindMap Gallery data structure
The most basic and important thing in today's computer field is data structure. This mind map contains the basic knowledge of data structures, such as linear structures, tree structures, graph structures, their implementation methods and common algorithms, such as sorting and searching. If you want to gain a solid foundation in data structures, this mind map will be a great helper for you. It helps you learn and understand different types of data structures and how to optimize and improve algorithms with appropriate data structures.
Edited at 2023-02-16 10:18:56One Hundred Years of Solitude is the masterpiece of Gabriel Garcia Marquez. Reading this book begins with making sense of the characters' relationships, which are centered on the Buendía family and tells the story of the family's prosperity and decline, internal relationships and political struggles, self-mixing and rebirth over the course of a hundred years.
One Hundred Years of Solitude is the masterpiece of Gabriel Garcia Marquez. Reading this book begins with making sense of the characters' relationships, which are centered on the Buendía family and tells the story of the family's prosperity and decline, internal relationships and political struggles, self-mixing and rebirth over the course of a hundred years.
Project management is the process of applying specialized knowledge, skills, tools, and methods to project activities so that the project can achieve or exceed the set needs and expectations within the constraints of limited resources. This diagram provides a comprehensive overview of the 8 components of the project management process and can be used as a generic template for direct application.
One Hundred Years of Solitude is the masterpiece of Gabriel Garcia Marquez. Reading this book begins with making sense of the characters' relationships, which are centered on the Buendía family and tells the story of the family's prosperity and decline, internal relationships and political struggles, self-mixing and rebirth over the course of a hundred years.
One Hundred Years of Solitude is the masterpiece of Gabriel Garcia Marquez. Reading this book begins with making sense of the characters' relationships, which are centered on the Buendía family and tells the story of the family's prosperity and decline, internal relationships and political struggles, self-mixing and rebirth over the course of a hundred years.
Project management is the process of applying specialized knowledge, skills, tools, and methods to project activities so that the project can achieve or exceed the set needs and expectations within the constraints of limited resources. This diagram provides a comprehensive overview of the 8 components of the project management process and can be used as a generic template for direct application.
data structure
Binary Trees and Trees and Forests
Trees and Binary Trees
How to convert a tree into a binary tree?
Both the child sibling representation of a tree and the binary linked list representation of a binary tree use two pointers.
Understand the child brother representation as a binary linked list
Manual simulation method for converting a tree into a binary tree:
① Connect the children of the same node with lines
② Delete the subtree branches of each node from left to right except the first one.
example
How to convert a binary tree into a tree?
Manual simulation method for converting a binary tree into a tree:
① Layer the binary tree from top to bottom and adjust it to the horizontal direction. (Hierarchical method: Each time you encounter the left child, it is one layer)
② Find the parent node of each layer. The method is that the node connected to the previous layer is the parent node. For example, in the bcd layer, the node in the upper layer connected to it is a, so the parent nodes of the three nodes in bcd are all a.
③ Connect each layer node to its parent node, and delete the connections between the child nodes of the parent node.
example
Forest and binary tree
Forest: A forest is a collection of m (m≥0) disjoint trees.
How to convert forest to binary tree?
Manual simulation method to convert forest to tree:
①Convert every tree in the forest into a binary tree
② Take the second tree as the right subtree of the root node of the first tree, and take the third tree as the right subtree of the root node of the second tree...and so on.
example
How to convert binary tree to forest?
Manual simulation method for converting a binary tree into a forest:
Repeatedly disconnect the right subtree pointer of the right child of the root node of the binary tree until there is no binary tree with a right child of the root node.
example
Traversing trees and forests
Preorder: visit the root node first, and then visit each subtree of the root node. Accessing subtrees also follows the order of precedence requirements.
Postorder: First visit each subtree of the root node, and then visit the root node. Accessing subtrees also follows the order of precedence requirements.
example
The pre-order traversal of a tree is equal to the pre-order traversal of its corresponding binary tree, and the post-order traversal is equal to the in-order traversal of its corresponding binary tree.
Chapter 7: Sorting
Basic knowledge of sorting
Definition: Sorting is to rearrange an originally unordered sequence into an ordered sequence.
sorting stability
If there are two elements Ri and Rj in the list to be sorted, and their corresponding keywords keyi=keyj, and Ri is in front of Rj before sorting, if after sorting using a certain sorting algorithm, Ri is still in front of Rj, then this is called The sorting algorithm is stable, otherwise it is said to be unstable.
Insertion class sorting
direct insertion sort
Direct insertion sort: first use one element as an ordered sequence, and then insert subsequent elements into the ordered sequence at appropriate positions until all elements are inserted into the ordered sequence.
The time complexity is O(n)
Direct insertion sort is stable.
half insertion sort
Half-way insertion sort separates the two operations of comparison and movement, that is, first uses half search to find the insertion position, then moves the element at once, and then inserts the element.
The time complexity of half insertion sort is O(n^2)
Stability: It is the same as the stability of direct insertion sort and is stable.
Hill sort
The basic idea of Hill sorting: Hill sorting is essentially insertion sorting, but it divides the sequence to be sorted into several subsequences, and then performs direct insertion sorting on these subsequences.
① First divide the sequence by increment 5, that is, the keywords with subscripts 0, 5, 10, 15... are divided into one group, and the keywords with subscripts 1, 6, 11, 16.. are divided into one group, and then These groups are subjected to direct insertion sorting respectively, which completes a round of Hill sorting.
②Reduce the increment (d1=n/2, di 1= [di/2], for example, 10 data sequences, the first increment d1=10/2=5, the second increment d2= [d1/2 ]= [5/2]=2, and the last increment is equal to 1), so the second round performs a similar sorting process with an increment of 2.
③The next third round, fourth round... are similar processes until the last round with an increment of 1. This is the direct insertion sort mentioned earlier.
Time complexity:... The time complexity of Hill sorting is about O(n^1.3). In the worst case, the time complexity of Hill sorting is O(n^2).
Space complexity: The space complexity of Hill sorting is O(1)
Stability: Unstable. Due to different increments, equal keywords may be divided into two direct insertion sorts for sorting, which may cause relative order changes.
Exchange sorting
Bubble Sort
Assume that the length of the list to be sorted is n. Compare the values of adjacent elements from back to front (or from front to back). If they are in reverse order (i.e. A[i-1]>A[i]), exchange them until Sequence comparison completed. We call it a bubble pass, and the result is that the smallest element is swapped to the first position of the column to be sorted. In the next bubble, the minimum element determined in the previous bubble will no longer participate in the comparison, and the sequence to be sorted will be reduced by one element. The result of each bubble will put the smallest element in the sequence at the final position of the sequence,..., this is the most All elements can be sorted by doing n-1 bubbles.
Space complexity: Storage space is opened up to store intermediate variables during exchange, so the space complexity is O(1)
time complexity
Stability: When two keywords are equal, the if judgment condition does not hold, so no data movement will occur. So it's stable.
Quick sort
Quick sort is a sorting method based on divide and conquer. Each quick sort selects any element in the sequence as the pivot (usually the first element is selected), moves all elements in the sequence that are smaller than the pivot to the front of the pivot, and moves all elements that are larger than the pivot. Get behind the pivot.
1
2
time complexity: In the best case, the time complexity is O(nlogn). The more disordered the sequence to be sorted, the higher the efficiency of the algorithm. In the worst case, the time complexity is O(n^2). The more ordered the sequence to be sorted, the lower the efficiency of the algorithm.
Space complexity: Since quick sort is recursive, a recursive work stack is needed to save the necessary information for each level of recursive calls. Its capacity should be consistent with the maximum depth of the recursive calls. The best case scenario is ⌈log2(n 1)⌉ (each partition is uniform) and the depth of the recursive tree is O(logn) In the worst case, because n-1 recursive calls are required, the depth of the stack is O(n);
Stability: Quick sort is unstable because of the existence of exchange keywords.
Select class sort
Simple selection sort
Space complexity: The additional storage space required is only the intermediate variable used when exchanging elements, so the space complexity is O(1)
time complexity: The key operation is to exchange elements. The entire algorithm consists of double loops. The outer loop goes from 0 to n-2 a total of n-2 1=n-1 times. For the i-th outer loop, the inner loop executes n-1-(i 1) 1=n-i-1 times. When i=0, the inner loop is executed n-1 times. When i=n-2, the inner loop is executed once, so it is an arithmetic sequence sum, totaling (1 n-1)(n-1) /2=n(n-1)/2, so the time complexity is O(n^2)
Stability: unstable The reason is that the exchange part will break the relative order
Heap sort
What is a heap?
The heap is a complete binary tree, and the value of any non-leaf node is not greater (or not less) than the value of its left and right child nodes.
If the value of each node is not less than the value of its left and right child nodes, it is called a big top heap.
If the value of each node is not greater than the value of its left and right child nodes, it is called a small top heap.
What is heap sort?
We know that for a heap, its root node is the maximum value (big top heap) or minimum value (small top heap) of the values of all nodes in the entire heap. So the idea of heap sorting is to adjust the unordered sequence into a heap each time, then select the value of the top element from the heap, add this value to the ordered sequence, reduce the unordered sequence by one, and then repeatedly adjust the unordered sequence until all Keywords are added to the ordered sequence.
time complexity: The total time of heap sorting can be divided into ① heap building part ② n-1 downward adjustments to the heap The time complexity of heap sort is O(n) O(nlog2n)=O(nlog2n)
Heap sort is unstable
merge sort
Assuming that the table to be sorted contains n records, it can be regarded as n ordered sub-tables, each with a length of 1, and then merged in pairs to obtain ⌈n/2⌉ ordered tables with a length of 2 or 1 ; and then merge them two by two,...and repeat this until they are merged into an ordered list of length n. This sorting method is called 2-way merge sort.
For example: 49 38 65 97 76 13 27
①First treat each keyword of the entire sequence as a separate ordered subsequence
② Merge two by two, 49 and 38 are merged into {38 49}, 65 and 97 are merged into {65 97}, 76 and 13 are merged into {13 76}, 27 has no merge object
③ Merge two by two, {38 49} and {65 97} are merged into {38 49 65 97}, {13,76} and 27 are merged into {13 27 76}
④ Merge two by two, {38 49 65 97} and {13 27 76} merge into {13 27 38 49 65 76 97}
Time complexity: O(nlog2n)
Space complexity: Because the sequence to be sorted needs to be transferred to an array, an additional storage space of size n needs to be opened, that is, the space complexity is O(n)
Stability: stable
Radix sort
Radix sorting (also called bucket sorting) is a very special sorting method. It is not sorted based on comparison, but uses the idea of multi-keyword sorting (that is, sorting based on the size of each keyword), with the help of "allocation" and "collect" operations to sort single logical keywords. Radix sorting is divided into highest digit first (MSD) sorting and lowest digit first (LSD) sorting.
Examples: 53, 3, 542, 748, 14, 214, 154, 63, 616
Supplementary digits: 053, 003, 542, 748, 014, 214, 154, 063, 616
The bucket is actually a queue, first in first out (enter from the top of the bucket and go out from the bottom)
The number of keywords is n, the number of digits of the keyword is d, for example, 748 d=3, r is the number of bases of the keyword, which is the type of data that makes up the keyword, for example, there are 10 decimal digits from 0 to 9 in total. Number, i.e. r=10
Space complexity: It is necessary to open several queues of keyword bases, so the space complexity is O(r)
Time complexity: D times of "allocation" and "collection" are required for the number of keywords. One "allocation" requires n keywords to be put into each queue, and one "collection" requires r buckets to be collected. So the time complexity of one "allocation" and one "collection" is O(n r). D times requires a time complexity of O(d(n r)).
Stability: Because it is a queue with a first-in-first-out nature, it is allocated in order when allocated, which is stable, so it remains stable during collection. That is, radix sort is a stable sorting algorithm.
external sort
It is necessary to store the records to be sorted in external memory, and then transfer the data part by part into the memory for sorting. During the sorting process, multiple exchanges between memory and external storage are required, and the results after sorting the records in the external storage file are still placed in the original file. This sorting method is called external sorting.
Memory capacity cannot accommodate large amounts of data
How to get the initial merge segment
Displacement selection sort: solve the problem of placing sort segments into memory
How to reduce the number of merges of multiple merge segments
Optimal merge tree: minimum number of merges (number of I/Os)
How to quickly obtain the smallest keyword for each m-way merge
Loser tree: Reduce the number of comparisons
Chapter 6: Search
Basic concepts of search and sequential search
Search definition: The process of finding data elements that meet certain conditions in a data set is called search
Keyword: A data item in a data element that uniquely identifies the element
Average Search Length (ASL: Average Search Length): During the search process, the length of a search refers to the number of keywords that need to be compared, and the average search length is the average number of comparisons of keywords in all search processes.
Sequential search (linear search) is mainly used to search in linear tables. Starting from one end of the lookup table, the lookup table is scanned sequentially, and the scanned keywords are compared with the value key to be found. If they are equal, the search is successful. If no equal data elements are found at the end of the scan, the search fails.
1
2
3
4
The time complexity is O(n)
half search
Algorithm idea:
First, compare the given value key with the key of the middle element in the table. If they are equal, the search is successful and the storage location of the element is returned; if they are not equal, the element to be searched can only be in the first half other than the middle element. Or in the second half. Then continue the same search within the narrowed range, and repeat this until it is found, or it is determined that the element to be found does not exist in the table, then the search is unsuccessful and a search failure message is returned.
Half search analysis
Half search decision tree
For binary search, the number of comparisons is the number of nodes passed from the root node to the node.
The height of a complete binary tree with N (N>0) nodes is [log2(N 1)] or [log2N] 1.
The time complexity is O(logn)
Block search
Block search is also called index sequence search
Block search idea:
① Determine which block the value to be found is in (halve search) ② Search for the value to be searched in the determined block (sequential search)
Block search analysis
Since block search actually performs two searches, the average search length of the entire algorithm is the sum of the average search lengths of the two searches. That is, ASL chunking = ASL halved ASL order
Binary sorting tree
A binary search tree (Binary Search Tree, also called a binary search tree) is either an empty tree or a binary tree with the following properties ①If the left subtree is not empty, then the values of all nodes on the left subtree are less than the value of its root node. ②If the right subtree is not empty, then the values of all nodes on the right subtree are greater than the value of its root node. ③ Its left and right subtrees are also a binary sorting tree.
"The left is small and the right is big"
Algorithmic thinking
Due to the characteristics of binary sorting trees (left subtree < root node < right subtree), every time you search for a keyword, you need to compare it with the root node first: If this keyword is less than the value of the root node, then the same comparison operation is performed on the left subtree of the root node and continues until the keyword is found, indicating a successful search, or a null pointer, indicating a failed search. If this keyword is greater than the value of the root node, then the same comparison operation is performed on the right subtree of the root node and continues until the keyword is found, indicating a successful search, or a null pointer, indicating a failed search.
Find keyword codes
1
2
Insert keyword code
1) Empty tree: Directly insert a new node and return success 2) The tree is not empty: check whether there are nodes with repeated keywords: ① Exists: Returns insertion failure ② Does not exist: Check the size relationship between the value of the root node and the key value to be inserted, and recursively insert the left and right subtrees.
Construct code
delete node
①The leaf nodes are deleted
Method: Just delete the node directly
② What is deleted is the node with only the left subtree or the right subtree.
Method: "Son inherits father's legacy"
③What is deleted are nodes that have both the left and right subtrees.
Following the ② type, one child will “inherit the father’s inheritance” and the other child will “submit” to this child. Method: Find the direct predecessor or direct successor node of the node to be deleted, replace the node to be deleted with this node, and then delete the node.
Binary sorting tree analysis
The search time complexity is O(n)
Balanced binary tree (AVL tree)
A balanced binary tree (AVL tree) is a special binary sorting tree. The special thing is that the absolute value of the height difference between the left and right subtrees does not exceed 1, and the left and right subtrees are also a balanced binary tree.
balancing factor
Define the height difference between the left subtree and the right subtree of a node as the balance factor of the node. Then the value of the balance factor of a balanced binary tree node can only be −1, 0 or 1.
Balance adjustment
The process of establishing a balanced binary tree is similar to that of a binary sorted tree. They both start from an empty tree and insert nodes one after another. The difference is that in the process of establishing a balanced binary tree, since inserting a node may destroy the balance of the node, balance adjustment is required.
LL adjustment (caused by inserting a node into the left subtree of the left child)
The balance factor of the root node of the minimum unbalanced subtree is 2>0 The balance factor of its left child node is 1>0 Both are greater than 0, so you can adjust it by turning right
"Regular right-hand rotation"
RR adjustment (caused by inserting a node into the right subtree of the right child)
The balance factor of the root node of the minimum unbalanced subtree is -2<0 Its right child node balance factor is -1<0 Both are less than 0, so you can adjust it by turning left
"Negative means left-handed rotation"
LR adjustment (caused by inserting a node into the right subtree of the left child)
RL adjustment (caused by inserting a node into the left subtree of the right child)
First locally convert to LL or RR, and finally adjust
analyze
The maximum depth of a balanced binary tree with n nodes is O(log2n). Therefore, the average search length of a balanced binary tree is O(log2n).
B-tree and B-tree
2-3 trees
The 2-3 tree is a multi-way search tree: 2 and 3 mean that the 2-3 tree contains two types of nodes
1) 2 nodes contain one element and two children (or no children). ①The elements contained in the left subtree are less than the element value of the node, and the elements contained in the right subtree are greater than the element value of the node. ②Node 2 either has two children or has no children. One child is not allowed.
2) The 3-node contains two elements, one large and one small, and three children (or no children). (The two elements are arranged in order of size) ①The left subtree contains elements smaller than the smaller element value of the node, the right subtree contains elements greater than the larger element value of the node, and the middle subtree contains elements between these two element values. ②Node 3 either has three children or no children. One or two children are not allowed.
3) All leaf nodes of the 2-3 tree are at the same level
2-3-4 tree
The 2-3-4 tree is also a multi-way search tree: 2 and 3 and 4 mean that the 2-3-4 tree contains three types of nodes
1) 2 nodes contain one element and two children (or no children). ①The elements contained in the left subtree are less than the element value of the node, and the elements contained in the right subtree are greater than the element value of the node. ②Node 2 either has two children or has no children. One child is not allowed.
2) The 3-node contains two elements, one large and one small, and three children (or no children). ①The left subtree contains elements smaller than the smaller element value of the node, the right subtree contains elements greater than the larger element value of the node, and the middle subtree contains elements between these two element values. ②Node 3 either has three children or no children. One or two children are not allowed.
3) The 4-node contains three elements, small, medium and large, and four children (or no children). ①The left subtree contains elements less than the smallest element value of the node, the second subtree contains elements greater than the smallest element value and less than the middle element value, the third subtree contains elements greater than the middle element value and less than the maximum element value, right The subtree contains elements greater than the largest element value of the node. ②Node 4 either has four children or no children. One, two or three children are not allowed.
4) All leaf nodes of the 2-3-4 tree are at the same level
B-tree
The B-tree is also a balanced multi-path search tree. The 2-3 tree and the 2-3-4 tree are both special cases of the B-tree. We call the largest number of children of a node in the tree the order of the B-tree. Usually written as m. A B-tree of order m is either an empty tree or an m-ary tree that satisfies the following characteristics:
1) Each node in the tree has at most m subtrees. (That is, it contains at most m-1 keywords) ("Two subtree pointers sandwich a keyword")
2) If the root node is not a terminal node, there are at least two subtrees. (at least one keyword)
3) All non-leaf nodes except the root node have at least ⌈m/2⌉ subtrees. (That is, it contains at least ⌈m/2⌉-1 keywords)
4) The structure of all non-leaf nodes is as follows:
5) All leaf nodes appear at the same level without information. (Just like the node where the search failed in the binary search judgment tree)
1. Search operation of B-tree
Search process: ① First compare the keyword key to be found with the keywords in the node. If it is equal to one of the keywords, the search is successful. ② If it is not equal to all keywords, check which range the key is in, and then search in the subtree pointed to by the corresponding pointer. Eg: If Key is smaller than the first keyword K1, search in the subtree pointed by the P0 pointer. If it is larger than the last keyword Kn, search in the subtree pointed by the Pn pointer.
2. Insertion operation of B-tree
Splitting method: take the middle keyword (⌈n/2⌉) in this keyword array as the new node, and then use the other keywords to form two nodes as the left and right children of the new node.
3.Deletion operation of B-tree
The deletion operation in the B-tree is similar to the insertion operation, but it is slightly more complicated. It must make the number of keywords in the deleted node ≥⌈m/2⌉-1, so it will involve the "merging" problem of nodes. Due to the different positions of deleted keywords, it can be divided into two situations: the keyword is on the terminal node and the keyword is not on the terminal node.
1) If the deleted keyword is on the terminal node (the lowest non-leaf node): ①The number of keywords in the node is greater than ⌈m/2⌉-1. In this case, deleting this keyword will not destroy the definition requirements of the B-tree. So delete it directly. ②If the number of keywords in a node is equal to ⌈m/2⌉-1, and there are nodes with a number of keywords greater than ⌈m/2⌉-1 in its left and right sibling nodes, then borrow keywords from the sibling stage. ③If the number of keywords in a node is equal to ⌈m/2⌉-1, and there is no node with a number of keywords greater than ⌈m/2⌉-1 in its left and right sibling nodes, node merging is required.
2) If the deleted keyword is not on the terminal node (the lowest level non-leaf node): it needs to be converted to be on the terminal node first, and then the corresponding methods are considered according to the situation on the terminal node.
Adjacent keywords: For a keyword that is not on a terminal node, its adjacent keyword is the keyword with the largest value in its left subtree or the keyword with the smallest value in its right subtree.
The first case: There is a left subtree or right subtree with a number of keywords greater than ⌈m/2⌉-1. Find the adjacent keywords of the keyword on the corresponding subtree, and then replace the adjacent keywords to be deleted. keyword.
The second case: the number of keywords in the left and right subtrees is equal to ⌈m/2⌉-1, then the two left and right subtree nodes are merged, and then the keywords to be deleted are deleted.
B-tree
B-tree is a data structure commonly used in databases and operating system file systems for search.
The main difference between the m-order B-tree and the m-order B-tree is: 1) In the B-tree, a node with n keywords only contains n subtrees, that is, each keyword corresponds to a subtree; while in the B-tree, a node with n keywords contains (n 1) A tree. 2) In the B-tree, the range of the number n of keywords in each node (non-root internal node) is ⌈m/2⌉≤n≤m (root node 1≤n≤m). In the B-tree , the range of the number n of keywords for each node (non-root internal node) is ⌈m/2⌉ -1≤n≤m-1 (root node: 1≤n≤m-1). 3) In the B-tree, leaf nodes contain information, and all non-leaf nodes only serve as indexes. Each index item in a non-leaf node only contains the maximum keyword of the corresponding subtree and a pointer to the subtree. , does not contain the storage address of the record corresponding to the keyword. 4) In the B-tree, leaf nodes contain all keywords, that is, keywords that appear in non-leaf nodes will also appear in leaf nodes; in B-tree, leaf nodes contain keywords and The keywords contained in other nodes are not repeated.
hash table
Hash table: A data structure that calculates the address of the keyword in the table based on a given keyword. In other words, the hash table establishes a direct mapping relationship between keywords and storage addresses.
Hash function: A function that maps keywords in the lookup table to the address corresponding to the keyword, recorded as Hash(key)=Addr.
The hash function may map two or more different keywords to the same address, which is called a "collision". These different keywords that collide are called synonyms.
Tips for constructing hash functions:
1) The domain of the hash function must include all the keywords that need to be stored, and the range of the value domain depends on the size or address range of the hash table.
2) The addresses calculated by the hash function should be evenly distributed throughout the address space with equal probability, thereby reducing the occurrence of conflicts.
3) The hash function should be as simple as possible and can calculate the hash address corresponding to any keyword in a short time.
1. Construction methods of commonly used Hash functions:
1. Open addressing method: directly take a linear function value of the keyword as the hash address, and the hash function is H(key)=a×key b. In the formula, a and b are constants. This method is the simplest to calculate and will not cause conflicts
2. Division with remainder method: Assume that the length of the hash table is m, take a prime number p that is no greater than m but closest to or equal to m, and use the following formula to convert the keyword into a hash address. The hash function is H(key)=key % p The key to the remainder method is to choose p so that each keyword can be mapped to any address in the hash space with equal probability after being converted by this function, thereby reducing the possibility of conflicts as much as possible.
3. Digital analysis method: Suppose the keyword is an r-base number (such as a decimal number), and the frequency of occurrence of r numbers in each position is not necessarily the same. It may be more evenly distributed in some positions. The chance of each number appearing If some bits are unevenly distributed, and only certain numbers appear frequently, then some bits with a more even distribution of numbers should be selected as the hash address. This method is suitable for known keyword sets
4. Square middle method: As the name suggests, the middle digits of the square value of the keyword are taken as the hash address. The specific number of bits to take depends on the actual situation. The hash address obtained by this method is related to each bit of the keyword, making the hash address distribution more even.
5. Folding method: Divide the keyword into several parts with the same number of digits (the number of digits in the last part can be shorter), and then take the superposition sum of these parts as the hash address. This method is called the folding method. When there are many digits in the keyword, and the numbers on each digit in the keyword are roughly evenly distributed, the folding method can be used to obtain the hash address.
2. Conflict handling methods for commonly used Hash functions:
1. Open addressing method: Use the conflicting Hash address as an independent variable, and obtain a new free Hash address through a certain conflict resolution function.
1) Linear detection method: When a conflict occurs, check the next unit in the table sequentially (when the end address m-1 of the table is detected, the next detection address is the address 0 at the beginning of the table) until an idle unit is found (when the table is not A free unit must be found when it is full) or the entire table can be searched.
2) Square detection method: Suppose the conflicting address is d, and the new address sequence obtained by the square detection method is d 12, d-12, d 22, d-22... The square detection method is a better way to deal with conflicts and can avoid the "stack-up" problem. Its disadvantage is that it cannot detect all cells on the hash table, but it can detect at least half of the cells.
3) Rehash method: also called double hashing method. Two hash functions need to be used. When the address obtained through the first hash function H (Key) conflicts, the second hash function Hash2 (Key) is used to calculate the address increment of the key.
4) Pseudo-random sequence method: When an address conflict occurs, the address increment is a pseudo-random number sequence, which is called the pseudo-random sequence method.
2. Zipper method: Different keywords may be mapped to the same address through a hash function. In order to avoid conflicts between non-synonyms, all synonyms can be stored in a linear linked list. This linear linked list is uniquely identified by its hash address. . The zipper method is suitable for situations where insertions and removals are frequent.
3. The search process of the hash table: similar to constructing a hash table, given a keyword Key. First calculate its hash address based on the hash function. Then check if there is a keyword at the hash address location. 1) If not, it means that the keyword does not exist, and the search failure is returned. 2) If there is, check whether the record is equal to the keyword. ①If equal to the keyword, return the search success. ② If not equal, calculate the next hash address according to the given conflict handling method, and then use this address to perform the above process.
4. The search performance of the hash table: related to the filling factor.
The larger α is, the more "full" the filled records are, and the greater the possibility of conflict. On the contrary, the smaller the possibility of conflict.
Chapter 5: Picture
Basic concepts of graphs
definition: A tree is a finite set of N (N≥0) nodes. When N=0, it is called an empty tree, which is a special case. In any non-empty tree it should satisfy: 1) There is and is only one specific node called the root. 2) When N>1, the remaining nodes can be divided into m (m>0) disjoint finite sets T1, T2,..., Tm, each of which is itself a tree and is called the root. The subtree of the node.
The graph G consists of a vertex set V and an edge set E, denoted as G = (V, E)
V(G) represents a finite non-empty set of vertices in graph G. Use |V| to represent the number of vertices in graph G, also called the order of graph G.
E(G) represents the set of relationships (edges) between vertices in graph G. Let |E| represent the number of edges in graph G.
Classification
directed graph
A finite set of directed edges (arcs)
An arc is an ordered pair of vertices
v is the tail of the arc, w is the head of the arc
v is adjacent to w or w is adjacent to v
Undirected graph
Finite set of undirected edges
Edges are unordered pairs of vertices
(v,w)
(v,w)=(w,v)
w and v are adjacent points to each other
simple diagram
1. There is no edge from the vertex to itself.
2. The same edge does not appear repeatedly
multiple graphs
If there is more than one edge between two nodes in graph G, the vertices are allowed to be associated with themselves through the same edge.
complete graph
undirected complete graph
If there is an edge between any two vertices
Directed complete graph
If there are two arcs in opposite directions between any two vertices
subplot
Connected graph: any two vertices in the graph are connected
Connected components: maximal connected subgraphs in undirected graphs
connected
There is a path from vertex A to vertex B
great
1. There are enough vertices
2. A maximal connected subgraph contains all the edges attached to these vertices.
How to find connected components: Start by selecting a vertex, use this vertex as a subgraph, and then add vertices and edges connected to this subgraph one by one until all connected vertices are added to the subgraph.
Conclusion 1: If a graph has n vertices and less than n-1 edges, then the graph must be a disconnected graph.
Strongly connected: There are paths from vertex V to vertex W and from vertex W to vertex V.
Strongly connected graph: any pair of vertices in the graph is strongly connected
Spanning tree of a connected graph: a minimal connected subgraph that contains all n vertices in the graph but only n-1 edges.
Conclusion 2: Removing an edge from a spanning tree will result in a non-connected graph, and adding an edge will form a cycle.
Degree: the number of edges with this vertex as an endpoint
The degree of a vertex V in an undirected graph refers to the number of edges attached to the vertex, recorded as TD(v)
The degree of vertex V in a directed graph is divided into out-degree and in-degree
Indegree (ID) is the number of directed edges ending at vertex v
Out-degree (OD) is the number of directed edges starting from vertex V
Simple path and simple circuit: A path whose vertices do not appear repeatedly is called a simple path. For a loop, a loop in which the vertices do not appear repeatedly except for the first and last vertex is called a simple loop.
Quanhe Net: Each edge in the graph is given a certain value by postgraduate entrance examination. This value is called the weight of this edge. The weighted graph is called a weighted graph, also called a network.
Path and path length: The path between vertices p to q refers to the vertex sequence that is stored, p, a, b, c, d,...q. The number above the path is the path length
Loop (ring): A path whose first and last vertices are the same is called a loop or loop
Distance: The length of the shortest path from vertex u to v. If there is no path, it is infinite.
Graph storage structure
Adjacency matrix (stored sequentially)
Adjacency list (linked storage)
Cross linked list (directed graph)
Adjacency multiple list (undirected graph)
Graph traversal
depth first traversal
Depth-First Search (DFS: Depth-First-Search): Depth-first search is similar to the pre-order traversal algorithm of a tree
Space complexity: Since DFS is a recursive algorithm, recursion requires a working stack to assist the work. At most, all vertices in the graph need to be pushed into the stack, so the time complexity is O(|V|)
Time complexity: 1) Adjacency list: The main operation of the traversal process is to traverse its adjacent points for a vertex. Since the adjacent points are found by accessing the edge table, the time complexity is O(|E|), and the time to access the vertices is O. (|V|), so the total time complexity is O(|V| |E|) 2) Adjacency matrix: The time complexity of finding the adjacent points of each vertex is O(|V|), and each vertex is searched, so the total time complexity is O(|V|2)
breadth-first traversal
Breadth-First Search (BFS: Breadth-First-Search): Breadth-First Search is similar to the level-order traversal algorithm of a tree
Space complexity: BFS needs to use a queue, and n vertices need to be queued once. So in the worst case, if n vertices are in the queue, then a space complexity of O(|V|) is required.
time complexity: 1) Adjacency list: Each vertex is entered into the queue once, and the time complexity is O(|V|). For each vertex, searching for its adjacent points requires visiting all the edges of this vertex, so the time complexity is O( |E|). So the total time complexity is O(|V| |E|) 2) Adjacency matrix: Each vertex is entered into the queue once, and the time complexity is O(|V|). For each vertex, searching for its adjacent points requires traversing one row of the matrix, so the time complexity is O(|V |), so the total time complexity is O(|V|2)
Application of diagrams
minimum spanning tree
Prlm
① Find the first starting vertex v0 in the graph as the first vertex of the spanning tree, and then select the edge with the smallest weight from all the edges from this vertex to other vertices. Then add the other vertex v of this edge and this edge to the spanning tree.
② For all other remaining vertices, check whether the weights of these vertices and vertex v are smaller than the corresponding weights of these vertices in the lowcost array. If smaller, update the lowcost array with smaller weights.
③Continue to select the edge with the smallest weight and not in the spanning tree from the updated lowcost array, and then add it to the spanning tree.
④ Repeat ②③ until all vertices are added to the spanning tree.
In a double loop, the number of times of the outer loop is n-1, and the number of times of the two inner loops in parallel is n. Therefore, the time complexity of Prim's algorithm is O(n2) Moreover, the time complexity is only related to n, so it is suitable for dense graphs.
Kruskal
Arrange the edges in the graph according to their weights from small to large, then scan from the smallest edge, and set a set of edges to record. If the edge does not form a loop, the edge will be merged into the current spanning tree. Until all edges have been detected.
The operation of Kruskal's algorithm is divided into the weight sorting part of the edges and a single for loop. They are in a parallel relationship. Since the sorting takes longer than the single loop, the main time of Kruskal's algorithm is spent on sorting. Sorting is related to the number of edges in the graph, so it is suitable for sparse graphs
shortest path
Dijkstra
The shortest path from a source point to other vertices
This algorithm sets a set S to record the vertices of the shortest path that has been obtained. It can be implemented by an array s[], initialized to 0. When s[vi]=1, it means that the vertex vi is put into S. Initially, the source Point v0 and put it into S. In addition, two auxiliary arrays are set during construction: dist[]: records the current shortest path length from the source point v0 to other vertices. The initial value of dist[i] is arcs[v0][i]. path[]: path[i] represents the predecessor node of the shortest path from the source point to vertex i. At the end of the algorithm, the shortest path from source point v0 to vertex vi can be traced back based on its value. Assume starting from vertex 0, that is, vertex 0 is the source point. The set S initially only contains vertex 0. The adjacency matrix arcs represents the weighted directed graph, and arcs[i][j] represents the weight of the directed edge <i,j> Value, if there is no directed edge <i, j>, then arcs[i][j] is ∞. The steps of Dijkstra's algorithm are as follows: 1) Initialization: The set S is initially {0}, the initial value of dist[] is dist[i]=arcs[0][i], i=1, 2,...,n-1. 2) Find the minimum value dist[j] in dist[], and add vertex j to the set S, that is, modify s[vj]=1. 3) Modify the length of the shortest path from v0 to any vertex vk on the set V-S: if dist[j] arcs[j][k]< dist[k], then let dist[k]=dist[j] arcs [j][k]. In addition, update path[k]=j (that is, after vertex j is added to the set, if there is a new path that makes the path to vertex k shorter, the length of the path to vertex k will be modified to a shorter one) 4) Repeat operations 2) to 3) a total of n-1 times until all vertices are included in S.
Freud
The shortest path from all vertices to all vertices
Algorithmic idea: Recursion produces an n-order square matrix sequence A(−1), A(0),…,A(k),…,A(n−1) Among them, A(k)[i][j] represents the path length from vertex vi to vertex vj, and k represents the operation step of bypassing the k-th vertex. Initially, for any two vertices vi and vj, if there is an edge between them, the weight on this edge is used as the shortest path length between them; if there is no directed edge between them, ∞ is used as The shortest path length between them. In the future, we will gradually try to add vertex k (k=0, 1, ..., n-1) as an intermediate vertex in the original path. If after adding intermediate vertices, the resulting path is shorter than the original path, replace the original path with this new path.
Unweighted graph
The path with the fewest number of edges between two points
weighted graph
The path between two points that has the smallest sum of edge weights
topological sort
AOV
If we regard each link as a vertex in the graph, in such a directed graph, using vertices to represent activities and arcs to represent the priority relationships between activities, then such a directed graph is called an AOV network (Activity On Vertex). )
Topological sorting is the process of constructing a topological sequence for a directed graph. The construction will have two results: If all the vertices of this graph are output, it means that it is an AOV network without loops; If all vertices are not output, it means that there is a loop in this graph and it is not an AOV network.
Topological sorting algorithm: Select a vertex with an in-degree of 0 from the AOV network to output, then delete this vertex, and delete the arc with this vertex as the tail of the arc. Repeat this step until all vertices in the output graph are output, or no vertices with in-degree 0 are found.
Critical Path
AOE (Activity On Edge): In a weighted directed graph representing a project, vertices are used to represent events, directed edges are used to represent activities, and weights on the edges are used to represent the duration of activities. The edges of this directed graph The net representing the activity is called the AOE net.
Chapter 1: Data Structure basic concept
definition
In any problem, data elements do not exist in isolation, but there is a certain relationship between them. This relationship between data elements is called structure. A data structure is a collection of data elements that have one or more specific relationships with each other. Data structure includes three aspects: logical structure, storage structure and data operation. The logical structure and storage structure of data are two inseparable aspects. The design of an algorithm depends on the selected logical structure, and the implementation of the algorithm depends on the storage structure used.
logical structure
Logical structure refers to the logical relationship between data elements, that is, describing data in terms of logical relationships. It has nothing to do with data storage and is independent of the computer
The logical structure of data is divided into linear structure and non-linear structure
There is no other relationship between the data elements in the set structure other than the relationship of "belonging to the same set". Similar to a mathematical set
Linear structure There is only a one-to-one relationship between the data elements in the structure. Such as queuing
Tree structure There is a one-to-many relationship between the data elements in the structure. Such as family genealogy
Graph structure or network structure There is a many-to-many relationship between the data elements in the structure. Such as map
physical structure
Storage structure refers to the representation of data structure in the computer (also called image), also called physical structure. It includes the representation of data elements and the representation of relationships. The storage structure of data is the realization of logical structure in computer language, which depends on computer language. The storage structures of data mainly include: sequential storage, chain storage, index storage and hash storage.
Sequential storage: The physical locations of storage are adjacent. (p.s. The physical location is the location of the information in the computer.)
Linked storage: The physical locations of storage may not be adjacent. Adjacent elements are found by recording the physical locations of adjacent elements.
Index storage: similar to a directory, you can contact the file system chapter of the operating system to understand it later.
Hash storage: directly calculate the physical address of the element through keywords (explained in detail later).
Five characteristics of algorithms
1. Finiteness: ends after a finite number of steps
2. Certainty: there is no ambiguity, that is, there is no ambiguity
3. Feasibility: For example, limited by the computing power of computers, some algorithms, although theoretically feasible, cannot be completed in practice.
4. Input: Various types of data that can be processed by the computer, such as numbers, audio, images, etc.
5. Output: one or more program output results.
algorithm complexity
time complexity:
• It is used to measure how quickly the algorithm execution time increases as the problem size increases;
• Is a function of problem size: T(n) is a function of time scale. Time complexity mainly analyzes the order of magnitude of T(n)
• T(n)=O(f(n)) f(n) is the frequency of basic operations in the algorithm. Generally, we consider the time complexity in the worst case
Space complexity:
• It is used to measure the speed of the space required by the algorithm as the size of the problem increases;
• Is a function of problem size: S(n)=O(g(n)); the growth rate of the space required by the algorithm is the same as the growth rate of g(n).
Focus on complexity calculation
Commonly used time complexity relationships:
How to calculate complexity
Time complexity calculation (single loop body)
Directly focus on the number of executions of the loop body, set to k
Time complexity calculation (multiple loop bodies)
Two operation rules: multiplication rule and addition rule.
Chapter 2: Linear table
The logical structure of a linear table
Definition: A linear table is a finite sequence of n (n≥0) data elements of the same data type. Where n is the table length. When n=0, the linear table is an empty table
Features: The first element in the linear list is called the header element; the last element is called the tail element. Except for the first element, every element has exactly one direct predecessor. Every element except the last element has exactly one direct successor.
Sequential storage structure of linear table
The sequential storage of linear tables is also called sequential tables. It uses a set of storage units with consecutive addresses (such as arrays in C language) to store data elements in a linear table in sequence, thus making the logic Two elements that are editorially adjacent are also physically adjacent.
Three attributes to create a sequence table: 1. The starting position of the storage space (array name data) 2. Maximum storage capacity of sequence table (MaxSize) 3. The current length of the sequence table (length)
In fact, arrays can also dynamically allocate space. The space to store arrays is allocated through dynamic storage allocation statements during program execution.
Summarize:
1. The main feature of the sequence table is random access (based on arrays in C language), that is, the specified element can be found in O(1) time through the first address and element serial number.
2. The storage density of the sequence table is high, and each node only stores data elements. There is no need to spend space on the elements in the table to establish logical relationships between them (because of the adjacent physical location characteristics)
3. Logically adjacent elements in the sequence table are also physically adjacent, so insertion and deletion operations require moving a large number of elements.
Sequence table operations
1.Insert
Algorithm idea:
1. Determine whether the value of i is correct
2. Determine whether the table length exceeds the array length
3. From back to front to the i-th position, move each of these elements backward one position.
4. Insert the element into position i and modify the table length
code
analyze:
Best case: inserting at the end of the table (ie i=n 1), the element move statement will not be executed, and the time complexity is O(1).
Worst case scenario: inserting at the head of the table (ie i=1), the element move statement will be executed n times, the time complexity is O(n).
Average case: Assume that pi (pi=1/(n 1)) is inserted at the i-th position The probability of a node, then insert a node into a linear list of length n The average number of times a node needs to be moved is
2.Delete
Algorithm idea:
1. Determine whether the value of i is correct
2. Get the deleted element
3. Move all elements after the deleted element forward one position in sequence.
4. Modify table length
code
analyze
Best case scenario: Delete the element at the end of the table (ie i=n) without moving the element, and the time complexity is O(1).
Worst case scenario: deleting the header element (that is, i=1) requires moving all elements except the first element, and the time complexity is O(n).
Average situation: Assuming that pi (pi=1/n) is the probability of deleting the node at the i-th position, then the average number of times the node needs to be moved when deleting a node in a linear list of length n is
Linked storage structure of linear table
Linked storage of linear tables refers to storing data elements in linear tables through a set of arbitrary storage units.
What is the difference between head node and head pointer?
Regardless of whether there is a head node or not, the head pointer always points to the first node of the linked list, and the head node is the first node in the linked list of headed nodes. No information is usually stored in the node.
Why set the header node?
1. It is easy to process and operate. For example, the operations of inserting a node before the first element node and deleting the first node are unified with the operations of other nodes.
2. Regardless of whether the linked list is empty or not, its head pointer is a non-null pointer pointing to the head node, so the processing of empty lists and non-empty lists is unified.
Single linked list operations
1. Create a singly linked list using head insertion method:
Create a new node, allocate memory space, and insert the new node into the head of the current linked list.
code
2. Create a singly linked list using tail insertion method:
Create a new node, allocate memory space, and insert the new node into the end of the current linked list.
code
3. Find nodes by serial number
Starting from the first node in the singly linked list, search down the next field one by one until the i-th node is found, otherwise NULL is returned in the last node pointer field.
code
4. Find nodes by value
Starting from the first node of the singly linked list, compare the values of the data fields of each node in the list from front to back. If the value of the data field of a node is equal to the given value e, then return the pointer of the node; if the entire single linked list If there is no such node in the linked list, NULL is returned.
code
5. insert
The insertion operation is to insert a new node with value x into the i-th position of the singly linked list. First check the legality of the insertion position, then find the predecessor node of the position to be inserted, that is, the i-1th node, and then insert the new node after it.
Algorithm idea: 1. Get the pointer to the predecessor node of the insertion position ① p=GetElem(L,i-1); 2. Let the pointer field of the new node *s point to the successor node of *p ② s->next=p->next; 3. Let the pointer field of node *p point to the newly inserted node *s ③ p->next=s;
6. delete
The deletion operation is to delete the i-th node of the singly linked list. First check the legality of the deleted position, then look up the i-1th node in the table, which is the predecessor node of the deleted node, and then delete it.
Algorithm idea: 1. Get the pointer to the predecessor node of the deleted position p=GetElem(L,i-1); 2. Get the pointer to the deleted position q=p->next; 3. The successor of the node pointed to by p points to the successor of the deleted node p->next=q->next 4. Release and delete the node free(q);
Double linked list
definition
1. Insert: (method is not unique) ① s->next=p->next; ② p->next->prior=s; ③ s->prior=p; ④ p->next=s;
2. Delete: ① p->next=q->next; ② q->next->prior=p; ③ free(q);
Circular linked list && static linked list
Circular singly linked list: The difference between a circular singly linked list and a singly linked list is that the pointer of the last node in the list is not NULL, but instead points to the head node, so that the entire linked list forms a ring.
Cyclic doubly linked list: Analogous to cyclic singly linked list, cyclic doubly linked list differs from doubly linked list in that the head and tail nodes form a ring.
When the circular doubly linked list is an empty list, the prior field and next field of its head node are both equal to Head.
Static linked list: Static linked list is a linked storage structure that uses an array to describe a linear list.
The first element of the array does not store data, its pointer field stores the array index where the first element is located. The pointer field value of the last element of the linked list is -1.
example
Chapter 3: Stacks and Queues
stack
Stack: A linear list that only allows insertion or deletion operations at one end.
Top of the stack (Top): The end of the linear list that allows insertion and deletion.
Bottom: Fixed, the other end that does not allow insertion and deletion
Features: 1. The stack is a restricted linear list, so it naturally has a linear relationship Tie. 2. The elements in the stack that go in later must come out first, that is, last in, first out. LIFO (Last In First Out)
Elements in the stack are advanced backward Those who go must leave first Come, last in, first in Out LIFO (Last In First Out)
sequence stack
The stack is a special case of linear tables, and the sequential storage of the stack is also a simplification of the sequential storage of linear tables. The sequential storage structure of the stack is also called a sequential stack.
Sequential stack operations
1. Short judgment:
2. Push into the stack:
3. Pop out of the stack:
4. Read the top element of the stack:
shared stack
The storage space size of the sequential stack needs to be allocated in advance. In many cases, the utilization rate of separately opening the storage space for each stack is not as good as sharing the storage space of each stack.
Schematic diagram
Shared stack structure
Shared stack operations: (Push into the stack)
chain stack
The stack is a special case of a linear list. The storage structure of a linear list also has a linked storage structure, so the stack can also be implemented in the form of a linked list. The chain storage structure of the stack is also called a chain stack.
Features 1. The chain stack generally does not become full. 2. The judgment condition for an empty stack is usually set to top==NULL;
structure
Chain stack operations
1. Push into the stack
2. Pop out of the stack
queue
A queue is a linear list that only allows insertions at one end and deletions at the other end.
Front: The end that allows deletion, also known as the front.
Rear: The end that allows insertion.
The element that enters the queue first must leave the queue first, that is, First In First Out (FIFO).
sequential queue
To implement a queue using an array, you can place the head of the queue at the array index 0.
circular queue
"Bend" the array to form a ring. When the rear pointer reaches the position of subscript 4, it can continue to point back to the position of subscript 0. A queue that is stored end to end in this order is called a circular queue.
Enqueue: rear=(rear 1)%MaxSize
Dequeue: front=(front 1)%MaxSize
So how to tell whether the queue is empty or full?
Method 1: Set the flag flag. When flag=0 and rear is equal to front, the queue is empty. When flag=1 and rear is equal to front, the queue is full.
Method 2: We use front=rear only as the judgment condition for the team to be empty. When the queue is full, there is still one free unit in the array. We think this situation is when the queue is full.
Circular queue operations
1. Join the team:
2. Depart the team:
chained queue
The linked storage structure of the queue is actually a singly linked list of a linear list, but it needs to be restricted. Elements can only be inserted at the end of the table and deleted from the head.
In order to facilitate the operation, we set the team head pointer and the team tail pointer respectively. The team head pointer points to the head node and the team tail pointer points to the tail node.
Chained queue operations
1. Entering the queue: We know that the queue can only insert elements from the end of the queue and delete elements from the head of the queue. So joining the queue means inserting the node at the end pointer of the queue. The insertion operation of the chain queue is the same as the insertion operation of the singly linked list.
2. Dequeue: Dequeue is to dequeue the successor node of the head node, and then change the successor of the head node to the node behind it.
deque
A double-ended queue refers to a queue that allows both ends to perform enqueue and dequeue operations.
stack application
1. Bracket matching: Suppose there are two kinds of brackets, one round () and one square []. The nesting order is arbitrary.
Algorithm idea: If it is a left bracket, push it onto the stack; if it is a right bracket, pop a left bracket out of the stack to determine whether it matches; when the end of the string is detected, check whether the stack is empty. As long as the stack is empty, the entire string will be bracketed.
code
2. Expression evaluation:
Rules: Scan each number and symbol of the expression from left to right. When a number is encountered, it is pushed onto the stack. When a symbol is encountered, the two numbers at the top of the stack are popped out of the stack and then the operation is performed on the symbol. Finally, the operation result is pushed onto the stack. , until the final result is obtained.
How to convert an infix expression into a postfix expression?
1. Add brackets to all operators and their operands according to operator precedence. (No need to add original brackets)
2. Move the operator after the corresponding parentheses.
3. Remove the parentheses.
example
3. Recursion:
To understand recursion, you must first understand recursion until you can understand recursion. If itself is used in the definition of a function, procedure, or data structure, then the function, procedure, or data structure is said to be defined recursively, or recursively for short. The most important thing about recursion is the recursion formula and the recursion boundary.
1. Factorial
Time complexity: O(NlogN)
2. Fibonacci Sequence
Time complexity O(2^n)
Chapter 4: Tree
Basic concepts of trees
A tree is a recursively defined structure
Node
Root node: The tree has only one root node
Degree of node: the number of subtrees owned by the node
Degree is 0: leaf node or terminal node
Degree is not 0: branch node or non-terminal node
Branch nodes are also called internal nodes except the root node.
Degree of tree: the maximum degree of all nodes in the tree
Node relationship
ancestor node
Any node with the unique path from the root node to this node
Descendants node
parent node
The node closest to the node on the only path from the root node to the node
child node
Brother node
Nodes with the same parent node
Level, height, depth, tree height
Level: The root is the first level, its children are the second level, and so on.
Depth of node: starting from the root node and accumulating from top to bottom
The height of the node: the leaf nodes start to accumulate from the bottom up.
Tree height (depth): the maximum number of levels of nodes in the tree
tree nature
1. The number of nodes in the tree is equal to the degree of all nodes plus 1.
Proof: It is not difficult to imagine that, except for the root node, each node has and has only one predecessor node pointing to it. That is to say, each node has a one-to-one correspondence with the branch pointing to it. Suppose there are b branches in the tree, then in addition to the root node, the entire tree contains b nodes, so the number of nodes in the entire tree is the b nodes plus the root node, set to n, then n= b 1. The branch number b is also the degree of all nodes. The proof is completed.
2. There are at most m^(i−1) nodes on the i-th level of a tree with degree m (i≥1).
Proof: (mathematical induction) First consider the case of i=1: the first level only has the root node, that is, one node, and i=1 is brought into the equation to satisfy it. Assuming that the i-1th layer satisfies this property, the i-1th layer has at most m i-2 nodes. …………………. i-1 layer ……… And because the degree of the tree is m, for each node on the i-1th level, at most There are m child nodes. Therefore, the number of nodes in layer i is at most m in layer i-1. times, so there are at most m^(i-1) nodes on the i-th layer.
3. An m-ary tree with height h has at most (m^h-1)/(m-1) nodes.
4. The minimum height of an m-ary tree with n nodes is logm(n(m-1) 1)
tree storage structure
sequential storage structure
Parent representation: Use a set of continuous storage spaces to store the nodes of the tree, and at the same time, in each node, use a variable to store the position of the node's parent node in the array.
chain storage structure
Child representation: Arrange the child nodes of each node and store them into a singly linked list. So there are n linked lists for n nodes; If it is a leaf node, then the singly linked list of children of this node is empty; Then the head pointers of n singly linked lists are stored in a sequence table (array).
Child brother representation: As the name implies, it is to store the child and the brothers of the child node. Specifically, it is to set two pointers to point to the node respectively. The first child node of the point and the right sibling node of this child node.
Binary tree
definition
A binary tree is a finite set of n (n≥0) nodes: ① Or it is an empty binary tree, that is, n=0. ② Or it consists of a root node and two disjoint left subtrees called roots. and the right subtree. The left subtree and the right subtree are each a binary tree.
1. Each node has at most two subtrees.
2. The left and right subtrees are in order
Five basic forms of binary trees:
1. Empty tree
2. There is only one root node
3. The root node has only the left subtree
4. The root node has only the right subtree
5. The root node has both left and right subtrees
special binary tree
1. leaning tree
2. Full binary tree:
3. Complete binary tree
Properties of binary trees
1. The number of leaf nodes in a non-empty binary tree is equal to the number of nodes with degree 2 plus 1.
2. There are at most 2^k−1 nodes on the Kth level of a non-empty binary tree (K≥1)
3. A binary tree with height H has at most 2^H-1 nodes (H≥1)
4. The height of a complete binary tree with N (N>0) nodes is [log2(N 1)] or [log2N] 1.
Binary tree storage structure
sequential storage
The sequential storage structure of a binary tree uses a set of storage units with consecutive addresses to store the node elements of a complete binary tree from top to bottom and from left to right.
chain storage
Each node of a binary tree has at most two children, so when designing the node structure of a binary tree, consider two pointers pointing to the two children of the node.
Binary tree traversal
Preorder traversal: 1) Access the root node; 2) Traverse the left subtree in order; 3) Traverse the right subtree in order.
recursion
non-recursive
In-order traversal: 1) In-order traversal of the left subtree; 2) Access the root node; 3) In-order traverse the right subtree.
recursion
non-recursive
Postorder traversal: 1) Post-order traversal of the left subtree; 2) Post-order traversal of the right subtree; 3) Visit the root node.
recursion
non-recursive
Level traversal: If the tree is empty, do nothing and return directly. Otherwise, access starts from the first level of the tree, traverses level by level from top to bottom, and in the same level, accesses the nodes one by one in order from left to right.
clue binary tree
A binary linked list of N nodes. Each node has links pointing to its left and right children. Node pointers, so there are 2N pointers in total, and the binary of N nodes The tree has a total of N-1 branches, which means there are 2N-(N-1)=N 1 null pointers. For example, if there are 6 nodes in the binary tree on the left, then there are 7 empty nodes. pointer.
Can a large number of free pointers be utilized?
The pointers pointing to the predecessor and successor are called clues. The binary linked list plus clues is called the clue linked list, and the corresponding binary tree is called the clue binary tree.
The process of traversing a binary tree in a certain order to turn it into a threaded binary tree is called threading
Huffman trees and Huffman coding
The algorithm is described as follows: 1) Treat these N nodes as N binary trees containing only one node to form a forest F. 2) Construct a new node, and select the two trees with the smallest root node weights from F as the left and right subtrees of the new node, and change the weights of the new node Set as the sum of the weights of the root nodes on the left and right subtrees. 3) Delete the two trees just selected from F and add the newly obtained trees to F. 4) Repeat steps 2) and 3) until there is only one tree left in F.