MindMap Gallery: Data Structures
This is a mind map about data structures, including linear tables, stacks and queues, strings, trees and binary trees, search, sorting, etc. It is very practical and worth collecting.
Edited at 2021-09-25 14:52:15
Data Structures 8.26
1 Introduction
1.1 Basic concepts of data structure
1.1.1 Basic concepts and terminology
1. Data
Data is a carrier of information: a collection of symbols that can be input into a computer and recognized and processed by a computer program.
2. Data elements
It is the basic unit of data. A data element can be composed of multiple data items.
Data item: The smallest indivisible unit that constitutes a data element
3. Data objects
A collection of data elements with the same nature, which is a subset of data (for example, the student information object is a subset of the overall student information and course information)
4. Data type
The general name for a set of values together with a set of operations defined on that set. For example, an int variable has a value set of integers in some machine-dependent interval, with operations such as addition, subtraction, multiplication, division, and modulo defined on it.
1. Atomic type
A data type whose value cannot be subdivided
2. Structure type
A data type whose value can be divided into several components
3. Abstract data type (ADT)
A mathematical model together with a set of operations defined on that model (only the logical characteristics are emphasized, not the implementation)
Illustration 1-4
1.1.2 Data structure and three elements
data structure
A collection of data elements that have one or more specific relationships with each other
Three elements
1. Logical structure of data
linear structure
General linear table
restricted linear table
stack,queue
string
Generalized linear tables
array
nonlinear structure
set
tree structure
graph structure
2. Data storage structure (also called the image or physical structure)
sequential storage
Advantages: 1. Random access is possible 2. Each element occupies the minimum storage space. Disadvantages: 1. Only a contiguous block of storage units can be used, so external fragmentation is easily produced.
chain storage
Advantages: 1. Logically adjacent elements need not be physically adjacent, so fragmentation does not occur. Disadvantages: 1. Pointers consume extra space 2. Random access is impossible; only sequential access is supported.
Index storage
While storing element information, an additional index table is built; each entry of the index table is an index item (key, address). Advantages: 1. Fast retrieval. Disadvantages: 1. The index table takes extra storage space 2. Adding or deleting data also requires modifying the index table, which takes time.
Hash storage
The storage address of an element is computed directly from its keyword (hash storage). Advantages: 1. Adding, deleting, and finding nodes are all fast. Disadvantages: 1. A poor hash function causes collisions of element storage units, and resolving collisions adds time and space overhead.
3. Data operations
Definition of operations
For the logical structure, specify what each operation accomplishes
Implementation of operations
For the storage structure, specify the concrete steps that implement each operation
1.2 Algorithms and algorithm evaluation
An algorithm is a description of the steps for solving a specific problem, expressed as a finite sequence of instructions.
time complexity
When the problem size is n, common orders of running time satisfy O(1) < O(log2 n) < O(n) < O(n log2 n) < O(n^2) < O(n^3) < O(2^n) < O(n!) < O(n^n)
space complexity
With problem size n, the extra space beyond the input and the program itself; an algorithm works in place when the auxiliary space it requires is constant, i.e. O(1)
2 Linear tables
2.1 Definition of linear table
A finite sequence of n data elements with the same data type
2.2 Sequential representation of linear tables
Sequence table
insert
bool ListInsert(SqList &L, int i, ElemType e){
    if(i < 1 || i > L.length + 1) return false;
    if(L.length >= MaxSize) return false;
    for(int j = L.length; j >= i; j--)   // L.length is the array subscript just past the last element
        L.data[j] = L.data[j-1];
    L.data[i-1] = e;
    L.length++;
    return true;
}
Time complexity: best O(1), worst O(n), average O(n)
delete
bool ListDelete(SqList &L, int i, ElemType &e){
    if(i < 1 || i > L.length) return false;
    e = L.data[i-1];
    for(int j = i; j < L.length; j++)
        L.data[j-1] = L.data[j];
    L.length--;
    return true;
}
Time complexity: best O(1), worst O(n), average O(n)
Search by value (search sequentially)
int LocateElem(SqList L, ElemType e){
    for(int i = 0; i < L.length; i++)
        if(L.data[i] == e) return i + 1;   // return the 1-based position
    return 0;
}
Time complexity: best O(1), worst O(n), average O(n)
2.3 Chained representation of linear tables
Single list typedef struct LNode{ ElemType data; struct LNode *next; }LNode,*LinkList;
Create singly linked list
Head insertion method
LinkList List_HeadInsert(LinkList &L){
    LNode *s; int x;
    L = (LNode*)malloc(sizeof(LNode));   // head node
    L->next = NULL;
    scanf("%d", &x);
    while(x != 9999){
        s = (LNode*)malloc(sizeof(LNode));
        s->data = x;
        s->next = L->next;
        L->next = s;
        scanf("%d", &x);
    }
    return L;
}
tail insertion method
LinkList List_TailInsert(LinkList &L){
    int x;
    L = (LNode*)malloc(sizeof(LNode));   // head node
    LNode *s, *r = L;                    // r is the tail pointer
    scanf("%d", &x);
    while(x != 9999){
        s = (LNode*)malloc(sizeof(LNode));
        s->data = x;
        r->next = s;
        r = s;                           // advance the tail pointer
        scanf("%d", &x);
    }
    r->next = NULL;
    return L;
}
Find nodes by serial number
LNode *GetElem(LinkList L, int i){
    int j = 1;
    LNode *p = L->next;
    if(i == 0) return L;
    if(i < 1) return NULL;
    while(p && j < i){
        p = p->next;
        j++;
    }
    return p;
}
Find node table node by value
LNode *LocateElem(LinkList L,ElemType e){ LNode *p = L->next; while(p != NULL && p->data != e) p = p->next; return p; }
Insert node
p = GetElem(L, i-1); s->next = p->next; p->next = s; (p is the node at position i-1; s is inserted after p)
Time O(n)
s->next = p->next; p->next = s; temp = p->data; p->data = s->data; s->data = temp; (p is the node at position i; s is inserted after p and the data fields are swapped, which effectively inserts in front of p)
Time O(1)
Delete node
p = GetElem(L,i-1); q = p->next; p->next = p->next->next; free(q);
Time O(n)
q = p->next; p->data = q->data; p->next = p->next->next; free(q);
Time O(1)
Finding the table length
The table length does not include the head node
O(n)
Double linked list typedef struct DNode{ ElemType data; struct DNode *prior,*next; }DNode,*DLinkList;
Insert node
s->next = p->next; s->next->prior = s; s->prior = p; p->next = s;
O(1)
Delete node
p->prior->next = p->next; p->next->prior = p->prior; free(p);
O(1)
circular linked list Obvious feature: the pointer of the last node is not NULL. The empty condition is whether the head node's next points back to the head pointer, not whether it is NULL; that is, if(L->next == L)
Circular singly linked list
Circular doubly linked list
static linked list Use arrays to describe the linked storage structure of a linear list. Features: The pointer is the relative address of the node (array subscript), also known as the cursor. Like the sequence list, the static linked list also needs to pre-allocate a continuous memory space. #define MaxSize 50 typedef struct{ ElemType data; int next; }SLinkList[MaxSize];
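As a minimal sketch of how cursor-based storage works, the following assumes slot 0 is the head node and a next value of -1 marks the end of the chain; these conventions, the function names, and the int element type are illustrative, not from the original map.

```cpp
// Static (cursor-based) linked list sketch.
// Assumptions: slot 0 is the head node; next == -1 marks the end of the chain.
#define StaticMax 50
typedef struct {
    int data;   // ElemType fixed to int for this sketch
    int next;   // cursor: array index of the successor, -1 at the end
} SLinkNode;

// Count the elements in the chain starting at the head node in slot 0.
int StaticListLength(SLinkNode list[]) {
    int len = 0;
    for (int p = list[0].next; p != -1; p = list[p].next)
        len++;
    return len;
}

// Build a tiny chain 0 -> 1 -> 2 and return its measured length.
int DemoStaticList(void) {
    SLinkNode list[StaticMax];
    list[0].next = 1;                      // head node points at slot 1
    list[1].data = 10; list[1].next = 2;
    list[2].data = 20; list[2].next = -1;  // end of chain
    return StaticListLength(list);
}
```

Traversal follows cursors instead of pointers, which is why insertion and deletion only rewrite int fields.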
3 stacks and queues
stack Catalan number: the number of valid pop sequences of n elements is C(2n, n)/(n+1)
Sequential storage of the stack (sequential stack) #define MaxSize 50 typedef struct{ ElemType data[MaxSize]; int top; //Stack top pointer }SqStack;
initialization void InitStack(SqStack &S){ S.top = -1; }
The stack is empty bool StackEmpty(SqStack S){ if(S.top == -1) return true; else return false; }
push onto stack bool Push(SqStack &S, ElemType x){ if(S.top == MaxSize-1) return false; S.data[++S.top] = x; //top is initialized to -1 return true; }
pop bool Pop(SqStack &S, ElemType &x){ if(S.top == -1) return false; x = S.data[S.top--]; return true; }
Read the top element of the stack bool GetTop(SqStack S,ElemType &x){ if(S.top == -1) return false; x = S.data[S.top]; return true; }
Shared stack: Taking advantage of the relatively unchanged position of the bottom of the stack, let two sequential stacks share a one-dimensional array space. The bottoms of the two stacks are set at both ends of the shared space, and the tops of the two stacks extend toward the middle of the shared space; The shared stack can make more efficient use of storage space. The two stack spaces adjust to each other, and overflow only occurs when the entire storage space is full.
Initially: top0 = -1, top1 = MaxSize; the shared stack is full only when the two stack-top pointers are adjacent, i.e. top1 - top0 == 1
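The shared-stack conventions above (top0 starts at -1, top1 starts at MaxSize, full when the two tops are adjacent) can be sketched as follows; the struct and function names are illustrative:

```cpp
// Two stacks sharing one array: stack 0 grows upward from the left end,
// stack 1 grows downward from the right end.
#define MaxShared 10
typedef struct {
    int data[MaxShared];
    int top0;   // top of stack 0
    int top1;   // top of stack 1
} ShStack;

void ShInit(ShStack *s) { s->top0 = -1; s->top1 = MaxShared; }

// Push onto stack 0 or stack 1; fails only when the whole shared
// space is exhausted, i.e. top1 - top0 == 1.
bool ShPush(ShStack *s, int stackNo, int x) {
    if (s->top1 - s->top0 == 1) return false;   // shared array full
    if (stackNo == 0) s->data[++s->top0] = x;
    else              s->data[--s->top1] = x;
    return true;
}
```

Neither stack overflows until the entire array is in use, which is the point of sharing.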
chain stack typedef struct Linknode{ ElemType data; struct Linknode *next; }*LinkStack;
queue
Queue sequential storage #define MaxSize 50 typedef struct{ ElemType data[MaxSize]; int front,rear; }SqQueue;
sequential queue
Queue-empty condition: Q.front == Q.rear == 0; note that Q.rear == MaxSize cannot be used as the queue-full condition, because it may be a false overflow
circular queue The dequeue, enqueue, and queue-full operations of the circular queue must all use %; the empty condition is simply Q.rear == Q.front
initialization void InitQueue(SqQueue &Q){ Q.rear = Q.front = 0; }
Queue is empty bool isEmpty(SqQueue Q){ if(Q.rear == Q.front) return true; else return false; }
Enqueue bool EnQueue(SqQueue &Q, ElemType x){ if((Q.rear + 1) % MaxSize == Q.front) return false; Q.data[Q.rear] = x; Q.rear = (Q.rear + 1) % MaxSize; return true; }
Dequeue bool DeQueue(SqQueue &Q, ElemType &x){ if(Q.rear == Q.front) return false; x = Q.data[Q.front]; Q.front = (Q.front + 1) % MaxSize; return true; }
Chain storage of queues (chain queue) typedef struct LNode{ ElemType data; struct LNode *next; }LNode; typedef struct{ LNode *front, *rear; }LinkQueue; The chain queue suits situations where the number of data elements varies widely; a full-queue overflow never occurs.
initialization void InitQueue(LinkQueue &Q){ Q.front = Q.rear = (LNode*)malloc(sizeof(LNode)); //create the head node Q.front->next = NULL; }
Queue is empty bool IsEmpty(LinkQueue &Q){ if(Q.front == Q.rear) return true; //with head node else return false; }
Enqueue void EnQueue(LinkQueue &Q, ElemType x){ LNode *s = (LNode*)malloc(sizeof(LNode)); s->data = x; s->next = NULL; //create a new node and link it at the tail Q.rear->next = s; Q.rear = s; }
Dequeue bool DeQueue(LinkQueue &Q, ElemType &x){ if(Q.rear == Q.front) return false; LNode *p = Q.front->next; x = p->data; Q.front->next = p->next; if(Q.rear == p) Q.rear = Q.front; //the queue had only one node free(p); return true; }
Double-ended queue: A queue that allows both ends to perform enqueue and dequeue operations. Its element logical structure is still a linear structure; the two ends of the queue are called the front end and the back end respectively.
Input restricted deque
Allow insertions and deletions at one end, but only deletions at the other end
Output restricted deque
Allow insertions and deletions at one end, but only insertions at the other end
Applications of stacks and queues
Application of stack in bracket matching
[]()([][]()) is a correct sequence; (] and [) are incorrect sequences
Set up an empty stack and read brackets in sequence. On a right bracket, pop the top element and check it: if it matches, continue; otherwise report an error. On a left bracket, push it. When the scan ends, check whether the stack is empty: if empty, the brackets match; if not, the bracket sequence does not match.
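The procedure above can be sketched with std::stack instead of the SqStack defined earlier, to keep the example short; the function name is illustrative:

```cpp
#include <stack>
#include <string>

// Bracket matching: push left brackets, check right brackets against
// the top of the stack, and require an empty stack at the end.
bool BracketsMatch(const std::string &expr) {
    std::stack<char> st;
    for (char c : expr) {
        if (c == '(' || c == '[' || c == '{') {
            st.push(c);                      // left bracket: push
        } else if (c == ')' || c == ']' || c == '}') {
            if (st.empty()) return false;    // right bracket with no partner
            char open = st.top(); st.pop();
            if ((c == ')' && open != '(') ||
                (c == ']' && open != '[') ||
                (c == '}' && open != '{')) return false;
        }
    }
    return st.empty();                       // all brackets must be closed
}
```
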
Application of stack in expression evaluation
Infix expressions: evaluation depends on operator precedence and must also handle parentheses. Postfix expressions: operator precedence is already encoded and no parentheses are needed (the postorder traversal of the expression tree). An infix expression can be converted to a postfix expression.
Processing a postfix expression: scan the items in sequence; if an item is an operand, push it onto the stack; if it is an operator <OP>, pop two operands Y and X from the stack, compute X <OP> Y, and push the result back onto the stack
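A sketch of this postfix evaluation, assuming space-separated integer tokens and the four operators + - * /; negative literals are not handled, and the function name is illustrative:

```cpp
#include <stack>
#include <string>
#include <sstream>
#include <cctype>

// Evaluate a space-separated postfix expression of non-negative integers.
// Note the operand order: Y is popped first, then X, and X <OP> Y is pushed.
int EvalPostfix(const std::string &expr) {
    std::stack<int> st;
    std::istringstream in(expr);
    std::string tok;
    while (in >> tok) {
        if (isdigit((unsigned char)tok[0])) {
            st.push(std::stoi(tok));         // operand: push
        } else {
            int y = st.top(); st.pop();      // second operand
            int x = st.top(); st.pop();      // first operand
            switch (tok[0]) {
                case '+': st.push(x + y); break;
                case '-': st.push(x - y); break;
                case '*': st.push(x * y); break;
                case '/': st.push(x / y); break;
            }
        }
    }
    return st.top();                         // final result remains on the stack
}
```

The pop order matters for - and /, which is exactly the X <OP> Y rule stated above.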
Application of stack in recursion
Application of queue in hierarchical traversal
The root node enters the queue; if the queue is empty (all nodes have been processed), the traversal ends, otherwise repeat the next step: dequeue and visit the node at the front of the queue; if it has a left child, enqueue the left child; if it has a right child, enqueue the right child; then return to the previous step
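The steps above can be sketched with std::queue; BiTNode here is an assumed minimal binary-tree node, not a structure defined elsewhere in this map:

```cpp
#include <queue>
#include <vector>

// Minimal binary-tree node for the sketch.
typedef struct BiTNode {
    int data;
    struct BiTNode *lchild, *rchild;
} BiTNode;

// Level-order (hierarchical) traversal driven by a queue.
std::vector<int> LevelOrder(BiTNode *root) {
    std::vector<int> out;
    std::queue<BiTNode*> q;
    if (root) q.push(root);                  // the root enters the queue
    while (!q.empty()) {
        BiTNode *p = q.front(); q.pop();     // dequeue and visit
        out.push_back(p->data);
        if (p->lchild) q.push(p->lchild);    // enqueue the left child
        if (p->rchild) q.push(p->rchild);    // enqueue the right child
    }
    return out;
}
```
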
Application of queues in computer systems
1. Solve the problem of speed mismatch between the host and external devices; 2. Solve resource competition problems caused by multiple users
1. Host and printer: set up a print-data buffer. When the buffer is full, the host pauses output; the printer takes jobs from the buffer queue in FCFS order and sends a request to the host after printing.
2.CPU resource allocation: FCFS
Compressed storage of special matrices Problem solving: According to the corresponding conditions, draw a sketch and make predictions. The main thing is to understand the storage logic rather than memorizing formulas.
array A finite sequence of n data elements of the same type. Dimension bound: the value range of each subscript. Arrays are a generalization of linear tables: a one-dimensional array can be viewed as a linear table, a two-dimensional array as a linear table whose elements are themselves fixed-length linear tables, and so on. Once an array is defined, its dimensions and bounds no longer change, so apart from initialization and destruction, the only operations are accessing and modifying elements. A two-dimensional array is physically stored by mapping it to a one-dimensional array!
Array storage structure row-major; column-major
Compressed storage of matrices Compressed storage: Only one storage space is allocated for multiple elements with the same value, and no storage space is allocated for zero elements. Special matrix: has many identical matrix elements or zero elements, with regular distribution
Symmetric matrix
A[1..n][1..n] is stored in B[n(n+1)/2]. Derive the formula for one triangle, then interchange i and j for the other.
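A sketch of the index mapping, assuming the lower triangle of A[1..n][1..n] is stored row by row into a 0-based array: for i >= j, a(i,j) lands at k = i(i-1)/2 + j - 1, and for i < j the subscripts are interchanged. The function name is illustrative:

```cpp
// Map symmetric-matrix subscripts (1-based i, j) to the 0-based
// position k in the packed array B holding the lower triangle row by row.
int SymIndex(int i, int j) {
    if (i < j) { int t = i; i = j; j = t; }  // symmetry: a(i,j) == a(j,i)
    return i * (i - 1) / 2 + j - 1;
}
```

Deriving the i >= j case and swapping subscripts for the other triangle is exactly the "interchange i and j" step described above.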
triangular matrix
A[1..n][1..n] is stored in B[n(n+1)/2 + 1]: after deriving the formula for one triangle, the other triangle's constant value is stored in the single cell at index n(n+1)/2. Analyze the concrete case; this is because the subscripts start from zero.
upper triangular matrix
lower triangular matrix
tridiagonal matrix
k = 2i + j - 3. To recover the subscripts from the 0-based position k of the one-dimensional array B: i = floor((k+1)/3) + 1, j = k - 2i + 3. Don't memorize by rote! Sketch the corresponding rows of a tridiagonal matrix to understand it!
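The forward and inverse tridiagonal mappings stated above can be checked with a few lines (0-based k, 1-based subscripts; the function names are illustrative):

```cpp
// Tridiagonal matrix: elements a(i,j) with |i - j| <= 1, stored row by row.
// Forward mapping: k = 2i + j - 3 (0-based k, 1-based i and j).
int TriIndex(int i, int j) { return 2 * i + j - 3; }

// Inverse mapping: recover i = floor((k+1)/3) + 1, then j = k - 2i + 3.
void TriInverse(int k, int *i, int *j) {
    *i = (k + 1) / 3 + 1;    // integer division
    *j = k - 2 * (*i) + 3;
}
```

Sketching the first few rows confirms the pattern: row i contributes up to three elements, so k advances by 3 per row.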
sparse matrix The number t of non-zero elements is very small relative to the number s of matrix elements; matrices with s >> t are called sparse. For example, a 100x100 matrix with fewer than 100 non-zero elements.
Use triplet (row label, column label, value) or cross linked list method
4 Strings
Definition and implementation of string
Definition of string: a string (character string) is a finite sequence of characters; the objects of non-numeric processing on computers are basically string data. A subsequence of consecutive characters in a string is called a substring, and the string containing the substring is called the main string. The logical structure of a string is very similar to that of a linear table, with the data objects restricted to a character set; but where the operating object of a linear table is a single element, the operating object of a string is a substring!
string storage structure
Fixed-length sequential storage #define MaxLen 255 typedef struct{ char ch[MaxLen]; int length; }SString; Alternatively, omit the length field and use '\0' (not counted in the string length) to mark the end implicitly.
Heap-allocated storage representation (dynamic allocation) typedef struct{ char *ch; int length; }HString; In C, dynamically allocated space lives in the heap and is managed by malloc() and free(); if allocation fails, NULL is returned.
Block chain storage representation: each node is called a block, and a block can store one character or multiple characters; the entire linked list is called a block chain structure. When the last node occupies less than one block, it is usually filled with "#"
Basic string operations Minimum set of operations: string assignment, string comparison, finding string length, string concatenation, and finding substrings; That is, these operations cannot be implemented using other string operations.
String assignment StrAssign(&T, chars) assigns the value chars to T
String comparison StrCompare(S,T) if S>T,return>0;S=T,return 0;S<T,return<0
Find the string length StrLength(S) returns the number of S elements
String concatenation Concat(&T,S1,S2) uses T to return the new string formed by concatenating S1 and S2.
Find the substring SubString(&Sub,S,pos,len) Use Sub to return the substring starting from the pos character of the S string and continuing with the length of len
string pattern matching The positioning operation of substrings is usually called pattern matching of strings. What is found is the position of the substring (often called the pattern string) in the main string.
Simple pattern matching algorithm
int Index(SString S, SString T){
    int i = 1, j = 1;
    while(i <= S.length && j <= T.length){
        if(S.ch[i] == T.ch[j]){ i++; j++; }
        else{ i = i - j + 2; j = 1; }   // backtrack: restart one position later
    }
    if(j > T.length) return i - T.length;
    else return 0;
}
Worst-case time complexity O(mn)
Improved pattern matching: the KMP algorithm. During matching, the information from already-matched characters is used so that the main-string pointer i never backtracks; matching continues backward from the current position. The number of positions to slide the pattern depends only on the structure of the pattern itself, not on the main string. Although the simple algorithm is O(m*n), the recognized time complexity of KMP is O(m+n).
Related concepts: Prefix: all head substrings of the string excluding the last character; Suffix: all tail substrings excluding the first character; Partial Match value (PM): the length of the longest equal prefix and suffix. Example with 'ababa': take each leading substring of the pattern and find its longest equal prefix/suffix.
'a': prefix and suffix sets are empty, PM = 0
'ab': prefixes {'a'}, suffixes {'b'}, PM = 0
'aba': prefixes {'a','ab'}, suffixes {'a','ba'}, intersection {'a'}, PM = 1
'abab': prefixes {'a','ab','aba'}, suffixes {'b','ab','bab'}, intersection {'ab'}, PM = 2
'ababa': prefixes {'a','ab','aba','abab'}, suffixes {'a','ba','aba','baba'}, intersection {'a','aba'}, PM = 3
So the partial match values of the pattern are 0 0 1 2 3
When using KMP, first write out the PM table of the pattern (substring). When the pattern mismatches against the main string, look up the PM value of the last matched character and slide the pattern right by Move = (matched length) - (PM value of the last matched character) = (j-1) - PM[j-1]. If the PM value of the matched part is 0, there is no equal prefix/suffix in the matched sequence, the slide is maximal (matched length - 0), and the pattern moves directly to the position i of the main string for the next comparison.
Generation and modification of the next array: to avoid looking one character back on a mismatch, the PM value of the current mismatch position is consulted directly, so all PM entries are shifted right by one position, the first entry is filled with -1, and the last is dropped. This is the next array, with Move = (j-1) - next[j]; the pattern pointer then returns to j = j - Move = next[j] + 1. Adding 1 to every entry gives the updated next array with simply j = next[j] (its first entry is now 0). Meaning of the final next array: when the j-th character of the pattern mismatches, jump to position next[j] of the pattern for the next round of matching!
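A sketch of building the final next array (1-indexed pattern, next[1] = 0), matching the convention described above; the function name is illustrative:

```cpp
// Build the final next array for a 1-indexed pattern T[1..length].
// Convention: next[1] = 0, and on a mismatch at position j the pattern
// pointer jumps to next[j].
void GetNext(const char T[], int length, int next[]) {
    int i = 1, j = 0;
    next[1] = 0;
    while (i < length) {
        if (j == 0 || T[i] == T[j]) {
            ++i; ++j;
            next[i] = j;       // plain next array (no nextval optimization)
        } else {
            j = next[j];       // mismatch: fall back along the prefix
        }
    }
}
```

For the pattern 'ababa' this yields next = 0 1 1 2 3, i.e. the PM values 0 0 1 2 3 shifted right and incremented as described.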
Code using the final next array:
int Index_KMP(SString S, SString T, int next[]){
    int i = 1, j = 1;
    while(i <= S.length && j <= T.length){
        if(j == 0 || S.ch[i] == T.ch[j]){ i++; j++; }   // j == 0: advance both pointers
        else j = next[j];
    }
    if(j > T.length) return i - T.length;
    else return 0;
}
Further optimization of KMP. Example: matching the main string 'aaabaaaab' against the pattern 'aaaab' mismatches at the fourth position. With the next[j] array above, the pattern shifts one position at a time and compares again, but cases where Pj == Pnext[j] occur, producing redundant comparisons. Solution: ensure Pj != Pnext[j]; when it occurs, apply the recursion again, correcting next[j] to next[next[j]], and name the result nextval[j].
5 trees and binary trees
trees and forest
concept
A tree is a finite set of n (n >= 0) nodes. When n = 0 it is an empty tree; otherwise it has exactly one root node, and the remaining nodes can be divided into disjoint finite sets called subtrees.
1. The root node has no predecessor; every other node has one and only one predecessor. All nodes may have zero or more successors. Trees are suited to representing hierarchical data.
2. Terminology: ancestor-descendant (may span multiple levels, as long as a top-down path exists); child-parent; siblings (nodes with the same parent).
3. The number of children of a node is the degree of the node; the maximum node degree in the tree is the degree of the tree. Nodes with degree > 0 are branch (non-terminal) nodes; nodes with degree 0 are leaf (terminal) nodes; the number of branches of a branch node is its degree.
4. Levels are counted from the root, the root being level 1; nodes whose parents are on the same level are cousins. The depth of a node is accumulated level by level from the root downward (imagine the tree's roots growing deeper); the height of a node is accumulated level by level from the leaves upward. The height (or depth) of the tree is the maximum level of any node.
5. Ordered vs. unordered trees: in an ordered tree the subtrees of a node cannot be interchanged.
6. Path and path length: a path is the sequence of nodes passed between two nodes; the path length is the number of edges passed!
7. Forest: a collection of m disjoint trees. Deleting the root of a tree turns it into a forest; conversely, adding a root over m trees turns the forest into a tree.
Properties
Properties:
1. The number of nodes of a tree equals the sum of the degrees of all nodes (the total number of branches) + 1 (the root).
2. Level i of a tree of degree m has at most m^(i-1) nodes (i >= 1).
3. An m-ary tree of height h has at most (m^h - 1)/(m - 1) nodes.
4. The minimum height of an m-ary tree with n nodes is ceil(log_m(n(m-1) + 1)).
storage structure
1. Sequential storage (parent representation): an outer structure holds an array of inner structures plus the node count; the inner structure is the data plus a parent index.
#define MAX_TREE_SIZE 100
typedef struct{
    ElemType data;
    int parent;
}PTNode;
typedef struct{
    PTNode nodes[MAX_TREE_SIZE];
    int n;
}PTree;
This representation exploits the fact that every node except the root has exactly one parent, so the parent of any node is found quickly, but finding the children of a node requires traversing the whole tree.
2. Child representation: link the children of each node into a singly linked list; n nodes give n linked lists (a leaf's child list is empty). Finding children is straightforward, but finding a parent requires traversing the child lists held by all n nodes. Similar to an adjacency list.
3. Child-sibling representation (also called the binary tree representation):
typedef struct CSNode{
    ElemType data;
    struct CSNode *firstchild, *nextsibling;
}CSNode, *CSTree;
Conveniently implements the conversion of a tree into a binary tree: the left pointer is the first child, the right pointer the next sibling.
Operations: Conversion of trees, forests and binary trees
Conversion between trees and binary trees: both can be represented by binary linked lists; the physical structure is the same, only the interpretation differs. Rule for converting a tree into a binary tree: left child, right sibling. Since the root has no siblings, the resulting root has no right subtree. Hand-drawn conversion steps: 1. Connect a line between sibling nodes. 2. Keep only the connection between each node and its first child; erase the connections to the other children. 3. Take the root as the axis and rotate 45 degrees clockwise.
Forest to binary tree conversion: 1. Convert each tree in the forest into a binary tree. 2. Treat the roots of the trees as siblings; each subsequent tree becomes the right child of the previous tree's root. 3. Rotate around the root of the first tree 45 degrees clockwise.
Convert a binary tree to a forest: 1. Disconnect the right subtree of the root node in turn until there is only one root node without a right subtree. 2. Convert all disconnected binary trees into trees to obtain the original forest Converting a binary tree to a tree or forest is unique
Convert a binary tree to a tree: 1. The left child of the root, together with all nodes along that left child's chain of right pointers, become children of the root. 2. Repeat the adjustment for each subtree.
Traverse
tree traversal
Preorder (root-first) traversal Equivalent to a preorder traversal of the corresponding binary tree
Postorder (root-last) traversal Equivalent to an in-order traversal of the corresponding binary tree!
Level traversal
Traversing the forest
Preorder traversal of the forest: traverse each tree root-first, from front to back.
In-order traversal of the forest: traverse each tree root-last, from front to back (called in-order because it corresponds to an in-order traversal of the corresponding binary tree)
Application of trees: the union-find set
Union-find is a simple set representation that supports 3 operations. Structure definition: #define SIZE 100 int UFSets[SIZE]; Each set in the union-find is a tree, and multiple sets form a forest.
Initialization (S is the union-find set): initialize each element of S to a single-element subset; S as a whole is a forest stored as a parent-representation array (every entry is -1 at this point).
void Initial(int S[]){
    for(int i = 0; i < SIZE; i++)
        S[i] = -1;
}
Find operation: locate the subset containing the element x in set S and return its root (the subset's name).
int Find(int S[], int x){
    while(S[x] >= 0)   // climb until the root (a negative entry)
        x = S[x];
    return x;
}
Union operation: merge the subset Root2 into Root1; requires that Root1 and Root2 are disjoint (different roots), otherwise do not merge.
void Union(int S[], int Root1, int Root2){
    // Root1 and Root2 must be different
    S[Root2] = Root1;   // attach Root2 under Root1
}
In the parent-representation array, the root of each tree holds a negative value whose absolute value is the total number of nodes in that tree.
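The note about negative root values suggests union by size; the following sketch combines it with path compression, a common refinement that the map does not spell out. The names are illustrative:

```cpp
// Union-find with union by size and path compression.
// Each root holds the negated size of its tree; other entries hold parents.
#define UFSIZE 100

void UFInit(int S[], int n) {
    for (int i = 0; i < n; i++) S[i] = -1;   // n single-element sets
}

int UFFind(int S[], int x) {
    int root = x;
    while (S[root] >= 0) root = S[root];     // climb to the root
    while (x != root) {                      // path compression pass
        int parent = S[x];
        S[x] = root;                         // hang x directly under the root
        x = parent;
    }
    return root;
}

void UFUnion(int S[], int a, int b) {
    int ra = UFFind(S, a), rb = UFFind(S, b);
    if (ra == rb) return;                    // already in the same set
    if (S[ra] <= S[rb]) {                    // tree ra is at least as large
        S[ra] += S[rb];                      // accumulate the negated size
        S[rb] = ra;
    } else {
        S[rb] += S[ra];
        S[ra] = rb;
    }
}
```

Attaching the smaller tree under the larger one keeps the trees shallow, and compression flattens them further on each find.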
Binary tree
Concept: each node has at most two subtrees, and their order cannot be reversed; even when a node has a single child, whether it is the left or the right child is fixed (unlike a tree of degree 2). Several special binary trees: 1. Full binary tree: every level contains the maximum number of nodes. 2. Complete binary tree: the numbering 1..n corresponds exactly to the nodes of a full binary tree. 3. Binary sort tree: the keys of all nodes in the left subtree are smaller than the root's key, and those in the right subtree are greater. 4. Balanced binary tree: the depth difference between the left and right subtrees of any node is at most 1. 5. Clue (threaded) binary tree.
Clue binary tree (expanded here because it is unfamiliar)
typedef struct ThreadNode{
    ElemType data;
    struct ThreadNode *lchild, *rchild;
    int ltag, rtag;
}ThreadNode, *ThreadTree;
A binary linked list built from this node structure as the storage structure of a binary tree is called a clue linked list; the pointers to a node's predecessor and successor are called clues, and a binary tree with clues is called a clue binary tree.
Characteristics and definition: use the null pointers of the binary tree to store predecessor or successor pointers, so the tree can be traversed as easily as a linked list; the clue binary tree speeds up finding a node's predecessor and successor! Convention: if a node has no left subtree, lchild points to its predecessor; if it has no right subtree, rchild points to its successor. Node layout: (lchild, ltag, data, rtag, rchild); ltag = 0: lchild is the left child, ltag = 1: lchild is the predecessor; rtag = 0: rchild is the right child, rtag = 1: rchild is the successor.
Construction of an in-order threaded binary tree — threading is essentially one traversal of the tree void InThread(ThreadTree &p,ThreadTree &pre){ if(p!=NULL){ InThread(p->lchild,pre); if(p->lchild == NULL){ p->lchild = pre; p->ltag = 1; } if(pre!=NULL && pre->rchild == NULL){ pre->rchild = p; pre->rtag = 1; } pre = p; InThread(p->rchild,pre); } } void CreateInThread(ThreadTree T){ ThreadTree pre = NULL; if(T!=NULL){ InThread(T,pre); pre->rchild = NULL; pre->rtag = 1; } } A head node can also be added to the threaded list: its lchild points to the root of the binary tree and its rchild to the last node visited; in addition, the lchild of the first visited node and the rchild of the last visited node both point back to the head node. This yields a doubly threaded list, convenient for traversing the threaded binary tree either front-to-back or back-to-front.
Traversal of an in-order threaded binary tree The nodes of an in-order threaded binary tree carry predecessor and successor information. 1. Find the first node of the in-order sequence: ThreadNode *Firstnode(ThreadNode *p){ while(p->ltag == 0) p = p->lchild; return p; }//the leftmost lower node (not necessarily a leaf) 2. Find the in-order successor of node p: ThreadNode *Nextnode(ThreadNode *p){ if(p->rtag == 0) return Firstnode(p->rchild); else return p->rchild; } 3. Main function: void Inorder(ThreadNode *T){ for(ThreadNode *p = Firstnode(T); p!=NULL; p = Nextnode(p)) visit(p); }
Pre-order and post-order threaded binary trees Only the threading code segment and the positions where the left/right subtree recursion is called need to change. Which pointers end up NULL may differ: sometimes a predecessor thread is missing, sometimes a successor thread; the concrete predecessor and successor must be worked out from the traversal order. Special case: to find a successor in a post-order threaded binary tree you need the node's parent, i.e. a triple-linked list with tag fields as the storage structure (three pointer fields: left child, right child, and an added pointer to the parent).
Properties: 1. In a non-empty binary tree the number of leaf nodes is one more than the number of nodes of degree 2: n0 = n2 + 1 2. Level k of a non-empty binary tree has at most 2^(k-1) nodes. 3. A non-empty binary tree of height h has at most 2^h - 1 nodes. 4. Complete binary tree: for i>1 the parent of node i is ⌊i/2⌋ (i even: i/2; i odd: (i-1)/2); if 2i<=n, the left child of i is 2i, otherwise i has no left child; if 2i+1<=n, the right child of i is 2i+1, otherwise i has no right child 5. Two formulas for a complete binary tree of height h with n nodes: 2^(h-1)-1 < n <= 2^h-1; 2^(h-1) <= n < 2^h
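Property 1 (n0 = n2 + 1) can be checked directly on a small linked tree. A minimal C sketch; the `mk` builder and `count_nodes` are illustrative helpers added here, not part of the map:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct BiTNode {
    int data;
    struct BiTNode *lchild, *rchild;
} BiTNode;

/* Count leaves (n0) and degree-2 nodes (n2) in one recursive pass. */
static void count_nodes(BiTNode *t, int *n0, int *n2) {
    if (t == NULL) return;
    if (t->lchild == NULL && t->rchild == NULL) (*n0)++;
    else if (t->lchild != NULL && t->rchild != NULL) (*n2)++;
    count_nodes(t->lchild, n0, n2);
    count_nodes(t->rchild, n0, n2);
}

/* Hypothetical helper: build a node with the given children. */
static BiTNode *mk(int v, BiTNode *l, BiTNode *r) {
    BiTNode *p = malloc(sizeof(BiTNode));
    p->data = v; p->lchild = l; p->rchild = r;
    return p;
}
```

For any shape of tree the two counters should satisfy n0 == n2 + 1.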
Storage structure: 1. Sequential storage: suitable for full and complete binary trees; a general binary tree needs many empty filler nodes (0 in the array); it is recommended to start storing at array index 1 so that the index formulas of a complete binary tree apply 2. Chained storage: space utilization is higher than sequential storage for a general binary tree typedef struct BiTNode{ ElemType data; struct BiTNode *lchild,*rchild; }BiTNode,*BiTree;
Operations
Traverse
DFS Each node is visited once and only once. The time complexity and space complexity are both O(n). The depth of the recursive work stack is the depth of the tree.
Preorder traversal void PreOrder(BiTree T){ if(T!=NULL){ visit(T); PreOrder(T->lchild); PreOrder(T->rchild); } } Non-recursive: void PreOrder2(BiTree T){ InitStack(S); BiTree p = T; while(p || !isEmpty(S)){ if(p){ visit(p); Push(S,p); p = p->lchild; } else{ Pop(S,p); p = p->rchild; } } }
inorder traversal void InOrder(BiTree T){ if(T!=NULL){ InOrder(T->lchild); visit(T); InOrder(T->rchild); } } Non-recursive: void InOrder2(BiTree T){ InitStack(S);BiTree p = T; while(p || !isEmpty(S)){//p is not empty or the stack is not empty loop if(p){ //All the way to the left Push(S,p); p = p->lchild; } else{ Pop(S,p);visit(p); p = p->rchild; //p goes to the right subtree } } }
Postorder traversal void PostOrder(BiTree T){ if(T!=NULL){ PostOrder(T->lchild); PostOrder(T->rchild); visit(T); } }
BFS
Level traversal void LevelOrder(BiTree T){ InitQueue(Q); BiTree p = T; Enqueue(Q,p); while(!isEmpty(Q)){ Dequeue(Q,p); visit(p); if(p->lchild != NULL) Enqueue(Q,p->lchild); if(p->rchild != NULL) Enqueue(Q,p->rchild); } }
Applications
Binary sort tree BST (binary search tree)
BST lookup BSTNode *BST_Search(BiTree T,ElemType key){ while(T!=NULL && key!=T->data){ if(key < T->data) T = T->lchild; else T = T->rchild; } return T; }
BST insertion: The inserted node must be a newly added leaf node int BST_Insert(BiTree &T,KeyType k){ if(T==NULL){ T = (BiTree)malloc(sizeof(BSTNode)); T->key = k; T->lchild = T->rchild = NULL; return 1; } else if(k==T->key) return 0; else if(k<T->key) return BST_Insert(T->lchild,k); else return BST_Insert(T->rchild,k); }
Construction of a BST void Create_BST(BiTree &T,KeyType str[],int n){ T = NULL; int i = 0; while(i<n){ BST_Insert(T,str[i]); i++; } }
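A useful consequence of the BST property is that an in-order traversal outputs the keys in ascending order. A compilable C sketch of the same insert logic (using a pointer-to-pointer in place of the C++ reference used above — an adaptation, not the map's exact code):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct BSTNode {
    int key;
    struct BSTNode *lchild, *rchild;
} BSTNode;

/* Insert k; returns 0 if k is already present (same logic as BST_Insert). */
static int bst_insert(BSTNode **t, int k) {
    if (*t == NULL) {
        *t = malloc(sizeof(BSTNode));
        (*t)->key = k;
        (*t)->lchild = (*t)->rchild = NULL;
        return 1;
    }
    if (k == (*t)->key) return 0;
    if (k < (*t)->key) return bst_insert(&(*t)->lchild, k);
    return bst_insert(&(*t)->rchild, k);
}

/* In-order traversal appends keys to out[]; returns the new count. */
static int bst_inorder(BSTNode *t, int *out, int n) {
    if (t == NULL) return n;
    n = bst_inorder(t->lchild, out, n);
    out[n++] = t->key;
    return bst_inorder(t->rchild, out, n);
}
```

Inserting keys in any order and then traversing in-order yields a sorted sequence.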
Deletion of BST
The deleted node is a leaf node and is deleted directly.
If the deleted node z has only a left subtree or only a right subtree, let that subtree take z's place.
If the deleted node z has both left and right subtrees, then replace z with the direct successor (or direct predecessor) of z, and then delete this direct successor (or direct predecessor) from the BST, thus switching to the first two situations.
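The three deletion cases above can be combined into one recursive routine. A C sketch (the function and helper names are illustrative; insertion is included only to make the example self-contained):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct BSTNode {
    int key;
    struct BSTNode *lchild, *rchild;
} BSTNode;

static int bst_insert(BSTNode **t, int k) {
    if (*t == NULL) {
        *t = malloc(sizeof(BSTNode));
        (*t)->key = k;
        (*t)->lchild = (*t)->rchild = NULL;
        return 1;
    }
    if (k == (*t)->key) return 0;
    return bst_insert(k < (*t)->key ? &(*t)->lchild : &(*t)->rchild, k);
}

/* Delete key k from the tree rooted at *t, covering the three cases. */
static void bst_delete(BSTNode **t, int k) {
    if (*t == NULL) return;                   /* key not present */
    if (k < (*t)->key)      bst_delete(&(*t)->lchild, k);
    else if (k > (*t)->key) bst_delete(&(*t)->rchild, k);
    else {
        BSTNode *z = *t;
        if (z->lchild == NULL) {              /* leaf or right-only: splice */
            *t = z->rchild; free(z);
        } else if (z->rchild == NULL) {       /* left-only: splice */
            *t = z->lchild; free(z);
        } else {                              /* two children */
            BSTNode *s = z->rchild;           /* direct in-order successor */
            while (s->lchild != NULL) s = s->lchild;
            z->key = s->key;                  /* copy successor's key into z */
            bst_delete(&z->rchild, s->key);   /* then delete the successor */
        }
    }
}
```

The two-child case reduces to one of the first two cases, exactly as described above.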
BST search efficiency analysis
The search efficiency of a BST depends mainly on the height of the tree. If the height difference between the left and right subtrees never exceeds 1 (a balanced binary tree), the average search length is O(log2n); if the BST degenerates into a single chain, the average search length is O(n).
The average search length ASL=∑PiCi (i=1,2,3,…,n)Pi is the probability of the i-th data element in the lookup table, and Ci is the number of comparisons that have been made when finding the i-th data element.
Comparison between BST and binary search: The decision tree of binary search is unique, but the search of BST is not unique (the insertion order affects the tree shape); Insertion and deletion: The object of binary search is an ordered sequence list, which requires O(n). BST does not need to move the node, but only needs to modify the pointer to complete, which is O(log2n); Summary: When the ordered table is a static lookup table, it is suitable for sequential table storage and uses binary search; when the ordered table is a dynamic lookup table, a binary sorting tree should be selected as the logical structure.
balanced binary tree
Designed to avoid the performance degradation of a binary sort tree. Definition: the absolute height difference between the left and right subtrees of any node does not exceed 1, called a balanced tree for short; the height of a node's left subtree minus that of its right subtree is the node's balance factor, which in a balanced binary tree can only be -1, 0 or 1. An empty tree is balanced by definition.
Insertion into a balanced binary tree: each adjustment targets the minimum unbalanced subtree, i.e. the subtree rooted at the node closest to the inserted node on the insertion path whose balance factor has absolute value greater than 1. The first half of insertion is the same as for a binary sort tree, but if the insertion causes imbalance there are four cases (an improved BST). LL, RR, LR, RL name the position of the inserted node relative to the root A of the minimum unbalanced subtree (which child of A, and which subtree of that child).
LL balanced rotation (single right rotation) The node was inserted in the left subtree of A's left child B (not necessarily immediately below B). Rotate B right to take A's place: A becomes B's right child, and B's original right subtree becomes A's left subtree.
RR balanced rotation (single left rotation) The node was inserted in the right subtree of A's right child B. Rotate B left to take A's place: A becomes B's left child, and B's original left subtree becomes A's right subtree. ps: in both LL and RR, the left/right child B of the unbalanced node A "comes up".
LR balanced rotation (left rotation then right rotation) The node was inserted in the right subtree of A's left child B. First rotate C, the root of B's right subtree, up-left into B's position, then rotate C up-right into A's position.
RL balanced rotation (right rotation then left rotation) The node was inserted in the left subtree of A's right child B. First rotate C, the root of B's left subtree, up-right into B's position, then rotate C up-left into A's position. ps: in LR and RL it does not matter whether the new node went into C's left or right subtree — the rotation process is the same; both cases bring C, the grandchild root under B, up to the top.
Search in a balanced binary tree Same as searching a binary sort tree; a balanced binary tree with n nodes has maximum depth O(log2n), so the maximum search length is also O(log2n) — the number of comparisons for a given number of nodes is bounded by the tree height.
A balanced binary tree of height h has at most 2^h - 1 nodes and at least N(h) nodes, where N(h) = N(h-1) + N(h-2) + 1, with N(0) = 0, N(1) = 1, N(2) = 2.
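The minimum-node recurrence can be computed directly; a small C sketch (the function name is illustrative):

```c
#include <assert.h>

/* Minimum number of nodes in a balanced binary tree of height h:
   N(0)=0, N(1)=1, N(h)=N(h-1)+N(h-2)+1. */
static long min_avl_nodes(int h) {
    if (h <= 0) return 0;
    if (h == 1) return 1;
    return min_avl_nodes(h - 1) + min_avl_nodes(h - 2) + 1;
}
```

The sequence 0, 1, 2, 4, 7, 12, ... grows like the Fibonacci numbers, which is why the height of a balanced tree stays O(log2n).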
red black tree
Red black tree concept
Basic operations of red-black trees
red black tree properties
Applications of red-black trees
Red-black tree performance analysis
Huffman trees and Huffman coding
Huffman tree definition A node in the tree is assigned a value that represents a certain meaning, which is called the weight of the node; The product of the path length from the root of the tree to any node (number of edges passed) and the weight of the node is called the weighted path length of the node; The sum of the weighted path lengths of all leaf nodes in the tree is called the weighted path length of the tree, recorded as WPL (Weighted Path Length of Tree); In a binary tree with n weighted leaf nodes, the binary tree with the smallest weighted path length WPL is called a Huffman tree! Also called optimal binary tree
Construction of a Huffman tree Given n nodes with weights w1,w2,...,wn: each time select the two trees whose roots have the smallest weights and make them the children of a new root node whose weight is the sum of the two; remove the two selected trees from the set and add the new tree to it; repeat until only one tree remains.
Properties of Huffman trees 1. Each initial node eventually becomes a leaf node, and the smaller the weight, the greater the path length from the node to the root node. 2. During the construction process, a total of n-1 new nodes (double-branch nodes) are created, so the total number of nodes of the Huffman tree is 2n-1 3. Each construction selects 2 trees as the children of the new node, so there is no node with degree 1 in the Huffman tree.
Huffman coding
Several related concepts: Fixed-length encoding: every character is represented by the same number of binary bits. Variable-length encoding: different characters may be represented by different numbers of bits. Variable-length encoding can be much better than fixed-length encoding: frequent characters get short codes and rare characters get long codes, shortening the average code length and compressing the data; Huffman coding is a widely used and very effective data-compression coding. A code in which no codeword is a prefix of any other codeword is called a prefix code; decoding scans the bit string from front to back, recognizing one codeword at a time and translating it back into the original character.
In the Huffman tree, walk down from the root assigning 0 to one branch and 1 to the other (left/right may take 0 or 1 in either order, as long as it is consistent). The WPL of the Huffman tree equals the total bit length of the final encoding. Huffman trees are not unique, but their WPL is always the minimum.
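The WPL can be computed without building the tree: every merge creates one internal node, and the WPL equals the sum of all merged weights. A simple O(n^2)-selection C sketch (a heap would be used in practice; the function name is illustrative and the weight array is modified in place):

```c
#include <assert.h>

/* WPL of the Huffman tree over weights w[0..n-1]: repeatedly merge the
   two smallest weights; each merge's result is added to the WPL. */
static int huffman_wpl(int w[], int n) {
    int wpl = 0;
    while (n > 1) {
        int a = 0, b = 1;                 /* indices of the two smallest */
        if (w[b] < w[a]) { int t = a; a = b; b = t; }
        for (int i = 2; i < n; i++) {
            if (w[i] < w[a]) { b = a; a = i; }
            else if (w[i] < w[b]) b = i;
        }
        int lo = a < b ? a : b, hi = a < b ? b : a;
        int merged = w[a] + w[b];
        wpl += merged;                    /* one new internal node */
        w[lo] = merged;                   /* merged node replaces one leaf */
        w[hi] = w[n - 1];                 /* delete the other by swap-down */
        n--;
    }
    return wpl;
}
```

For weights {7,5,2,4} the merges are 2+4=6, 5+6=11, 7+11=18, giving WPL 35 — matching the depth-weighted sum 7·1 + 5·2 + 2·3 + 4·3.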
6 Graphs
Basic concepts of graphs
G=(V,E), vertex set V, edge set E; V(G), represents the finite non-empty set of vertices in graph G; E(G) represents the set of relationships (edges) between vertices of graph G ; |V| represents the number of vertices, |E| represents the number of edges A linear table can be an empty table, and a tree can be an empty tree, but a graph cannot be an empty graph; The vertex set V in the graph must not be empty, and E can be empty.
directed graph, undirected graph Directed graph: E is a directed edge (arc), and an arc is an ordered pair of vertices, denoted as <v,w> Undirected graph: E is an undirected edge (edge), and an edge is an unordered pair of vertices, recorded as (v, w)
Simple graph, multigraph Simple graph: 1. no duplicate edges (in a directed graph, two arcs in opposite directions between the same pair of vertices do not count as duplicates) 2. no edge from a vertex to itself. Multigraph: 1. the number of edges between two vertices may be greater than 1 2. a vertex may be related to itself through an edge. Multigraph and simple graph are relative notions; only simple graphs are discussed in data structures.
Complete graph (simple complete graph) For undirected graphs, the value range of |E| is 0-n(n-1)/2, An undirected graph with n(n-1)/2 edges is called a complete graph, and there is an edge between any two vertices; For directed graphs, the value range of |E| is 0-n(n-1), A directed graph with n(n-1) arcs is called a directed complete graph, and there is an opposite arc between any two vertices.
Subgraph G=(V,E), G'=(V',E'); if V' is a subset of V and E' is a subset of E, then G' is a subgraph of G. If additionally V(G')=V(G), G' is called a spanning subgraph of G. Not every pair of subsets of V and E forms a subgraph, because some edges in the subset of E may have endpoints that are not in the subset of V.
Connectivity, connected graphs and connected components If there is a path from vertex v to vertex w, v and w are said to be connected. If any two vertices of G are connected, G is called a connected graph; otherwise it is disconnected. A maximal connected subgraph of an undirected graph is called a connected component. A connected graph has at least n-1 edges; with fewer than n-1 edges a graph must be disconnected. How many edges can a disconnected graph have at most? (n-1)(n-2)/2: the critical state in which n-1 vertices form a complete graph and one vertex is isolated — adding any further edge makes the graph connected.
Maximal connected subgraph: a connected subgraph of G not contained in any other connected subgraph ("maximal" here is not a count but a non-containment relation: the subgraph includes all the vertices and edges of its component). Minimal connected subgraph: a spanning tree of graph G is a minimal connected subgraph.
Strongly connected graph, strongly connected components In a directed graph, if there are paths both from v to w and from w to v, the two vertices are strongly connected. If every pair of vertices is strongly connected, the graph is a strongly connected graph. A maximal strongly connected subgraph of a directed graph is a strongly connected component. The fewest edges with which a directed graph can be strongly connected: at least n edges, forming a cycle.
Spanning tree, spanning forest A spanning tree of a connected graph is a minimal connected subgraph containing all vertices of the graph. If the graph has n vertices, the spanning tree has n-1 edges. Cutting one edge from a spanning tree disconnects it; adding one edge creates a cycle. In a disconnected graph, the spanning trees of the connected components form a spanning forest.
Vertex degree, in-degree and out-degree Undirected graph: TD(v); the sum of the degrees of all vertices equals twice the number of edges. Directed graph: the degree of v splits into in-degree ID(v) and out-degree OD(v), with TD(v) = ID(v) + OD(v); the sum of all in-degrees equals the sum of all out-degrees, which equals the number of arcs.
Edge weights and networks Each edge can be labeled with a numerical value carrying some meaning, called the weight of the edge. A graph with weighted edges is called a weighted graph, also known as a network.
Dense graph, sparse graph A graph with few edges is called a sparse graph, and the opposite is called a dense graph. Generally, when G satisfies |E|<|V|log|V|, G is regarded as a sparse graph.
Paths, path lengths and cycles Path: a path from vertex vp to vq is a vertex sequence vp,vi1,...,vim,vq. The number of edges on a path is its length. A path whose first and last vertices coincide is called a cycle (circuit). If a graph has n vertices and more than n-1 edges, it must contain a cycle.
Simple path, simple cycle A path in which no vertex appears twice is a simple path; a cycle in which no vertex other than the first/last vertex repeats is a simple cycle.
distance If the shortest path from vertex u to vertex v exists, then the length of this path is the distance from u to v; There is no path from u to v at all, then the distance is recorded as ∞
directed tree A directed graph in which the in-degree of one vertex is 0 and the out-degree of the remaining vertices is 1 is called a directed tree.
Graph storage
adjacency matrix method
It refers to using a one-dimensional array to store the relationship between vertices in the graph, and using a two-dimensional array to store the edge information in the graph (the adjacency relationship between vertices). The two-dimensional array that stores the adjacency relationship between vertices is called an adjacency matrix. The storage structure of the entire graph G: #define MaxVertexNum 100 typedef char VertexType; typedef int EdgeType; typedef struct{ VertexType Vex[MaxVertexNum]; //Vertex table EdgeType Edge[MaxVertexNum][MaxVertexNum];//adjacency matrix, edge table int vexnum,arcnum; }MGraph;
Notice: 1. In simple applications a two-dimensional array can be used directly as the adjacency matrix of the graph. 2. For an undirected graph the adjacency matrix is symmetric (and unique), so it can be stored in compressed form. 3. When the matrix only records whether an edge exists, EdgeType can be an enumeration of 0 and 1. 4. The space complexity of the adjacency-matrix representation is O(n^2), where n is the number of vertices |V|.
Features of the adjacency-matrix representation: 1. Undirected graph: the number of non-zero (non-∞) elements in row i (or column i) is the degree TD(vi). Directed graph: the number of non-zero (non-∞) elements in row i is the out-degree OD(vi), and in column i the in-degree ID(vi). 2. With an adjacency matrix it is easy to test whether an edge joins two given vertices; but counting the edges of the graph requires scanning every element row by row, which is time-consuming. 3. Dense graphs suit the adjacency-matrix representation. 4. If A is the adjacency matrix of G, then element A^n[i][j] equals the number of paths of length n from vertex i to vertex j.
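Features 1 and 4 can be illustrated on a tiny matrix; a C sketch with a fixed vertex count N (an illustrative constant), where counting length-2 paths is one entry of the matrix product A·A:

```c
#include <assert.h>

#define N 4

/* Number of paths of length 2 from i to j: entry [i][j] of A*A. */
static int paths_len2(int a[N][N], int i, int j) {
    int s = 0;
    for (int k = 0; k < N; k++) s += a[i][k] * a[k][j];
    return s;
}

/* Degree of vertex v in an undirected graph: non-zero entries in row v. */
static int degree(int a[N][N], int v) {
    int d = 0;
    for (int j = 0; j < N; j++) if (a[v][j]) d++;
    return d;
}
```

On the path graph 0-1-2-3, vertex 1 has degree 2, and there is exactly one length-2 path from 0 to 2 (through 1).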
adjacency list method
When the graph is sparse, the adjacency list saves much space; it combines sequential and chained storage. For each vertex a singly linked list is created; the nodes of the i-th list represent the edges incident to vertex vi — this list is the edge list of vi (for a directed graph, the out-edge list). Each edge-list node represents one edge but stores only the number of the other endpoint (the rest is in the vertex table). The head pointers of the edge lists and the vertex data of G are stored sequentially (the vertex table). #define MaxVertexNum 100 typedef struct ArcNode{ //edge-list node int adjvex; //the vertex this arc points to struct ArcNode *next; //InfoType info; //edge weight in a network }ArcNode; typedef struct VNode{ VertexType data; //vertex information ArcNode *first; //pointer to the first arc attached to the vertex }VNode, AdjList[MaxVertexNum]; typedef struct{ AdjList vertices; //adjacency list int vexnum,arcnum; //numbers of vertices and arcs }ALGraph;
Features: 1. For an undirected graph the storage needed is O(|V|+2|E|) (each edge appears twice in the list); for a directed graph, O(|V|+|E|). 2. For sparse graphs the adjacency list saves a great deal of space. 3. In the adjacency list it is easy to find all edges of a given vertex, whereas the adjacency matrix must scan a whole row, O(n). To test whether an edge joins two given vertices, the adjacency matrix answers quickly (random access), while the adjacency list must search the edge list of one of the vertices, which is less efficient. 4. In a directed graph, the out-degree of a vertex is just the length of its edge list, but finding the in-degree requires traversing all edge lists; an inverse adjacency list can be used to speed up in-degree queries. 5. The adjacency list of a graph is not unique: the order of nodes within each edge list is arbitrary.
cross list method
The cross (orthogonal) linked list is a chained storage structure for directed graphs. Each arc has one node with 5 fields: (tailvex, headvex, hlink, tlink, info); tailvex and headvex are the tail (source) and head (destination) vertices of the arc; hlink links the arcs with the same head, and tlink links the arcs with the same tail. Each vertex node has 3 fields: (data, firstin, firstout). A cross linked list is not unique, but one cross linked list determines one graph. In a cross linked list it is easy to find both the in-degree and the out-degree of a vertex.
adjacency multiple list method
The adjacency multilist is a chained storage structure for undirected graphs. Edge node: (mark, ivex, ilink, jvex, jlink, info); mark flags whether this edge has been visited; ivex and jvex are the positions of the edge's two endpoints in the graph; ilink points to the next edge attached to ivex and jlink to the next edge attached to jvex; info points to information associated with the edge. Vertex node: (data, firstedge).
Basic operations on graphs
Adjacent(G,x,y); Neighbors(G,x); InsertVertex(G,x); DeleteVertex(G,x); AddEdge(G,x,y); RemoveEdge(G,x,y); FirstNeighbor(G,x); NextNeighbor(G,x,y); Get_edge_value(G,x,y); Set_edge_value(G,x,y,v);
Graph traversal Starting from some vertex, visit every vertex once and only once; this is the basis of connectivity problems, topological sorting and critical-path algorithms. Unlike a tree, a vertex may be reachable in several ways, so visited vertices must be recorded in an auxiliary array visited[]. Graph traversal includes BFS and DFS. With an adjacency matrix the time complexity is O(n^2); with an adjacency list it is O(|V|+|E|). Both BFS and DFS need O(n) auxiliary space. (The complexities depend on the storage method, not on the choice of BFS or DFS.)
Breadth-first search Like level-order traversal of a tree: visit layer by layer with an auxiliary queue. bool visited[MAX_VERTEX_NUM]; void BFSTraverse(Graph G){ for(int i=0;i<G.vexnum;i++) visited[i] = FALSE; InitQueue(Q); for(int i=0;i<G.vexnum;i++) if(!visited[i]) BFS(G,i); } void BFS(Graph G,int v){ visit(v); visited[v] = TRUE; Enqueue(Q,v); while(!isEmpty(Q)){ Dequeue(Q,v); for(w=FirstNeighbor(G,v);w>=0;w=NextNeighbor(G,v,w)) if(!visited[w]){ visit(w); visited[w] = TRUE; Enqueue(Q,w); } } }
BFS solves the single-source shortest path problem in an unweighted graph void BFS_MIN_Distance(Graph G,int u){ for(i=0;i<G.vexnum;i++) d[i]=∞; //d[i] is the shortest path length from u to i visited[u]=TRUE; d[u]=0; EnQueue(Q,u); while(!isEmpty(Q)){ DeQueue(Q,u); for(w=FirstNeighbor(G,u);w>=0;w=NextNeighbor(G,u,w)) if(!visited[w]){ visited[w]=TRUE; d[w]=d[u]+1; //one edge further EnQueue(Q,w); } } }
breadth-first spanning tree The adjacency matrix is stored uniquely—its breadth-first spanning tree is unique The adjacency list storage is not unique—its breadth-first spanning tree is not unique
depth first search
Similar to preorder traversal of a tree bool visited[MAX_VERTEX_NUM]; void DFSTraverse(Graph G){ for(v=0;v<G.vexnum;v++) visited[v] = FALSE; for(v=0;v<G.vexnum;v++) if(!visited[v]) DFS(G,v); } void DFS(Graph G,int v){ visit(v); visited[v] = TRUE; for(w=FirstNeighbor(G,v);w>=0;w=NextNeighbor(G,v,w)) if(!visited[w]) DFS(G,w); }
Depth-first spanning trees and forests Connected graphs can generate deep spanning trees, and unconnected graphs can generate forests.
graph connectivity
Graph traversal algorithms can be used to determine the connectivity of the graph. If an undirected graph is connected, starting from any vertex, all vertices in the graph can be accessed by just one traversal; If a directed graph is connected and there is a path from the initial point to every vertex in the graph, then all vertices in the graph can be accessed.
Application of diagrams
Minimum spanning tree Cutting any edge from a spanning tree disconnects it; adding any edge creates a cycle. The minimum spanning tree (MST) is the spanning tree whose edge-weight sum is smallest among all spanning trees. When all edge weights differ, the MST is unique; in general it need not be, but the total edge weight of the MST is always unique and minimal. GENERIC_MST(G){ T=NULL; while T does not yet form a spanning tree do{ find a minimum-cost edge (u,v) that creates no cycle when added to T; T=T∪{(u,v)}; } }
Prim's algorithm (BFS-like) Similar to Dijkstra: initially put one vertex of the graph into tree T, then repeatedly pick the vertex closest to the current vertex set of T and add that vertex and its edge to T; each step increases the vertex count and edge count of T by 1, until all vertices are in T — then T is the minimum spanning tree, with exactly n-1 edges. void Prim(G,T){ T=∅; U={w}; while((V-U)!=∅){ //while the tree does not contain all vertices let (u,v) be the minimum-weight edge with u∈U and v∈(V-U); T=T∪{(u,v)}; //the edge joins the tree U=U∪{v}; //the vertex joins the tree } } The time complexity of Prim's algorithm is O(|V|^2), independent of |E|, so it suits the minimum spanning tree of edge-dense graphs.
Kruskal's algorithm Unlike Prim, which grows the tree from vertices, Kruskal selects suitable edges in increasing weight order to build the minimum spanning tree. Initially every vertex is its own connected component; then repeatedly take the smallest-weight edge not yet considered — if its endpoints lie in different connected components of T, add it to T, otherwise discard it — until all vertices are in one component. void Kruskal(V,T){ T=V; //initialize T with vertices only numS=n; //number of connected components while(numS>1){ take the minimum-weight edge (v,u) from E; if(v and u belong to different components of T){ T=T∪{(v,u)}; numS--; } } } With the edge set kept in a heap, selecting the minimum-weight edge takes O(log|E|) each time; adding a new edge is an equivalence-class merge, and a union-find structure can describe T. The time complexity is O(|E|log|E|), so Kruskal suits edge-sparse graphs with many vertices.
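A concrete version of Kruskal can reuse the parent-array union-find idea from the Union operation earlier in this map. A C sketch; it sorts the edges with qsort rather than a heap (a simplification that keeps the idea, not the stated per-selection O(log|E|) bound), and assumes at most 64 vertices:

```c
#include <assert.h>
#include <stdlib.h>

/* Parent-array union-find: roots hold -1, others hold their parent. */
static int find_root(int s[], int x) {
    while (s[x] >= 0) x = s[x];
    return x;
}

typedef struct { int u, v, w; } Edge;

static int cmp_edge(const void *a, const void *b) {
    return ((const Edge *)a)->w - ((const Edge *)b)->w;
}

/* Kruskal: scan edges by increasing weight; add an edge iff its
   endpoints are in different components. Returns the MST weight. */
static int kruskal(int n, Edge e[], int m) {
    int s[64], total = 0, picked = 0;
    for (int i = 0; i < n; i++) s[i] = -1;
    qsort(e, m, sizeof(Edge), cmp_edge);
    for (int i = 0; i < m && picked < n - 1; i++) {
        int ru = find_root(s, e[i].u), rv = find_root(s, e[i].v);
        if (ru != rv) {
            s[rv] = ru;            /* Union: merge the two components */
            total += e[i].w;
            picked++;
        }
    }
    return total;
}
```

On a 4-cycle 0-1-2-3 with weights 1,2,3,4 plus a diagonal 0-2 of weight 5, the MST keeps the three cheapest edges for a total of 6.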
shortest path The BFS traversal in the previous figure to solve the shortest path is for unweighted graphs. Here we discuss the shortest path with the shortest weighted path length; The algorithms for solving the shortest path all rely on a property: the shortest path between two points also includes the shortest paths between other vertices on the path.
Dijkstra's algorithm (BFS-like) Single-source shortest paths. A set S records the vertices whose shortest paths are already settled; initially the source v0 is in S. Each time the vertex of V-S with the smallest current path length is moved into S, and the path lengths of the remaining vertices of V-S are updated. Negative edge weights are not allowed. Two auxiliary arrays: dist[]: the current shortest path length from v0 to each vertex; initially dist[i] is the weight of arc (v0,vi) if it exists, otherwise ∞ (dist[] advances one settled vertex at a time — the BFS idea). path[]: path[i] is the predecessor of vertex i on its shortest path; when the algorithm ends, the shortest path from v0 to vi can be traced back through path[]. When solving by hand, tabulate the dist array: at each step add the selected vertex to S, update the path distance of every remaining vertex, then continue from the newest member of S. Time complexity O(|V|^2).
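The O(|V|^2) version with dist[] and path[] can be written out directly. A C sketch; NV (vertex count) and INF (missing arc) are illustrative constants, and g is an adjacency matrix of weights:

```c
#include <assert.h>

#define NV  5
#define INF 1000000

/* Dijkstra from src: fills dist[] (shortest lengths) and path[]
   (predecessor on the shortest path, -1 if none). */
static void dijkstra(int g[NV][NV], int src, int dist[NV], int path[NV]) {
    int final[NV];
    for (int i = 0; i < NV; i++) {
        final[i] = 0;
        dist[i] = g[src][i];
        path[i] = (g[src][i] < INF) ? src : -1;
    }
    dist[src] = 0; final[src] = 1; path[src] = -1;
    for (int it = 1; it < NV; it++) {
        int u = -1, best = INF;
        for (int i = 0; i < NV; i++)      /* pick closest unsettled vertex */
            if (!final[i] && dist[i] < best) { best = dist[i]; u = i; }
        if (u < 0) break;                 /* remaining vertices unreachable */
        final[u] = 1;                     /* move u into the set S */
        for (int v = 0; v < NV; v++)      /* relax the arcs leaving u */
            if (!final[v] && g[u][v] < INF && dist[u] + g[u][v] < dist[v]) {
                dist[v] = dist[u] + g[u][v];
                path[v] = u;
            }
    }
}
```

Tracing path[] backwards from any vertex to src reconstructs its shortest path.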
Floyd's algorithm Shortest paths between every pair of vertices. Recursively produce a sequence of n-order matrices A^(-1),A^(0),...,A^(n-1), where A^(k)[i][j] is the length of the shortest path from i to j using only vertices 0..k as intermediates; each round relaxes paths through one more intermediate vertex. Time complexity O(|V|^3) (comparable to running Dijkstra once per vertex: O(|V|^2)·|V|). Negative-weight edges are allowed, but negative-weight cycles are not; the algorithm works on directed and undirected graphs (an undirected edge is treated as two opposite arcs).
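The whole algorithm is three nested loops over an in-place distance matrix. A C sketch; NF (vertex count) and INF are illustrative constants, and d starts as the weight matrix with 0 on the diagonal:

```c
#include <assert.h>

#define NF  4
#define INF 1000000

/* Floyd: after round k, d[i][j] is the shortest i->j path length using
   only vertices 0..k as intermediates; after all rounds, the true one. */
static void floyd(int d[NF][NF]) {
    for (int k = 0; k < NF; k++)
        for (int i = 0; i < NF; i++)
            for (int j = 0; j < NF; j++)
                if (d[i][k] + d[k][j] < d[i][j])
                    d[i][j] = d[i][k] + d[k][j];
}
```

Note the loop order: the intermediate vertex k must be the outermost loop, which is exactly the A^(k) recurrence above.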
Directed acyclic graph description expression Directed acyclic graph DAG (Directed acycle graph): There is no cycle in a directed graph. It is an effective tool for describing expressions containing common subexpressions. For example, a string of infix expressions can be represented by a binary tree (the root node stores operators, and the subtree stores operands). There are repeated subexpressions in it. By using a directed acyclic graph, the same subexpressions can be shared.
Topological sequence A sequence formed from the vertices of a directed acyclic graph in which every vertex appears exactly once, and if vertex A precedes vertex B in the sequence, then there is no path from B to A in the graph. Equivalently: a topological sequence is an ordering of the vertices of a DAG. Every AOV network has one or more topological orderings.
AOV network (Activity On Vertex network) If a DAG represents a project in which the vertices represent activities and a directed edge <Vi, Vj> means Vi must precede Vj, the graph is called an activity-on-vertex network. There are several topological-sorting algorithms for AOV networks; a commonly used one is: 1. Select a vertex with no predecessor from the AOV network and output it 2. Delete that vertex and all directed edges leaving it from the network 3. Repeat steps 1-2 until the AOV network is empty or no predecessor-free vertex remains in the current network (which means the graph must contain a cycle). Since outputting each vertex requires deleting its outgoing edges, the time complexity of topological sorting is O(|V|+|E|). Topological sorting can also be implemented with DFS (start depth-first traversal one by one from the vertices with in-degree 0 and non-zero out-degree; if a back edge is encountered, the graph has a cycle and sorting fails). Reverse topological sort starts from the vertices with out-degree 0. If the AOV network is stored as an adjacency matrix and the matrix is triangular, a topological sequence must exist; the converse does not necessarily hold.
bool TopologicalSort(Graph G){
    InitStack(S);
    for(int i=0;i<G.vexnum;i++)
        if(indegree[i]==0)
            Push(S,i);            //Push all vertices with in-degree 0 onto the stack
    int count=0;
    while(!IsEmpty(S)){
        Pop(S,i);
        print[count++]=i;         //Output vertex
        for(p=G.vertices[i].firstarc;p;p=p->nextarc){
            //Decrease the in-degree of every vertex i points to by 1;
            //push any vertex whose in-degree drops to 0
            v=p->adjvex;
            if(!(--indegree[v]))
                Push(S,v);
        }
    }
    if(count<G.vexnum) return false;  //a cycle prevented a full ordering
    else return true;
}
Critical Path
AOE network (Activity On Edge network) Events are represented by vertices, activities by directed edges, and the cost of an activity (such as the time it takes) by the weight on its edge. Two properties of an AOE network: 1. Only after the event at a vertex has occurred can the activities on the edges leaving that vertex begin. 2. The event at a vertex can occur only after all activities on its incoming edges are complete. An AOE network has exactly one vertex with in-degree 0, the start vertex (source), representing the beginning of the whole project; it also has exactly one vertex with out-degree 0, the end vertex (sink), representing the end of the whole project. Some activities in an AOE network can proceed in parallel, but the project is finished only when the activities on every path are complete. Therefore, among all paths from source to sink, the path with the greatest length is called the critical path, and the activities on it are called critical activities (finding the critical activities is equivalent to finding the critical path). The shortest time to complete the whole project is the length of the critical path.
Finding the critical activities 1. Earliest occurrence time of event vk, ve(k): the length of the longest path from the source v1 to vertex vk; it is the earliest time at which every activity leaving vk can start. For vk any successor of vj: if(ve[j]+Weight(vj,vk) > ve[k]) then ve[k]=ve[j]+Weight(vj,vk) (a "relaxation" step). ve() values are computed from front to back, following a topological order. 2. Latest occurrence time of event vk, vl(k): the latest time vk may occur without delaying completion of the whole project. vl() values are computed from back to front, following a reverse topological order. 3. Earliest start time of activity ai: e(i). 4. Latest start time of activity ai: l(i). 5. The difference d(i)=l(i)-e(i) between an activity's latest and earliest start times is its slack: plainly put, how long activity ai can be delayed. If d(i)=0, then ai is a critical activity. The critical path of a network is not unique; when there are several critical paths, speeding up the critical activities on only one of them cannot shorten the whole project. Shortening the duration can only be achieved by accelerating the critical activities that lie on all critical paths.
An intuition for vl(k): ve(k) is the "hardest-working" walk, taking the longest route from the source toward the sink; vl(k) walks backwards from the sink toward the source, letting each event happen as late as possible - the "laziest" schedule that still finishes on time (delay whatever you can).
7 Search
Basic concepts of search
Search: the process of finding data elements that satisfy given conditions in a data set. Lookup table (search structure): the collection of data used for searching. Four operations are commonly performed on a lookup table: 1. Query whether a particular data element is in the table 2. Retrieve a particular data element that satisfies the conditions 3. Insert a data element into the table 4. Delete a data element from the table. Static lookup table: involves only operations 1 and 2, with no need to modify the table dynamically (sequential search, binary search, hash search, etc.); dynamic lookup table: also involves 3 and 4 (binary sort tree search, hash search, etc.). Keyword: the value of a data item that uniquely identifies an element. Average search length (ASL): the length of one search is the number of keyword comparisons it requires; the ASL is the average number of keyword comparisons over all searches.
Practice ASL when searching for success and failure
Sequential search and binary search
sequential search Also known as linear search, it is suitable for both sequential lists and linked lists. Advantages: There are no requirements for data element storage and ordering; Disadvantages: When n is large, efficiency is low
Sequential search of a general linear table
typedef struct{              //data structure of the lookup table
    ElemType *elem;          //base address of element storage
    int TableLen;            //length of the table
}SSTable;
int Search_Seq(SSTable ST,ElemType key){
    ST.elem[0]=key;          //sentinel
    int i;
    for(i=ST.TableLen;ST.elem[i]!=key;--i);  //search from back to front
    return i;                //returns 0 on failure
}
In the algorithm above, ST.elem[0] is called a sentinel: the inner loop of Search_Seq need not check whether the index goes out of bounds, because the loop is guaranteed to stop once i==0. Introducing a sentinel avoids many unnecessary test statements and thereby improves efficiency.
Sequential search in an ordered list If the keywords in the table are known to be ordered before searching, then once a comparison passes the position where the key would be, the search can report failure without scanning to the far end of the table, which reduces the average search length of unsuccessful searches. A decision tree can describe the search process on an ordered linear list: circular nodes are the elements present in the list; rectangular nodes are failure nodes (fictitious empty intervals). If there are n element nodes, there are correspondingly n+1 failure nodes. The ASL of an unsuccessful search is (1+2+...+n+n)/(n+1); each "failure node" is reached with probability 1/(n+1).
The difference between disordered and ordered sequential search is only the difference when the search fails.
ASL(success)=(n+1)/2
Half search (binary search) It is only suitable for sequential storage structures, not suitable for chain storage structures, and requires elements to be arranged in order by keywords. The time is O(log2n)
int Binary_Search(SeqList L,ElemType key){
    int low=0,high=L.TableLen-1,mid;
    while(low<=high){
        mid=(low+high)/2;
        if(L.elem[mid]==key)
            return mid;
        else if(L.elem[mid]<key)
            low=mid+1;   //continue in the right half
        else
            high=mid-1;  //continue in the left half
    }
    return -1;
}
Block search (index-sequential search) It combines the respective advantages of sequential search and binary search: it has a dynamic structure and supports fast searching. The basic idea is to divide the lookup table into several blocks. Elements within a block may be unordered, but the blocks themselves are ordered (every keyword in the first block is smaller than every keyword in the second, and so on). Then build an index table that contains the maximum keyword of each block and the address of each block's first element; the index table is ordered by keyword. The block search process: 1. Determine, in the index table, which block the sought record belongs to (by sequential or binary search) 2. Search sequentially within that block. The average search length of block search is the sum of the average lengths of the index search and the in-block search.
B-tree and B+ tree
B-tree (multi-way balanced search tree, an improvement on the BST) The maximum number of children of any node in a B-tree is called the order of the B-tree, usually written m. A B-tree of order m is either empty or satisfies: 1. Each node has at most m subtrees, i.e. at most m-1 keywords. 2. If the root is not a terminal node, it has at least two subtrees 3. All non-leaf nodes except the root have at least ⌈m/2⌉ subtrees, i.e. at least ⌈m/2⌉-1 keywords. 4. Every non-leaf node has the structure (n, P0, K1, P1, K2, P2, ..., Kn, Pn), where the Ki are keywords satisfying K1<K2<...<Kn; Pi is a pointer to the root of a subtree in which every keyword is greater than Ki and less than Ki+1; n is the number of keywords in the node. 5. All leaf nodes appear at the same level and carry no information; they do not actually exist, and the pointers to them are null.
Height of a B-tree The number of disk accesses needed by most B-tree operations is proportional to the height of the tree. Upper bound on keys: each node has at most m subtrees and m-1 keywords, so the number of keywords satisfies n ≤ m^h - 1. Lower bound: level h+1 has at least 2⌈m/2⌉^(h-1) nodes, and level h+1 consists of the information-free leaf (failure) nodes. A B-tree with n keywords has n+1 failure nodes, so n+1 ≥ 2⌈m/2⌉^(h-1), i.e. h ≤ log_⌈m/2⌉((n+1)/2) + 1.
B-tree search 1. Find nodes in B-tree 2. Find keywords within the node The search operation of 1. is performed on the disk, and the search operation of 2. is performed in the memory; After finding the target node, read it into the memory, and then use the sequential search method or the binary search method within the node.
Insertion into a B-tree Inserting directly wherever the search fails may violate the B-tree definition. 1. Locate: find the lowest-level non-leaf node (search down to a leaf, then take its parent on the level above) 2. Insert: if the node's keyword count stays within [⌈m/2⌉-1, m-1], insert directly; when insertion would make the count exceed m-1, split the node. Splitting: take the keyword at position ⌈m/2⌉ as the split point; the keywords to its left remain in the original node, the keywords to its right move to a new node, and the keyword at position ⌈m/2⌉ itself is inserted into the parent of the original node; this may cascade upward.
Deletion from a B-tree When the deleted keyword k is not in a terminal node (the lowest non-leaf level), k can be replaced by its predecessor (or successor) k', and k' is then deleted from its own node, reducing the problem to the terminal-node case. When the deleted keyword is in a terminal node, there are three cases: 1. Delete directly: the node's keyword count is greater than ⌈m/2⌉-1. 2. Borrow from a sibling: the keyword count equals ⌈m/2⌉-1, and a left or right (adjacent) sibling has at least ⌈m/2⌉ keywords; rebalance among the sibling, the parent, and the node itself by the parent-child rotation ("father-son transposition") method. 3. No sibling to borrow from: the keyword count equals ⌈m/2⌉-1 and the adjacent siblings also have only ⌈m/2⌉-1 keywords; after the deletion, merge the remaining keywords with a sibling node and the separating keyword from the parent, so the parent's keyword count decreases by one. If the parent is the root and its keyword count drops to 0, delete the root; the merged node becomes the new root.
B-tree properties It is essential to grasp the relation between keyword count and subtree count: a node has one fewer keyword than subtrees. All non-terminal nodes except the root have at least ⌈m/2⌉ subtrees (⌈m/2⌉-1 keywords). Keywords within a node are arranged in ascending order from left to right. That is, ⌈m/2⌉-1 ≤ n ≤ m-1, i.e. n ∈ [⌈m/2⌉-1, m-1].
B+ tree
The B+ tree is a variant of the B-tree that arose from the needs of databases. An m-order B+ tree satisfies: 1. Each branch node has at most m subtrees 2. A non-leaf root node has at least two subtrees 3. The number of subtrees of a node equals its number of keywords 4. All leaf nodes together contain all the keywords, along with pointers to the corresponding records; keywords are arranged in order within the leaves, and adjacent leaf nodes are linked to each other in order.
Main differences from the B-tree: 1. In a B+ tree, a node with n keywords has n subtrees, one subtree per keyword; in a B-tree, a node with n keywords has n+1 subtrees. 2. In a B+ tree the number n of keywords per node lies in [⌈m/2⌉, m] (the B-tree's bounds on n, each increased by 1); for the root, B+ tree: [1,m], B-tree: [1,m-1]. 3. In a B+ tree, the leaf nodes contain the information (all keywords); non-leaf nodes serve only as an index. 4. In a B+ tree, every search, successful or not, follows a path from the root down to a leaf node.
hash table
Basic concepts of hash tables
With the earlier linear-table and tree searches, there is no fixed relationship between a record's position in the table and its keyword, so finding a record requires a series of keyword comparisons; such searching is comparison-based, and its efficiency depends on the number of comparisons. Hash function: a function Hash(key)=Addr that maps a keyword in the lookup table to the storage address of that keyword, using the characteristics of the keyword itself so that as few "comparisons" as possible are needed. A hash function may map different keywords to the same address; this is called a collision, and the colliding keywords are called synonyms. On the one hand, a well-designed hash function minimizes collisions; on the other hand, collisions are unavoidable, so methods to handle them must be designed. Hash table: a data structure accessed directly by keyword; in other words, a hash table establishes a direct mapping between keywords and storage addresses. Ideally, searching a hash table takes O(1) time, independent of the number of elements.
How to construct a hash function
1. The domain of the hash function must cover all keywords to be stored, and its range depends on the size or address range of the hash table. 2. The addresses computed by the hash function should be distributed evenly and equiprobably over the whole address space, reducing collisions. 3. The hash function should be as simple as possible and able to compute the hash address of any keyword in a short time.
1. Direct addressing method Take the value of a linear function of the keyword directly as the hash address: H(key)=key or H(key)=a×key+b. It suits keywords whose distribution is basically continuous; if the distribution is discontinuous with many gaps, storage space is wasted.
Conflict handling methods
Hash lookup and performance analysis
8 sort
Basic concepts of sorting
Stability of an algorithm: suppose the list to be sorted contains two elements Ri and Rj with equal keywords keyi=keyj. If the relative order of Ri and Rj is unchanged after applying a sorting algorithm, the algorithm is stable; otherwise it is unstable. Stability does not measure the quality of an algorithm; it only describes a property. Sorting algorithms fall into two categories according to whether all data elements are in memory during sorting: 1. Internal sorting: all elements are kept in memory while sorting. 2. External sorting: all elements cannot fit in memory at once, so data must be moved between internal and external storage as needed during sorting. In general, internal sorting involves comparisons and moves, but not every internal sort is comparison-based - radix sort, for example, is not.
insertion sort Each time, one record to be sorted is inserted, according to its keyword, into the previously sorted subsequence.
direct insertion sort
Ordered sequence L[1..i-1], L(i), unordered sequence L[i+1..n]
void InsertSort(ElemType A[],int n){
    int i,j;
    for(i=2;i<=n;i++){            //insert the remaining n-1 elements into the prefix
        if(A[i]<A[i-1]){
            A[0]=A[i];            //copy to the sentinel; A[0] stores no element
            for(j=i-1;A[0]<A[j];j--)  //find the insertion position from back to front
                A[j+1]=A[j];      //shift right
            A[j+1]=A[0];          //insert at the found position
        }
    }
}
Space complexity O(1), time complexity O(n^2). Best case: the table is already sorted, time O(n). Worst case: the table is in reverse order, time O(n^2). Since each comparison scans from back to front, it is stable. Suitable for sequential lists and linked lists (with a linked list, find the insertion position from front to back). Well suited to small, basically ordered tables.
half insertion sort
void InsertSort(ElemType A[],int n){
    int i,j,low,high,mid;
    for(i=2;i<=n;i++){
        A[0]=A[i];
        low=1;high=i-1;
        while(low<=high){           //binary search for the insertion point
            mid=(low+high)/2;
            if(A[mid]>A[0]) high=mid-1;
            else low=mid+1;
        }
        for(j=i-1;j>=high+1;j--)
            A[j+1]=A[j];            //shift right
        A[high+1]=A[0];             //insert
    }
}
Time complexity O(n^2). The number of comparisons is independent of the initial order of the table and depends only on the number of elements n; the number of moves does depend on the initial order.
Hill sort
First divide the table into several subtables (records the same increment apart form one subtable, with d(i+1)=d(i)/2 and the final increment equal to 1) and direct-insertion sort each subtable; when the elements of the whole table are basically in order, perform one direct insertion sort over all records.
void ShellSort(ElemType A[],int n){
    //A[0] is a temporary cell, not a sentinel; when j<=0 the insertion point is reached
    int i,j,dk;
    for(dk=n/2;dk>=1;dk/=2)        //shrink the step size
        for(i=dk+1;i<=n;i++)
            if(A[i]<A[i-dk]){      //A[i] must be inserted into its ordered increment subtable
                A[0]=A[i];         //store temporarily in A[0]
                for(j=i-dk;j>0&&A[0]<A[j];j-=dk)
                    A[j+dk]=A[j];
                A[j+dk]=A[0];
            }
}
There is no definitive result for the time complexity; the worst case is O(n^2). The sort is unstable. Suitable only for sequential lists.
swap sort Swap the positions of the two records in the sequence based on the comparison results of the keywords
Bubble Sort
Compare the values of adjacent elements from back to front (or front to back) and swap them if they are out of order; small keywords gradually float to the surface like bubbles.
void BubbleSort(ElemType A[],int n){
    for(int i=0;i<n-1;i++){        //up to n-1 passes
        bool flag=false;
        for(int j=n-1;j>i;j--)
            if(A[j-1]>A[j]){       //reverse order: swap
                swap(A[j-1],A[j]);
                flag=true;
            }
        if(flag==false)            //a pass with no swap ends the sort
            return;
    }
}
Space O(1), time O(n^2). Stability: equal elements are never swapped, so it is stable. Each pass of bubble sort places one element at its final, globally correct position, which is not equivalent to the merely locally ordered prefix of insertion sort.
Quick sort Based on divide and conquer thinking
Pick an element of the list as the pivot (usually the first element) and partition the sequence into two independent parts L[1..k-1], L(k), L[k+1..n], so that everything in the first part is smaller than the pivot, everything in the second part is larger, and the pivot sits at its final position L(k). This is one pass of quick sort (one partition).
void QuickSort(ElemType A[],int low,int high){
    if(low<high){                  //recursion exit condition
        int pivotpos=Partition(A,low,high);  //partition
        QuickSort(A,low,pivotpos-1);
        QuickSort(A,pivotpos+1,high);
    }
}
The performance and the key of quick sort lie in the partition operation, which has many versions; here the first element is used as the pivot:
int Partition(ElemType A[],int low,int high){
    ElemType pivot=A[low];
    while(low<high){               //loop exit condition
        while(low<high&&A[high]>=pivot) --high;
        A[low]=A[high];
        while(low<high&&A[low]<=pivot) ++low;
        A[high]=A[low];
    }
    A[low]=pivot;
    return low;
}
Space: a recursive work stack is needed; best O(log2n), average O(log2n) (splits near the middle, like a complete binary tree), worst O(n) (partitions of sizes n-1 and 0 require n-1 recursive calls and stack depth O(n)). Time: the running time depends on how balanced the partitions are; the worst case, maximally unbalanced at every level of recursion (which occurs for an already sorted or reverse-sorted input), is O(n^2). Stability: unstable; equal elements may be sent to different parts, changing their relative order. Note: quick sort does not produce ordered subsequences pass by pass, but each pass places the pivot element at its final position. To improve efficiency: 1. Choose a pivot that divides the sequence as evenly as possible 2.
Randomly select the pivot element. The most ideal partition: both subproblems have size at most n/2, giving time O(nlog2n); on average, the running time of quick sort is very close to this best case. Quick sort has the best average performance of all internal sorting algorithms.
selection sort In each pass, the element with the smallest keyword among the remaining unsorted elements is selected as the i-th element of the ordered subsequence, for n-1 passes in total.
Simple selection sort
void SelectSort(ElemType A[],int n){
    for(int i=0;i<n-1;i++){        //n-1 passes in total
        int min=i;                 //record the position of the minimum element
        for(int j=i+1;j<n;j++)
            if(A[j]<A[min]) min=j;
        if(min!=i) swap(A[i],A[min]);
    }
}
Space O(1), time O(n^2). Stability: swapping may change the relative positions of equal elements, so it is unstable.
Heap sort
A heap is an array object viewed as a tree: a one-dimensional array regarded as a complete binary tree that satisfies either 1. L(i)>=L(2i) and L(i)>=L(2i+1), or 2. L(i)<=L(2i) and L(i)<=L(2i+1), for 1<=i<=⌊n/2⌋. Arrays satisfying 1 are max-heaps (big root heaps); those satisfying 2 are min-heaps (small root heaps). Heap sort: first build the n array elements into an initial heap; after outputting the heap-top element, move the bottom element to the top. The root then no longer satisfies the max-heap property, so sift it downward until the max-heap property is restored; output the new heap top, and repeat until only one element remains in the heap. Heap sort algorithm:
void HeapSort(ElemType A[],int len){
    BuildMaxHeap(A,len);
    for(i=len;i>1;i--){
        Swap(A[i],A[1]);      //output the heap top by swapping it with the heap bottom
        HeadAdjust(A,1,i-1);  //re-adjust the remaining i-1 elements into a heap
    }
}
Heap sort suits selection among very many keywords (on the order of hundreds of millions). Example: to select the largest 100 values among 100 million numbers, use an array of 100: read the first 100 numbers and build a small top heap; then read the remaining numbers in turn, discarding any number smaller than the heap top, otherwise replacing the heap top with it and re-adjusting the heap. When all data has been read, the 100 numbers in the heap are the answer. Space O(1), time O(nlog2n), unstable.
1. How to build an unordered sequence into an initial heap Sift the subtree rooted at node ⌊n/2⌋ (for a max-heap: compare the root keyword with its left and right children and swap if the max-heap rule is violated) so that this subtree becomes a heap; then sift the subtrees rooted at nodes ⌊n/2⌋-1, ..., 1 in turn. Viewed as a complete binary tree, roots are adjusted bottom-up (keywords swapped between a root and its children); in the array, the roots are adjusted from right to left.
void BuildMaxHeap(ElemType A[],int len){
    for(int i=len/2;i>0;i--)       //from i=n/2 down to 1, repeatedly adjust the heap
        HeadAdjust(A,i,len);
}
void HeadAdjust(ElemType A[],int k,int len){
    //HeadAdjust sifts down the subtree rooted at element k
    A[0]=A[k];
    for(i=2*k;i<=len;i*=2){        //sift down along the larger child
        if(i<len&&A[i]<A[i+1])
            i++;                   //index of the child with the larger keyword
        if(A[0]>=A[i]) break;      //sifting finished
        else{
            A[k]=A[i];             //move A[i] up to the parent position
            k=i;                   //continue sifting downward
        }
    }
    A[k]=A[0];
}
One adjustment takes time proportional to the tree height, O(h); building the whole heap takes O(n) - an unordered array can be built into a heap in linear time.
2. After outputting the top element of the heap, how to adjust the remaining elements into a new heap? After the top element of the heap is output, the last element of the heap is exchanged with the top element of the heap. At this time, the properties of the heap are destroyed, and downward filtering is required. Filter from top to bottom (exchange root node keywords and subtree keywords)
Heap insertion Insert the tail and adjust from bottom to top
Merge sort and radix sort
merge sort The meaning of merging is to combine two or more ordered lists into a new ordered list
Assume the table to be sorted contains n records; view it as n ordered subtables of length 1, merge them pairwise into ⌈n/2⌉ ordered subtables of length 2 (or 1), and continue merging pairwise until a single ordered table of length n remains. This method is called 2-way merge sort.
ElemType *B=(ElemType*)malloc((n+1)*sizeof(ElemType));  //auxiliary array B
void Merge(ElemType A[],int low,int mid,int high){
    //A[low..mid] and A[mid+1..high] are each ordered; merge them into one ordered table
    int i,j,k;
    for(k=low;k<=high;k++)
        B[k]=A[k];
    for(i=low,j=mid+1,k=i;i<=mid&&j<=high;k++){
        if(B[i]<=B[j])
            A[k]=B[i++];
        else
            A[k]=B[j++];
    }
    while(i<=mid) A[k++]=B[i++];
    while(j<=high) A[k++]=B[j++];
}
void MergeSort(ElemType A[],int low,int high){
    if(low<high){
        int mid=(low+high)/2;
        MergeSort(A,low,mid);
        MergeSort(A,mid+1,high);
        Merge(A,low,mid,high);
    }
}
Space efficiency: n auxiliary cells, space complexity O(n). Time efficiency: each merge pass is O(n) and ⌈log2n⌉ passes are needed, so the algorithm is O(nlog2n). Stability: the Merge() operation does not change the relative order of records with equal keywords - stable.
In general, for a k-way merge of N elements, the number of merge passes m satisfies k^m=N, so m=log_k N; since m must be an integer, m=⌈log_k N⌉.
Radix sort Use the idea of multi-keyword sorting to sort single logical keywords
Sort by keyword magnitude. The keyword of each record aj consists of d components k_j^(d-1), ..., k_j^0, with k_j^(d-1) the most significant and k_j^0 the least significant. Multi-keyword sorting is usually done in one of two ways: 1. Most significant digit first (MSD): split into ever smaller subsequences level by level, in decreasing order of digit weight, and finally concatenate all subsequences into one ordered sequence. 2. Least significant digit first (LSD): sort by increasing digit weight to form an ordered sequence. Operations: 1. Distribution: empty the queues Q0..Qr-1, then append each record to the queue matching its current digit. 2. Collection: link the queues Q0..Qr-1 head to tail to obtain a new record sequence, forming a new linear list. Example: to sort numbers below 1000, first choose the radix r (view the numbers as base-r); with radix 10, 10 chained queues are needed during sorting. Numbers below 1000 have three digits, so three "distribution" and "collection" passes are required; the first pass distributes all records by the lowest digit (units) into the queues and then collects them, and so on. Space efficiency: one pass needs r queues (r head pointers and r tail pointers), O(r). Time efficiency: d passes of distribution and collection; one distribution pass is O(n) and one collection pass is O(r), so the time complexity is O(d(n+r)), independent of the initial order of the sequence. Stability: radix sort requires each per-digit pass to be stable, so radix sort itself is stable!
Comparison and application of various internal sorting algorithms
Comparison of internal sorting algorithms
Application of internal sorting algorithm
1. If n is small, use direct insertion sort or simple selection sort. Direct insertion moves records more often than simple selection, so when each record carries a large amount of information, simple selection sort is better. 2. If the file is initially almost ordered by keyword, direct insertion or bubble sort is appropriate. 3. If n is large, use an O(nlog2n) method: quick sort, heap sort, or merge sort. Quick sort is currently considered the best comparison-based internal sorting method; when the keywords are randomly distributed, quick sort has the shortest average time. Heap sort needs less auxiliary space than quick sort and has no quick-sort-style worst case, but both sorts are unstable. If a stable O(nlog2n) algorithm is required, use merge sort; merging upward from single-record runs is not recommended, however - it is usually combined with direct insertion sort: first use direct insertion to obtain longer ordered subfiles, then merge them pairwise. Direct insertion sort is stable, so the improved merge sort is still stable. 4. In comparison-based sorting, each comparison of two keywords has only two possible outcomes, so the comparison decision process can be described by a binary tree; it can be proved that when the n keywords are randomly distributed, any sorting algorithm that relies on "comparisons" needs at least O(nlog2n) time. 5. If n is very large and the keywords have few digits and can be decomposed, radix sort is better. 6. When records themselves carry much information, use a linked list as the storage structure to avoid spending a lot of time moving records.
external sort
Basic concepts of external sorting algorithms
Sorting performed in memory is called internal sorting. In many applications, large files need to be sorted, and the entire file cannot be copied into memory for sorting. Therefore, the records to be sorted need to be stored in external memory. During sorting, the data are transferred into the memory part by part for sorting. During the sorting process, multiple exchanges between memory and external storage are required. This sorting is called external sorting
external sorting method
The time cost of external sorting is dominated by the number of disk accesses, i.e. the number of I/Os. External sorting usually uses merge sort: 1. According to the size of the memory buffer, divide the file on external storage into sub-files of length l, read them into memory one by one, sort each with an internal sorting method, and write the resulting ordered sub-files back to external storage. These ordered sub-files are called merge segments (or runs). 2. Merge these segments pass by pass, so that the segments grow from short to long, until the whole ordered file is obtained. One merge pass merges all segments of the current level once. Total external sorting time = internal sorting time + external read/write time + internal merging time. The time for reading and writing external storage far exceeds the internal sorting and merging time. External storage is read and written in units of disk blocks, and each merge pass reads and writes every record once, i.e. (total records / records per block) block reads and the same number of writes. So the total number of block accesses is about 2 * (total records / records per block) * (S + 1), where S is the number of merge passes and the extra 1 accounts for the full read and write performed by the initial internal sorting. For r initial merge segments and k-way balanced merging, the merge process forms an inverted k-ary tree whose height ⌈logk r⌉ equals the number of merge passes S. Hence increasing the merge order k or reducing the number of initial segments r reduces S, which reduces the number of disk I/Os and speeds up external sorting.
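The pass-count arithmetic above can be sketched as follows; the function names and the exact form of the block-access estimate (one full read/write for the internal-sort phase plus one per merge pass) are one reading of the formula, not taken verbatim from the source:

```python
import math

def merge_passes(r, k):
    # S = ceil(log_k r): one k-way pass turns r runs into ceil(r/k) runs,
    # so we count passes until a single run remains.
    s = 0
    while r > 1:
        r = math.ceil(r / k)
        s += 1
    return s

def block_io_count(total_records, records_per_block, r, k):
    # Rough total block accesses: each merge pass reads and writes every
    # block once, plus one full read/write for the initial internal sort.
    blocks = math.ceil(total_records / records_per_block)
    return 2 * blocks * (merge_passes(r, k) + 1)
```

For example, 16 initial runs merged 2-way need 4 passes; switching to 4-way merging halves that to 2 passes, at the cost of a harder internal selection step.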
Multi-way balanced merging and loser tree
Increasing the merge order k can reduce the number of merge passes S, but it increases the internal merging time. With an ordinary internal merge, selecting the smallest of k keys takes k-1 comparisons, and a pass that merges n elements needs (n-1)(k-1) comparisons. Over S passes the total is S(n-1)(k-1) = ⌈logk r⌉(n-1)(k-1) = ⌈log2 r⌉(n-1)(k-1)/⌈log2 k⌉. Since (k-1)/⌈log2 k⌉ grows as k grows, the comparison count grows with k and eventually cancels the benefit of fewer passes, so the ordinary internal merge sort algorithm cannot be used.
To prevent internal merging from being slowed down as k increases, the loser tree is introduced. A loser tree is a variant of tree (tournament) selection and can be viewed as a complete binary tree. Its k leaf nodes hold the records of the k merge segments currently being compared; each internal node records the "loser" of its two subtrees, while the winner moves up for the next comparison, so the overall winner emerges at the root. Selecting the minimum of k keys then takes only ⌈log2 k⌉ comparisons, and the total comparison count becomes S(n-1)⌈log2 k⌉ = ⌈logk r⌉(n-1)⌈log2 k⌉ = (n-1)⌈log2 r⌉, which is independent of k. However, the merge order k is still constrained by memory: within a fixed amount of memory, increasing k shrinks each input buffer, so if k is too large the number of merge passes falls but the number of I/Os rises again. Note: compared with a winner tree, rebuilding is simpler in a loser tree. After replacing a leaf, a winner tree must compare with sibling nodes before updating the parents, whereas a loser tree only compares with the parent nodes on the path up to the root.
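A minimal loser tree sketch, using the usual array layout (ls[0] holds the overall winner, ls[1..k-1] hold losers, and a -infinity sentinel is used only while building the tree); all names here are chosen for illustration:

```python
INF = float('inf')

class LoserTree:
    # Loser tree over k ordered runs; b[0..k-1] hold each run's current
    # key, b[k] is a -inf sentinel, ls[1..k-1] hold loser leaf indices,
    # ls[0] holds the winner.
    def __init__(self, runs):
        self.k = len(runs)
        self.runs = [iter(r) for r in runs]
        self.b = [next(r, INF) for r in self.runs] + [-INF]
        self.ls = [self.k] * self.k          # all matches vs. the sentinel
        for s in range(self.k - 1, -1, -1):  # play one match per leaf
            self._adjust(s)

    def _adjust(self, s):
        # Replay matches from leaf s up to the root: parents keep the
        # loser, the winner index ends up in ls[0].
        t = (s + self.k) // 2                # parent of leaf s
        while t > 0:
            if self.b[s] > self.b[self.ls[t]]:
                s, self.ls[t] = self.ls[t], s
            t //= 2
        self.ls[0] = s

    def pop(self):
        # Output the current winner, refill its leaf from the same run.
        w = self.ls[0]
        key = self.b[w]
        if key == INF:
            return None                      # all runs exhausted
        self.b[w] = next(self.runs[w], INF)
        self._adjust(w)                      # only parent comparisons
        return key

def kway_merge(runs):
    lt, out = LoserTree(runs), []
    while True:
        x = lt.pop()
        if x is None:
            return out
        out.append(x)
```

Note that `pop` walks a single leaf-to-root path, which is exactly the ⌈log2 k⌉ comparisons per selected record claimed above.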
Replacement-selection sort (generating initial merge segments)
To reduce r (r = ⌈n/l⌉), we need a way to generate longer initial merge segments l. Assume the initial file to be sorted is FI, the output file of initial merge segments is FO, and the memory work area is WA; FO and WA are initially empty, and WA can hold w records. The replacement-selection algorithm proceeds as follows: 1. Input w records from FI into the work area WA. 2. Select the record with the minimum key from WA and call it MINIMAX. 3. Output the MINIMAX record to FO. 4. If FI is not empty, input the next record from FI into WA. 5. Among all records in WA whose keys are greater than the key of the just-output MINIMAX record, select the one with the smallest key as the new MINIMAX record. 6. Repeat steps 3-5 until no new MINIMAX can be selected in WA; this completes one initial merge segment, so output an end-of-segment flag to FO. 7. Repeat steps 2-6 until WA is empty; all initial merge segments have then been generated. Selecting the MINIMAX record in WA is done with a loser tree.
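Steps 1-7 can be sketched as follows. Note the hedges: a min-heap stands in for the loser tree used in the source, and the names (`wa` for the work area, `frozen` for records deferred to the next segment) are invented for this sketch:

```python
import heapq

def replacement_selection(records, w):
    # Generate initial merge segments with replacement-selection;
    # w is the work-area capacity.
    it = iter(records)
    wa = []                            # work area WA, at most w records
    for _ in range(w):
        x = next(it, None)
        if x is None:
            break
        wa.append(x)
    heapq.heapify(wa)
    runs, current, frozen = [], [], []
    while wa:
        minimax = heapq.heappop(wa)    # steps 2/5: smallest eligible key
        current.append(minimax)        # step 3: output MINIMAX to FO
        nxt = next(it, None)           # step 4: refill WA from FI
        if nxt is not None:
            if nxt < minimax:
                frozen.append(nxt)     # too small: belongs to next run
            else:
                heapq.heappush(wa, nxt)
        if not wa:                     # step 6: current segment complete
            runs.append(current)
            current = []
            wa, frozen = frozen, []    # step 7: start the next segment
            heapq.heapify(wa)
    return runs
```

With w = 3 and a typical input, each produced segment is ordered and usually longer than w, which is the whole point of the method.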
best merge tree
After the file is processed by replacement-selection, initial merge segments of different lengths are obtained. How should the merging of segments with different lengths be ordered so that the number of I/Os is minimized? Draw the corresponding merge tree: the weighted path length WPL of the tree equals the total number of records read during the merge process. Extending the idea of the Huffman tree to k-ary trees, a merge tree in which the shorter initial segments are merged first and the longer ones merged last has the minimum total number of I/Os; this is the best merge tree. Method to build the best merge tree: if the initial segments are not enough to form a strict k-ary tree, "virtual segments" of length 0 must be added. How many? In a strict k-ary tree, n0 = (k-1)nk + 1, where n0 is the number of leaves and nk the number of degree-k internal nodes. If (n0-1) % (k-1) = 0, a k-ary merge tree can be built directly from the given segments. If (n0-1) % (k-1) = u != 0, then k-u-1 virtual segments must be added before building the merge tree.
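The virtual-segment rule and the "shortest segments first" construction can be sketched with a k-ary Huffman procedure. Function names are invented here, and the returned WPL counts the total records read over all merge passes:

```python
import heapq

def virtual_segments(n0, k):
    # Number of length-0 virtual segments to add so that the n0 real
    # initial segments form a strict k-ary merge tree.
    u = (n0 - 1) % (k - 1)
    return 0 if u == 0 else k - u - 1

def best_merge_wpl(lengths, k):
    # k-ary Huffman construction: always merge the k shortest segments
    # first; the running total is the WPL (records read over all passes).
    heap = list(lengths) + [0] * virtual_segments(len(lengths), k)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        s = sum(heapq.heappop(heap) for _ in range(k))
        total += s          # every record in this merge is read once
        heapq.heappush(heap, s)
    return total
```

For example, four segments of lengths 2, 3, 6, 9 merged 3-way need one virtual segment; merging {0, 2, 3} first (5 reads) and then {5, 6, 9} (20 reads) gives a WPL of 25.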
Both the AOV network and the AOE network are DAGs, but their edges and vertices have different meanings. Edges in an AOV network carry no weights and only represent precedence relations between vertices; edges in an AOE network carry weights and represent the cost of completing an activity.
The difference between Dijkstra and Prim: in Prim's minimum spanning tree, the sum of all edge weights must be minimal, and each step considers only the weights of edges to adjacent vertices; in Dijkstra's single-source shortest path tree, the distance between two vertices A and B along the tree is not necessarily the shortest A-B distance in the original graph, because every comparison includes the distance back to the source. The two algorithms differ in their relaxation operation. In addition, Prim applies only to undirected graphs, while Dijkstra also handles directed graphs (an undirected graph can be converted into a directed one by replacing each edge with the two arcs i→j and j→i).
For a given graph and starting vertex, the traversal sequence obtained from an adjacency matrix is unique, but the sequence obtained from an adjacency list is not (it depends on the order of the nodes in each edge list).
Connectivity is discussed in undirected graphs, and strong connectivity is discussed in directed graphs.