MindMap Gallery Data Structure Chapter 8 - Sorting
Chapter 8 of "Data Structure" - Sorting Notes, including insertion sort, exchange sort, external sort, comparison and application of various internal sort algorithms, merge sort and radix sort, etc.
Edited at 2022-11-23 16:08:11One Hundred Years of Solitude is the masterpiece of Gabriel Garcia Marquez. Reading this book begins with making sense of the characters' relationships, which are centered on the Buendía family and tells the story of the family's prosperity and decline, internal relationships and political struggles, self-mixing and rebirth over the course of a hundred years.
One Hundred Years of Solitude is the masterpiece of Gabriel Garcia Marquez. Reading this book begins with making sense of the characters' relationships, which are centered on the Buendía family and tells the story of the family's prosperity and decline, internal relationships and political struggles, self-mixing and rebirth over the course of a hundred years.
Project management is the process of applying specialized knowledge, skills, tools, and methods to project activities so that the project can achieve or exceed the set needs and expectations within the constraints of limited resources. This diagram provides a comprehensive overview of the 8 components of the project management process and can be used as a generic template for direct application.
One Hundred Years of Solitude is the masterpiece of Gabriel Garcia Marquez. Reading this book begins with making sense of the characters' relationships, which are centered on the Buendía family and tells the story of the family's prosperity and decline, internal relationships and political struggles, self-mixing and rebirth over the course of a hundred years.
One Hundred Years of Solitude is the masterpiece of Gabriel Garcia Marquez. Reading this book begins with making sense of the characters' relationships, which are centered on the Buendía family and tells the story of the family's prosperity and decline, internal relationships and political struggles, self-mixing and rebirth over the course of a hundred years.
Project management is the process of applying specialized knowledge, skills, tools, and methods to project activities so that the project can achieve or exceed the set needs and expectations within the constraints of limited resources. This diagram provides a comprehensive overview of the 8 components of the project management process and can be used as a generic template for direct application.
sort
basic concept
algorithm stability
If there are two elements Ri and Rj with the same keyword in the list to be sorted, and if the order (relative position) of the two elements does not change after sorting, then the sorting algorithm is called stable, otherwise it is unstable.
Judgment method: Whether there is long-distance data exchange
Classification
Internal sorting
Refers to a sort in which all elements are stored in memory during sorting
external sort
Refers to a sorting in which all elements cannot be stored in memory at the same time during sorting, and they are constantly moved between internal and external memory during the sorting process.
insertion sort
Basic idea
Each time a record to be sorted is inserted into the previously sorted subsequence according to its key size, until all records are inserted.
direct insertion sort
Ideas
From L[2]~L[n-1], each time the element to be sorted L[i] is inserted into the previously sorted sequence, and the elements larger than L[i] are moved back in order (compare the edges forward, move the element backward)
code
space efficiency
time efficiency
Best: the elements in the table are already ordered
Number of comparisons: n-1
Number of moves: 0
Worst: reverse order
Number of comparisons:
Number of moves:
average
characteristic
Stablize
Linear tables suitable for sequential storage and linked storage; when it is linked storage, the position of the specified element can be searched from front to back (in fact, most internal sorting algorithms are only suitable for sequential storage)
half insertion sort
Ideas
First fold in half to find the position where the element is to be inserted, and then move all elements after the element to be inserted uniformly.
code
time efficiency
Number of comparisons: (regardless of the initial state of the table)
Number of moves: depends on the initial state of the sorted table
characteristic
Stablize
Applies only to sequence tables
Hill sort (shrinking incremental sort)
Basic idea
Arrange the elements with an interval of d1 (step/increment) into a group, divide the sorting table into several sub-tables, perform direct insertion sorting inside each sub-table, then reduce the increment d2, and repeat the above process until di =1 until
code
Space efficiency:
Time efficiency:
characteristic
unstable
It is only applicable when the linear table is stored sequentially (because the random access characteristics of sequential storage are used)
swap sort
Basic idea
Swap the positions of the two records in the sequence based on the comparison results of the keywords of the two elements in the sequence.
Bubble Sort
Ideas
Compare the values of adjacent elements from back to front (or from front to back). If they are in reverse order, exchange them until the sequence is compared. The result of each round will be to exchange the smallest element to the first position of the sequence to be sorted (or to the largest element). The elements are swapped to the end of the column to be sorted), and this cycle can be done n-1 times
code
space efficiency
time efficiency
Best (no exchange occurs in the first sorting pass, jump out of the loop directly)
Number of comparisons: n-1
Number of moves: 0
Worst (in reverse order)
Number of comparisons
Number of moves
characteristic
Stablize
The subsequence generated in bubble sorting must be globally ordered (different from direct insertion), and all elements in it must be greater than or less than all elements in the unordered subsequence, and each sorting pass will put an element at its end position
Quick sort
Ideas
Based on the divide and conquer method, each time a certain element pivot is taken as the benchmark, the sequence is divided into two sub-tables, the left and right, and then recursively sorted within the sub-tables.
code
Space efficiency (recursive stack)
time efficiency
Worst (ordered or reversed)
Best (symmetrical, i.e. the sequence is divided into equal parts for each benchmark)
Features
unstable
The algorithm with the best average performance among internal sorting algorithms
No ordered subsequence is generated during the process, but each sorting operation will place the pivot (base) element at its final position.
external sort
concept
When there are too many records in the file and the entire file cannot be copied to the memory for sorting, a part of the data needs to be transferred into the memory for sorting each time. The sorting process involves multiple exchanges of internal and external memory.
External sorting method
step
①For k-way merge sort, k input buffers and 1 output buffer need to be allocated in memory
② Internally sort a total of N records (using an internal sorting algorithm), and generate r internally ordered initial merge segments (r=『N/L], L is the number of input buffers in memory)
③Merging S trips and k routes,
1) Read the blocks of k merged segments into k input buffers
2) Use the "merge sort" algorithm in turn to select the smallest record from the k merge segments and temporarily store it in the output buffer (k-1 comparisons are required each time)
3) When the output buffer is full, write out the external memory
4) If the i-th input buffer is empty, read the next disk block from the i-th merge segment
Time cost
Time to read and write external memory
Proportional to the number of merge passes S
Time required for internal sorting
Internal merge time
optimization
1) Increase the number of merging paths k, and only need multi-way balanced merging
Cost 1: Need to increase the corresponding input buffer
Cost 2: Each time selecting a minimum element from k merge segments requires (k-1) keyword comparisons (can be solved by loser tree)
2) Reduce the number of initial merge segments r
Multi-way balanced merging and loser tree
Basic idea
In order to prevent the internal merge from being affected by the increase of k, a loser tree is introduced. The k leaf nodes respectively store the records of the k merge segments currently participating in the comparison during the merge process. They are used internally to memorize the "failures" in the left and right subtrees. ", the root node points to the minimum number;
The initial number of comparisons is k-1. After that, only "log2k] times of comparison are needed to select a minimum value.
Number of comparisons
After the loser tree is introduced, the number of comparisons has nothing to do with k; but as k increases, the number of input buffers may be increased. If the total memory space remains unchanged, the capacity of each buffer needs to be reduced to reduce the number of internal and external memory exchanges. increase
Replacement-selection sort (generating initial merge segments)
Purpose
Increase the length of each merge segment (but not fixed length), thereby reducing the number of merge segments
step
Let the initial file to be sorted be FI, the initial merge segment output file be FO, and the memory work area be WA (can accommodate w records)
1) Input w records from FI to workspace WA.
2) Select the record with the minimum keyword value from WA and record it as the MINIMAX record.
3) Output MINIMAX records to FO.
4) If FI is not empty, input the next record from FI into WA.
5) Select the smallest keyword record from all records in WA with keywords greater than the MINIMAX record as the new MINIMAX record.
6) Repeat 3)~5) until no new MINIMAX record can be selected in WA, thus obtaining an initial merge segment and outputting an end flag of the merge segment to FO.
7) Repeat 2)~6) until WA is empty. From this, all initial merged segments are obtained.
best merge tree
Basic idea
1) Each initial merge segment corresponds to a leaf node, and the number of blocks contained in the merge segment is the leaf weight.
2) Merge tree WPL = the sum of the weighted path lengths of all leaf nodes in the tree
3) The number of disk I/Os during the merge process = WPL of the merged tree*2
How to construct the optimal merge tree
1) Supplement the virtual paragraph
①The optimal merge tree for k-ary merging must be a strict k-ary tree, that is, there are only nodes with degrees k and 0 in the tree.
②If (initial number of merged segments-1)%(k-1)=0, it means that a strict k-ary tree can be formed
③ Otherwise, several virtual segments (merged segments occupying 0 blocks) need to be added to make the above formula = 0
2) Construct a k-ary Huffman tree
Each time, k trees with the smallest root node weights are selected to merge, and the sum of the weights of these k root nodes is used as the weight of the new root node.
Comparison and application of various internal sorting algorithms
Comparison of internal sorting algorithms
Application of internal sorting algorithm
①If n is small, direct insertion sorting or simple selection sorting can be used. Since direct insertion sort requires more record moves than simple selection sort, when the record itself has a large amount of information, simple selection sort is better.
② If the initial state of the file is basically ordered by keywords, it is appropriate to use direct insertion or bubble sorting.
③If n is large, a sorting method with a time complexity of O(nlogzn) should be used: quick sort, heap sort or merge sort.
1) Quick sort is considered to be the best method among current comparison-based internal sorting methods. When the keywords to be sorted are randomly distributed, the average time of quick sort is the shortest.
2) Heap sort requires less auxiliary space than quick sort, and does not suffer from the worst-case scenario that quick sort may have, both sorts are unstable.
3) If the sorting is required to be stable and the time complexity is O(nlog2n), merge sorting can be used.
④ In the comparison-based sorting method, after each comparison of the sizes of two keywords, only two possible transfers occur. Therefore, a binary tree can be used to describe the comparison and determination process. It can be proved that: when n files When keywords are randomly distributed, any sorting algorithm that relies on "comparison" requires at least O(nlogzn) time.
⑤ If n is very large and the number of recorded keywords is small and can be decomposed, it is better to use radix sorting.
⑥When the record itself has a large amount of information, in order to avoid spending a lot of time moving the record, a linked list can be used as a storage structure.
Algorithm applicable storage structure
Sequential storage structure is suitable for random access and sequential access, while chain storage is only suitable for sequential access.
Only sequential access algorithms can use chain structures (such as direct insertion, bubbling), and other internal sorting algorithms can only use sequential storage.
Merge sort and radix sort
merge sort
Basic idea
2-way merge sort treats the list to be sorted as n sub-lists (each sub-list has a length of 1), and then merges them recursively until an ordered list of length n is synthesized.
code
Performance analysis
space efficiency
time efficiency
Stablize
Radix sort
Basic idea
Use the idea of multi-keyword sorting to deal with single logical keywords
Most significant bit first (MSD)
Divide it into several smaller subsequences layer by layer according to the decreasing weight of the key position.
Least Significant First (LSD)
Sort by increasing keyword weight
Sorting Process (LSD)
The base r=10 requires 10 chain queues; each keyword is 3 bits and requires three "allocation" and "collection" operations.
Performance analysis
space efficiency
time efficiency
d trips distribution and collection
One allocation requires O(n)
One collection requires O(r)
independent of the initial state of the sequence
Stablize
selection sort
Basic idea
Each pass (i) selects the element with the smallest keyword among the n-i 1 elements to be sorted, as the i-th element of the ordered subsequence, and loops n-1 times
Simple selection sort
Ideas
The i-th sorting selects the element with the smallest keyword from L[i...n] and exchanges it with L[i]. Each pass can determine the final position of an element.
code
space efficiency
time efficiency
Number of comparisons: n(n-1)/2
Number of moves
Best: 0
Worst: 3(n-1)
characteristic
unstable
Heap sort
Ideas
heap
Big root heap (big top heap): the node value is greater than the value of its left and right children
Small root heap: the node value is smaller than the value of its left and right children
Sorting Algorithm
① First build n elements into a large root heap, the top element of the heap is the maximum value, output it, and send the bottom element of the heap to the top of the heap
② Adjust the top element of the heap downwards to maintain the nature of a large root heap, then output the top element of the heap, and repeat this
sorting process
① For a complete binary tree with n nodes, start filtering from the parent node of the last node ("n/2]), and exchange the parent and child nodes to make it called a large root heap (the parent node may continue to move down to the bottom) , and then filter the subtrees rooted at each node ("n/2]~1)
②After outputting the top element of the heap, exchange the last top element of the heap with the top element of the heap, and then recursively exchange the order of the top element of the heap with the larger of the children.
code
Large root heap creation algorithm
Heap sort algorithm
insert operation
First place the new node at the end of the heap, then point it upward for the adjustment operation
time complexity
Performance analysis
space efficiency
time efficiency
Heap building time:
Adjustment operations:
characteristic
unstable
Suitable for sequential storage (because of random access)