Features
Features:

Product Tour >

Edraw AI >

Paid Plans:

Individuals >

Business >

Eduaction >
Resources
Blog

History

How-tos & Tips

Discovery

Biography

Business Analysis

Examples

AI concept Map

Free AI Mind Map Generator

Onenote Mind Map

Bcg Matrix Examples

Nike Marketing Strategy

Unilever SWOT Analysis

Make Mind Maps in Google Docs

Guide

FAQs

What's New

Resource Center
Templates
All Templates

Brain Storming Templates

Strategy and Planning Templates

Project Management Templates

Product Management Templates

Human Resources Templates

Agile Workflow Templates

Marketing Templates

Education Templates

Fun and Games Templates

User Gallery
Download
Pricing
Enterprise

MindMap Gallery Data Structure Chapter 8 - Sorting

Data Structure Chapter 8 - Sorting

Chapter 8 of "Data Structure" - Sorting Notes, including insertion sort, exchange sort, external sort, comparison and application of various internal sort algorithms, merge sort and radix sort, etc.

Edited at 2022-11-23 16:08:11

PlotWizard

Recent works View more works>>

Data Structure Chapter 8 - Sorting

PlotWizard

Recent works View more works>>

Recommended to you
Outline

Data structure basic algorithms
- 9
PlotWizard

sort

basic concept

algorithm stability

If there are two elements Ri and Rj with the same keyword in the list to be sorted, and if the order (relative position) of the two elements does not change after sorting, then the sorting algorithm is called stable, otherwise it is unstable.

Judgment method: Whether there is long-distance data exchange

Classification

Internal sorting

Refers to a sort in which all elements are stored in memory during sorting

external sort

Refers to a sorting in which all elements cannot be stored in memory at the same time during sorting, and they are constantly moved between internal and external memory during the sorting process.

insertion sort

Basic idea

Each time a record to be sorted is inserted into the previously sorted subsequence according to its key size, until all records are inserted.

direct insertion sort

Ideas

From L[2]~L[n-1], each time the element to be sorted L[i] is inserted into the previously sorted sequence, and the elements larger than L[i] are moved back in order (compare the edges forward, move the element backward)

code

space efficiency

time efficiency

Best: the elements in the table are already ordered

Number of comparisons: n-1

Number of moves: 0

Worst: reverse order

Number of comparisons:

Number of moves:

average

characteristic

Stablize

Linear tables suitable for sequential storage and linked storage; when it is linked storage, the position of the specified element can be searched from front to back (in fact, most internal sorting algorithms are only suitable for sequential storage)

half insertion sort

Ideas

First fold in half to find the position where the element is to be inserted, and then move all elements after the element to be inserted uniformly.

code

time efficiency

Number of comparisons: (regardless of the initial state of the table)

Number of moves: depends on the initial state of the sorted table

characteristic

Stablize

Applies only to sequence tables

Hill sort (shrinking incremental sort)

Basic idea

Arrange the elements with an interval of d1 (step/increment) into a group, divide the sorting table into several sub-tables, perform direct insertion sorting inside each sub-table, then reduce the increment d2, and repeat the above process until di =1 until

code

Space efficiency:

Time efficiency:

characteristic

unstable

It is only applicable when the linear table is stored sequentially (because the random access characteristics of sequential storage are used)

swap sort

Basic idea

Swap the positions of the two records in the sequence based on the comparison results of the keywords of the two elements in the sequence.

Bubble Sort

Ideas

Compare the values of adjacent elements from back to front (or from front to back). If they are in reverse order, exchange them until the sequence is compared. The result of each round will be to exchange the smallest element to the first position of the sequence to be sorted (or to the largest element). The elements are swapped to the end of the column to be sorted), and this cycle can be done n-1 times

code

space efficiency

time efficiency

Best (no exchange occurs in the first sorting pass, jump out of the loop directly)

Number of comparisons: n-1

Number of moves: 0

Worst (in reverse order)

Number of comparisons

Number of moves

characteristic

Stablize

The subsequence generated in bubble sorting must be globally ordered (different from direct insertion), and all elements in it must be greater than or less than all elements in the unordered subsequence, and each sorting pass will put an element at its end position

Quick sort

Ideas

Based on the divide and conquer method, each time a certain element pivot is taken as the benchmark, the sequence is divided into two sub-tables, the left and right, and then recursively sorted within the sub-tables.

code

Space efficiency (recursive stack)

time efficiency

Worst (ordered or reversed)

Best (symmetrical, i.e. the sequence is divided into equal parts for each benchmark)

Features

unstable

The algorithm with the best average performance among internal sorting algorithms

No ordered subsequence is generated during the process, but each sorting operation will place the pivot (base) element at its final position.

external sort

concept

When there are too many records in the file and the entire file cannot be copied to the memory for sorting, a part of the data needs to be transferred into the memory for sorting each time. The sorting process involves multiple exchanges of internal and external memory.

External sorting method

step

①For k-way merge sort, k input buffers and 1 output buffer need to be allocated in memory

② Internally sort a total of N records (using an internal sorting algorithm), and generate r internally ordered initial merge segments (r=『N/L], L is the number of input buffers in memory)

③Merging S trips and k routes,

1) Read the blocks of k merged segments into k input buffers

2) Use the "merge sort" algorithm in turn to select the smallest record from the k merge segments and temporarily store it in the output buffer (k-1 comparisons are required each time)

3) When the output buffer is full, write out the external memory

4) If the i-th input buffer is empty, read the next disk block from the i-th merge segment

Time cost

Time to read and write external memory

Proportional to the number of merge passes S

Time required for internal sorting

Internal merge time

optimization

1) Increase the number of merging paths k, and only need multi-way balanced merging

Cost 1: Need to increase the corresponding input buffer

Cost 2: Each time selecting a minimum element from k merge segments requires (k-1) keyword comparisons (can be solved by loser tree)

2) Reduce the number of initial merge segments r

Multi-way balanced merging and loser tree

Basic idea

In order to prevent the internal merge from being affected by the increase of k, a loser tree is introduced. The k leaf nodes respectively store the records of the k merge segments currently participating in the comparison during the merge process. They are used internally to memorize the "failures" in the left and right subtrees. ", the root node points to the minimum number;

The initial number of comparisons is k-1. After that, only "log2k] times of comparison are needed to select a minimum value.

Number of comparisons

After the loser tree is introduced, the number of comparisons has nothing to do with k; but as k increases, the number of input buffers may be increased. If the total memory space remains unchanged, the capacity of each buffer needs to be reduced to reduce the number of internal and external memory exchanges. increase

Replacement-selection sort (generating initial merge segments)

Purpose

Increase the length of each merge segment (but not fixed length), thereby reducing the number of merge segments

step

Let the initial file to be sorted be FI, the initial merge segment output file be FO, and the memory work area be WA (can accommodate w records)

1) Input w records from FI to workspace WA.

2) Select the record with the minimum keyword value from WA and record it as the MINIMAX record.

3) Output MINIMAX records to FO.

4) If FI is not empty, input the next record from FI into WA.

5) Select the smallest keyword record from all records in WA with keywords greater than the MINIMAX record as the new MINIMAX record.

6) Repeat 3)~5) until no new MINIMAX record can be selected in WA, thus obtaining an initial merge segment and outputting an end flag of the merge segment to FO.

7) Repeat 2)~6) until WA is empty. From this, all initial merged segments are obtained.

best merge tree

Basic idea

1) Each initial merge segment corresponds to a leaf node, and the number of blocks contained in the merge segment is the leaf weight.

2) Merge tree WPL = the sum of the weighted path lengths of all leaf nodes in the tree

3) The number of disk I/Os during the merge process = WPL of the merged tree*2

How to construct the optimal merge tree

1) Supplement the virtual paragraph

①The optimal merge tree for k-ary merging must be a strict k-ary tree, that is, there are only nodes with degrees k and 0 in the tree.

②If (initial number of merged segments-1)%(k-1)=0, it means that a strict k-ary tree can be formed

③ Otherwise, several virtual segments (merged segments occupying 0 blocks) need to be added to make the above formula = 0

2) Construct a k-ary Huffman tree

Each time, k trees with the smallest root node weights are selected to merge, and the sum of the weights of these k root nodes is used as the weight of the new root node.

Comparison and application of various internal sorting algorithms

Comparison of internal sorting algorithms

Application of internal sorting algorithm

①If n is small, direct insertion sorting or simple selection sorting can be used. Since direct insertion sort requires more record moves than simple selection sort, when the record itself has a large amount of information, simple selection sort is better.

② If the initial state of the file is basically ordered by keywords, it is appropriate to use direct insertion or bubble sorting.

③If n is large, a sorting method with a time complexity of O(nlogzn) should be used: quick sort, heap sort or merge sort.

1) Quick sort is considered to be the best method among current comparison-based internal sorting methods. When the keywords to be sorted are randomly distributed, the average time of quick sort is the shortest.

2) Heap sort requires less auxiliary space than quick sort, and does not suffer from the worst-case scenario that quick sort may have, both sorts are unstable.

3) If the sorting is required to be stable and the time complexity is O(nlog2n), merge sorting can be used.

④ In the comparison-based sorting method, after each comparison of the sizes of two keywords, only two possible transfers occur. Therefore, a binary tree can be used to describe the comparison and determination process. It can be proved that: when n files When keywords are randomly distributed, any sorting algorithm that relies on "comparison" requires at least O(nlogzn) time.

⑤ If n is very large and the number of recorded keywords is small and can be decomposed, it is better to use radix sorting.

⑥When the record itself has a large amount of information, in order to avoid spending a lot of time moving the record, a linked list can be used as a storage structure.

Algorithm applicable storage structure

Sequential storage structure is suitable for random access and sequential access, while chain storage is only suitable for sequential access.

Only sequential access algorithms can use chain structures (such as direct insertion, bubbling), and other internal sorting algorithms can only use sequential storage.

Merge sort and radix sort

merge sort

Basic idea

2-way merge sort treats the list to be sorted as n sub-lists (each sub-list has a length of 1), and then merges them recursively until an ordered list of length n is synthesized.

code

Performance analysis

space efficiency

time efficiency

Stablize

Radix sort

Basic idea

Use the idea of multi-keyword sorting to deal with single logical keywords

Most significant bit first (MSD)

Divide it into several smaller subsequences layer by layer according to the decreasing weight of the key position.

Least Significant First (LSD)

Sort by increasing keyword weight

Sorting Process (LSD)

The base r=10 requires 10 chain queues; each keyword is 3 bits and requires three "allocation" and "collection" operations.

Performance analysis

space efficiency

time efficiency

d trips distribution and collection

One allocation requires O(n)

One collection requires O(r)

independent of the initial state of the sequence

Stablize

selection sort

Basic idea

Each pass (i) selects the element with the smallest keyword among the n-i 1 elements to be sorted, as the i-th element of the ordered subsequence, and loops n-1 times

Simple selection sort

Ideas

The i-th sorting selects the element with the smallest keyword from L[i...n] and exchanges it with L[i]. Each pass can determine the final position of an element.

code

space efficiency

time efficiency

Number of comparisons: n(n-1)/2

Number of moves

Best: 0

Worst: 3(n-1)

characteristic

unstable

Heap sort

Ideas

heap

Big root heap (big top heap): the node value is greater than the value of its left and right children

Small root heap: the node value is smaller than the value of its left and right children

Sorting Algorithm

① First build n elements into a large root heap, the top element of the heap is the maximum value, output it, and send the bottom element of the heap to the top of the heap

② Adjust the top element of the heap downwards to maintain the nature of a large root heap, then output the top element of the heap, and repeat this

sorting process

① For a complete binary tree with n nodes, start filtering from the parent node of the last node ("n/2]), and exchange the parent and child nodes to make it called a large root heap (the parent node may continue to move down to the bottom) , and then filter the subtrees rooted at each node ("n/2]~1)

②After outputting the top element of the heap, exchange the last top element of the heap with the top element of the heap, and then recursively exchange the order of the top element of the heap with the larger of the children.

code

Large root heap creation algorithm

Heap sort algorithm

insert operation

First place the new node at the end of the heap, then point it upward for the adjustment operation

time complexity

Performance analysis

space efficiency

time efficiency

Heap building time:

Adjustment operations:

characteristic

unstable

Suitable for sequential storage (because of random access)