MindMap Gallery Data structure mind map
Data structures are the way computers store and organize data: a data structure is a collection of data elements that have one or more specific relationships with each other. The map below covers linear lists, stacks and queues, trees and binary trees, graphs, searching, and more.
Edited at 2021-08-14 16:46:12
data structure
Chapter One Introduction
1. Basic concepts of data structure
1. Basic concepts and terminology
1. Data
The carrier of information: the collection of numbers, characters, and all other symbols that describe the attributes of objective things and can be recognized and processed by a computer
2. Data elements
The basic unit of data; it is composed of data items, which are the smallest indivisible units
3. Data objects
A collection of data elements with the same properties, a subset of data
4.Data type
A collection of values and a set of operations defined on the collection.
Atomic type
The value cannot be subdivided
Structure type
The value can be subdivided
Abstract data type
5. Abstract data types
1. Definition: A mathematical model and a set of operations defined on the model
2. Characteristics: The definition only depends on its set of logical characteristics and has nothing to do with how it is represented and implemented inside the computer.
3. Representation: (data objects, data relationships, basic operation sets)
6.Data structure
1. Structure: the relationship between data elements
2. Definition: A collection of data elements that have one or more specific relationships with each other.
3. Content: Logical structure, storage structure, data operations
Logical structure: the design of an algorithm
Storage structure: implementation of an algorithm
2. Three elements of data structure
1. Logical structure of data
1. Definition: The logical relationship between data elements, which has nothing to do with storage and is independent of the computer.
2. Classification
linear structure
linear table
One to one
nonlinear structure
Set
Elements merely belong to the same set
Tree
one to many
Graph/Network
many to many
2. Data storage structure
1. Definition: Representation of data structure in computer (also called image/physical structure)
2. Including: representation of data elements and representation of relationships
3. Classification
sequential storage
Advantages: Random access, elements occupy the least storage space
Disadvantages: requires a contiguous block of storage units, which produces more external fragmentation
chain storage
Advantages: No fragmentation
Disadvantages: Storage pointers occupy additional storage space; can only be accessed sequentially
Index storage
Definition: Create an additional index table, each item is called an index item (keyword, address)
Advantages: Fast retrieval
Disadvantages: takes up more storage space; adding and deleting data requires modifying the index table, which takes more time
Hash storage
Definition: Directly calculate the storage address of the element (Hash storage) based on the keyword of the element
Advantages: Retrieval, adding and deleting nodes are fast
Disadvantages: If the hash function is not good, element storage unit conflicts will occur, which will increase the cost of time and space.
3. Data operations
1. Definition of operation: Point out the function of operation according to the logical structure
2. Implementation of operation: Point out the specific operation steps of the operation according to the storage structure.
Question summary
1. Error-prone
1. An ordered list belongs to the logical structure (it is not a storage structure)
2. A circular queue is a queue represented by a sequence table; it is a data structure, not an abstract data type
3. The storage spaces of different nodes can be discontinuous, but the storage spaces within the nodes must be continuous.
4. Two different data structures, the logical structure and the physical structure can be exactly the same, but the data operations are different.
Binary trees and binary sorting trees have different times for searching nodes.
2. Algorithms and algorithm evaluation
1. Basic concepts of algorithms
1. Definition: A description of the steps to solve a specific problem, a finite sequence of instructions
2.Characteristics
1. Finiteness
2.Certainty
3. Feasibility
4.Input
5.Output
3. Good algorithm
1. Correctness
2. Readability
3. Robustness
4. Efficiency and low storage requirements
2. Measurement of algorithm efficiency
1. Time complexity O(n)
Definition: Measures how quickly the execution time of an algorithm increases as the size of the problem increases.
2. Space complexity S(n)
1. Definition: Measures how quickly the space required by the algorithm increases as the size of the problem increases.
2. The algorithm works in place: the auxiliary space required by the algorithm is constant, that is, O(1)
Question summary
1. Error-prone
1. Merge two ascending linked lists of lengths m and n into one descending linked list of length m+n (repeatedly take the smaller head element, building the result by head insertion)
1. Best case: O(min(m,n))
The shorter list is exhausted first
2. Worst case: O(max(m,n)) = O(m+n)
Both linked lists are traversed once
2. Time complexity calculation
1. sum += i; (loop while sum < n)
O(n^(1/2))
After the k-th iteration: sum = 1+2+3+...+k = k(k+1)/2 ≥ n
2. i = i*2; (loop while i ≤ n)
O(log₂n)
3. Recursive algorithm
1. T(n) = 2T(n/2) + n for n > 1, with T(1) = 1
O(nlog₂n)
2. Assume n = 2^k; first derive the closed form of T(2^k) from the recurrence, then substitute n back in to obtain the time complexity
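Written out under the stated assumption n = 2^k, the expansion is:

```latex
\begin{aligned}
T(n) &= 2T(n/2) + n \\
     &= 2\bigl(2T(n/4) + n/2\bigr) + n = 4T(n/4) + 2n \\
     &= \cdots = 2^k T(1) + kn \\
     &= n + n\log_2 n = O(n\log_2 n)
\end{aligned}
```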
3. For the same algorithm, the higher the level of the implemented language, the lower the execution efficiency.
Chapter 2 Linear Table
1. Definition and basic operations of linear tables
1.Definition of linear table
1. Definition: a finite sequence of data elements of the same data type
2. Note: a linear list is a logical structure; sequential lists and linked lists are storage structures
2. Basic operations of linear tables
1. Note: & denotes a reference in C++. If a pointer variable passed into a function must itself be modified, pass a reference to the pointer (in C, a pointer to a pointer can be used instead)
2. Main operations
2. Sequential representation of linear tables
1. Definition of sequence table
1. The sequence table requires three parts
The starting position of the storage space
Maximum storage space of sequence table
The current length of the sequence table
2. Static allocation
3. Dynamic allocation
4. Dynamic allocation statement
C language: L.data=(ElemType*)malloc(sizeof(ElemType)*InitSize);
C++: L.data=new ElemType[InitSize];
5. Note: dynamic allocation is not chained storage; it is still a sequential storage structure and the physical structure does not change. Access remains random, but the space can be sized at run time.
6. Features: Random access, high storage density, insertion and deletion require moving a large number of elements
2. Implementation of basic operations on the sequence table
1. Insertion operation O(n)
2. Delete operation O(n)
3. Search by value (sequential search) O(n)
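The O(n) bounds above come from element shifting. A minimal sketch using the static-allocation convention described earlier (`SqList` and `MaxSize` are illustrative names, not from the source):

```c
#include <stdbool.h>

#define MaxSize 50
typedef int ElemType;

typedef struct {
    ElemType data[MaxSize];
    int length;                            /* current number of elements */
} SqList;

/* Insert e at position i (1-based); shifts up to n-i+1 elements right. */
bool ListInsert(SqList *L, int i, ElemType e) {
    if (i < 1 || i > L->length + 1 || L->length >= MaxSize)
        return false;
    for (int j = L->length; j >= i; j--)   /* shift the tail right */
        L->data[j] = L->data[j - 1];
    L->data[i - 1] = e;
    L->length++;
    return true;
}

/* Delete the element at position i (1-based), returning it through *e. */
bool ListDelete(SqList *L, int i, ElemType *e) {
    if (i < 1 || i > L->length)
        return false;
    *e = L->data[i - 1];
    for (int j = i; j < L->length; j++)    /* shift the tail left */
        L->data[j - 1] = L->data[j];
    L->length--;
    return true;
}
```

Both loops shift at most n elements, which is where the O(n) average cost comes from.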
Question summary
1. Error-prone
1. The linear table must be a finite sequence
2. When inserting into position i of a sequence table, the legal range is 1 ≤ i ≤ n+1
Inserting at position n+1 is equivalent to appending at the end of the table
2. Algorithm
1. Reverse all elements of the sequence list
1. Scan the first half, and exchange the i-th element (0≤i<n/2) with the corresponding element (n-i-1) in the second half
Space O(1)
1.1 The array A[m+n] holds two sequence tables of lengths m and n; exchange their positions.
1. Thoughts
1. First reverse the order of all elements in A (n,n-1,...,m,m-1,...,1)
2. Then use the reverse order algorithm for the first n elements and the last m elements respectively.
2. Realize
1. Define a function Reverse() that reverses the elements between any left and right positions
2. Define a function Exchange() that swaps the positions of the two sequence tables of lengths m and n
3. Reverse() takes three parameters: the array to reverse plus the left and right bounds; it swaps A[left+i] with A[right-i]
4. Exchange() takes three parameters: the array A and the two lengths m and n
5. Exchange() calls Reverse() three times, passing (A, 0, m+n-1) the first time, (A, 0, n-1) the second, and (A, n, m+n-1) the third
3. Similar descriptions
1. Move the elements in A to the left p positions
1. Treat the first p elements as array B
2. Treat the last n-p elements as array C
3. Complete the exchange of positions B and C
Time O(n), space O(1)
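A minimal sketch of the three-reversal idea on plain int arrays (`LeftRotate` is an illustrative name; the reversal order follows the Exchange() description above):

```c
/* Reverse A[left..right] in place. */
void Reverse(int A[], int left, int right) {
    for (int i = 0; i < (right - left + 1) / 2; i++) {
        int t = A[left + i];
        A[left + i] = A[right - i];
        A[right - i] = t;
    }
}

/* Move the first p elements of A[0..n-1] to the end (left-rotate by p)
   using three reversals: time O(n), extra space O(1). */
void LeftRotate(int A[], int n, int p) {
    Reverse(A, 0, n - 1);        /* reverse the whole array            */
    Reverse(A, 0, n - p - 1);    /* reverse what was the tail part     */
    Reverse(A, n - p, n - 1);    /* reverse what was the head part     */
}
```

For example, rotating {1,2,3,4,5} left by 2 passes through {5,4,3,2,1}, then {3,4,5,2,1}, then {3,4,5,1,2}.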
2. Sequential list (non-ordered) deletes all data elements with value x
1. Use k to record the number of elements that are not equal to x (initial value = 0), which is the current position that should be inserted.
2. Scan the sequence table. If it is not x, put this element into the k-th position, and k will increase by 1.
3. The length of the last modified table = k
Time O(n), space O(1)
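The k-counter deletion above can be sketched as follows (the helper name `DeleteAllX` is illustrative):

```c
/* Remove every element equal to x from A[0..n-1]; returns the new length.
   k counts, and indexes, the kept elements: time O(n), space O(1). */
int DeleteAllX(int A[], int n, int x) {
    int k = 0;                   /* next position to fill */
    for (int i = 0; i < n; i++)
        if (A[i] != x)
            A[k++] = A[i];       /* keep elements that are not x */
    return k;                    /* new table length */
}
```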
2.1 Sequential list (non-ordered) deletes all elements with values between s and t
1. Just change k to record the number of elements less than s or greater than t.
3. The ordered sequence list deletes all elements between the given values s and t
1. First find the first element ≥ s (the first deleted element)
2. Find the first element > t (the next element after the last deleted element)
3. Directly move the following elements forward, and finally modify the table length
4. Remove all elements with duplicate values from the ordered list
1. The idea is similar to direct insertion sorting. Initially, the first element is regarded as a non-repeating ordered list.
2. Determine in turn whether the following elements are the same as the last element of the previous non-repeating ordered list
3. Two pointers: i always points to the last element of the ordered list, j is the working pointer
4.1 Change ordered list to unordered list
1. Using a hash table, the time is O(n)
5. Merge two ordered lists into one ordered list, and return the result by the function
1. The function has three parameters, the last one is the result linked list, reference type
2. Compare the header elements of the two linked lists in order, and put the smaller one into the new list.
3. Check which table has remaining, and add the remaining part to the end of the new table
6. For an ascending sequence S of length L, the median is the element in position ⌈L/2⌉. Find the median of the union of two equal-length ascending sequences A and B.
1. First find the medians a and b of A and B respectively, and compare them.
2. If a=b, then a is the required median
3. If a<b, discard the smaller half of A and discard the larger half of B. The lengths of the two discards are required to be equal.
4. If a>b, discard the larger half of A and discard the smaller half of B. It is required that the lengths of the two discards are equal.
5. Repeat steps 2, 3, and 4 until both sequences contain only one element, and the smaller one is the required median.
Note: ensure the discarded lengths are equal each time
1. When the remaining sequence length is odd, keep the midpoint
2. When the remaining length is even, the midpoint lies toward the front, so when discarding the smaller half, discard the midpoint along with it so that the discarded length equals that of the larger half
3. Cases 3 and 4 must each be handled with this distinction
Time O(log₂n), space O(1)
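A sketch of the halving search above, assuming both arrays hold exactly n elements (the name `TwinMedian` is illustrative). The parity test `(e1 - s1) % 2 == 0` detects an odd-length range, where the midpoint is kept:

```c
/* Median of two equal-length ascending arrays A and B (n elements each):
   time O(log2 n), space O(1). */
int TwinMedian(int A[], int B[], int n) {
    int s1 = 0, e1 = n - 1;              /* current search range in A */
    int s2 = 0, e2 = n - 1;              /* current search range in B */
    while (s1 != e1 || s2 != e2) {
        int m1 = (s1 + e1) / 2;
        int m2 = (s2 + e2) / 2;
        if (A[m1] == B[m2])
            return A[m1];                /* case a == b */
        if (A[m1] < B[m2]) {             /* drop lower half of A, upper half of B */
            if ((e1 - s1) % 2 == 0) {    /* odd length: keep both midpoints */
                s1 = m1;
                e2 = m2;
            } else {                     /* even length: drop A's midpoint too */
                s1 = m1 + 1;
                e2 = m2;
            }
        } else {                         /* symmetric case a > b */
            if ((e1 - s1) % 2 == 0) {
                e1 = m1;
                s2 = m2;
            } else {
                e1 = m1;
                s2 = m2 + 1;
            }
        }
    }
    return A[s1] < B[s2] ? A[s1] : B[s2];
}
```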
7. If the number of elements with value x > n/2, it is called the main element. Find the main element of A.
1. Select candidate main elements
1. Put the first number a into c as a candidate main element, and record the number of occurrences of a as 1
2. If the next element equals a, increase the count by 1; otherwise decrease it by 1
3. When the count decreases to 0, replace the candidate main element, put the next number into c, and re-count = 1
4. Repeat the process until the end
2. Determine whether the element in c is the real main element
1. Scan again to count the number of occurrences of c. If >n/2, it is the main element.
Time O(n), space O(1)
Note: If you use counting sorting: time O(n), space O(n)
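The candidate-selection procedure above is the Boyer-Moore voting technique; a sketch (the name `Majority` is illustrative, and -1 is used here as the "no majority" sentinel):

```c
/* Majority element of A[0..n-1] (appearing more than n/2 times),
   or -1 if none: time O(n), space O(1). */
int Majority(int A[], int n) {
    int c = A[0], count = 1;     /* candidate and its vote count */
    for (int i = 1; i < n; i++) {
        if (A[i] == c)
            count++;             /* matches the candidate: +1 */
        else if (count > 0)
            count--;             /* differs from the candidate: -1 */
        else {                   /* count hit 0: switch candidate */
            c = A[i];
            count = 1;
        }
    }
    count = 0;                   /* second scan: verify the candidate */
    for (int i = 0; i < n; i++)
        if (A[i] == c)
            count++;
    return count > n / 2 ? c : -1;
}
```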
8. Find the smallest positive integer that does not appear in the array, less time
1. Exchange space for time, allocate an array B[n] for marking, record whether a positive integer from 1-n appears, and the initial value is all 0
2. No action is taken for numbers less than or equal to 0, or greater than n.
3. If 0<A[i]≤n, let B[A[i]-1]=1, otherwise no operation will be performed
4. Traverse B and find the first i with B[i]==0; return i+1. If no entry is 0, return n+1 (i equals n when the loop exits)
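The mark-array approach can be sketched as follows (the helper name `FindMissMin` is illustrative):

```c
#include <stdlib.h>

/* Smallest positive integer not present in A[0..n-1], trading O(n)
   space for O(n) time with a mark array B. */
int FindMissMin(int A[], int n) {
    int *B = (int *)calloc(n, sizeof(int));  /* B[i]==1 iff i+1 appears */
    for (int i = 0; i < n; i++)
        if (A[i] > 0 && A[i] <= n)           /* ignore values outside 1..n */
            B[A[i] - 1] = 1;
    int i;
    for (i = 0; i < n; i++)
        if (B[i] == 0)
            break;                           /* first unmarked slot */
    free(B);
    return i + 1;                            /* i==n when all of 1..n appear */
}
```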
3. Linked representation of linear table
1. Definition of singly linked list
1.Description of node types
2. Features: Waste of space, non-random access
3. The difference between head pointer and head node
1. Regardless of whether there is a head node or not, the head pointer always points to the first node of the linked list.
2. The head node is the first node in the linked list with the head node and usually does not store information.
4. Advantages of introducing head nodes
1. The operation at the first position of the linked list is consistent with the operation at other positions.
2. Whether or not the linked list is empty, the head pointer is a non-null pointer to the head node, so empty and non-empty lists are handled uniformly
2. Implementation of basic operations on singly linked lists
1. Head insertion method to create a linked list
2. Create a linked list using tail insertion method
3. Find node value by serial number
4. Find table nodes by value
5. Insert node operation
Forward insertion operation: Find the previous node first, the time complexity is O(n)
Extension: Convert the forward insert operation into a backward insert operation, and then exchange the data of the two nodes. The time complexity is O(1)
6. Delete node operation
Find the predecessor node first, then delete the node, O(n)
Extension: first exchange data with the successor node, and then delete the successor node, O(1)
7. Find the table length operation
Count the number of data nodes
3. Double linked list
1. Description of node types in doubly linked lists
2. Insert operation
3. Delete operation
4. Circular linked list
1. Circular singly linked list
1. Empty condition: the next pointer of the head node points back to the head node
2. When operating on the head and tail of a singly linked list: do not set the head pointer and only set the tail pointer, which is more efficient.
3. You can traverse the entire linked list starting from any node
2. Circular doubly linked list
Empty condition: both the prior and next fields of the head node point back to the head node
5. Static linked list
1. Definition: Use an array to describe a chained storage structure, which also has a data field and a pointer field. The pointer is the relative address of the node (array subscript), also called a cursor
2.Form
3.End flag: next==-1
4.Attention
1. Like the sequence table, a continuous memory space must be allocated in advance
2. Insertion and deletion only require modifying the pointer and do not require moving elements.
3. Handy in high-level languages (such as Basic) that do not support pointers
6. Comparison between sequence list and linked list
1.Access method
The sequence list can be accessed sequentially or randomly. The linked list can only be accessed sequentially from the head.
2. Logical structure and physical structure
3. Find, insert and delete
1. Search by value
When out of order, it is O(n)
When ordered, the sequence table can be searched in half, O(log₂n)
2. Search by serial number
The sequence list is O(1), the linked list is O(n)
4. Space allocation
5. How to choose storage structure
1. Storage considerations
Linked lists are used when it is difficult to estimate the length and storage size, but the storage density of linked lists is low.
2. Considerations based on operations
Frequently access data elements by serial number using a sequence table
3. Environmental considerations
Sequential lists are easy to implement, linked lists are based on pointers
If it is more stable, choose the sequence list; if it is more dynamic, choose the linked list.
Question summary
1. Error-prone
1. The chain storage structure is more convenient to represent various logical structures than the sequential storage structure.
2. Sequential storage can not only be used to store linear structures, but also the structures of graphs and trees (full binary trees)
3. Static linked lists need to allocate a large continuous space, and insertion and deletion do not require moving elements.
4. If a singly linked list is used to represent the queue, a circular linked list with a tail pointer should be used.
5. Given a one-dimensional array, the minimum time to build an ordered singly linked list is O(nlog₂n)
First sort the array and then construct a singly linked list
6. The most time-saving structure when the common operations are inserting after the last element and deleting the first element:
1. A singly circular linked list with a tail pointer
2. Even with a doubly linked list, if there is no tail pointer the tail node must be searched for, which takes O(n)
be careful
1. When inserting a node: first set the new node's pointer to point at its successor, then update the pointer of the preceding node
Chapter 3 Stack and Queue
1.Stack
1.Basic concept of stack
1. Definition of stack: a linear list with insertion and deletion at one end
2.Characteristics of stack: last in, first out
3. Basic operations of the stack: No restrictions, can be used directly
2. The sequential storage structure of the stack
1. Implementation of sequential stack
1. Stack top pointer: S.top Stack top element: S.data[S.top]
2. Push to the stack: The pointer is first incremented by 1, and then the value is sent to the top element of the stack.
3. Pop the stack: first take the value of the element on top of the stack, and then decrement the pointer on the top of the stack by 1
4. Conditions for empty judgment and full judgment: vary due to different actual conditions.
2. Basic operations of sequential stack
3. Shared stack
1. Definition: Set the bottoms of the two stacks at both ends of the shared space, and the tops of the two stacks extend toward the middle.
2. Empty condition: top0 == -1 and top1 == MaxSize
3. Full condition: top1 - top0 == 1
This formula holds only under the empty-condition convention above; with different conventions the formula changes.
4. Push: top0 is incremented by 1 and then assigned; top1 is decremented by 1 and then assigned. Popping is the reverse.
Note: the operations differ depending on whether the top pointer points to the top element (initially S.top = -1) or to the slot above the top element (initially S.top = 0)
1. Top pointer points to the top element
Push: S.data[++S.top] = x   Pop: x = S.data[S.top--]
2. Top pointer points to the slot above the top element
Push: S.data[S.top++] = x   Pop: x = S.data[--S.top]
3. The judgment conditions for stack empty and stack full will also change.
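A minimal shared-stack sketch under the conventions above (top0 starts at -1, top1 at MaxSize; the struct and function names are illustrative):

```c
#include <stdbool.h>

#define MaxSize 10
typedef int ElemType;

/* Shared stack: stack 0 grows up from the left end, stack 1 grows
   down from the right end; full when the two tops are adjacent. */
typedef struct {
    ElemType data[MaxSize];
    int top0;   /* top of stack 0, initially -1      */
    int top1;   /* top of stack 1, initially MaxSize */
} SharedStack;

void InitShared(SharedStack *s) { s->top0 = -1; s->top1 = MaxSize; }

bool SharedFull(SharedStack *s) { return s->top1 - s->top0 == 1; }

bool Push(SharedStack *s, int which, ElemType x) {
    if (SharedFull(s)) return false;
    if (which == 0) s->data[++s->top0] = x;   /* top0: increment, then assign */
    else            s->data[--s->top1] = x;   /* top1: decrement, then assign */
    return true;
}

bool Pop(SharedStack *s, int which, ElemType *x) {
    if (which == 0) {
        if (s->top0 == -1) return false;      /* stack 0 empty */
        *x = s->data[s->top0--];
    } else {
        if (s->top1 == MaxSize) return false; /* stack 1 empty */
        *x = s->data[s->top1++];
    }
    return true;
}
```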
3. Stack chain storage structure
1. Advantages: It facilitates multiple stacks to share storage space, improves its efficiency, and prevents stack overflow.
2. Features: All operations are performed at the head of the table. There is usually no head node. The head pointer is used as the top pointer of the stack to facilitate node insertion/deletion.
4. Stack pop sequences
1. In any pop sequence of an increasing push sequence, the elements smaller than a given element that appear after it must form a decreasing sequence
2. Number of valid pop sequences for n pushes: f(n) = C(2n, n)/(n+1) (the Catalan number)
f(2)=2, f(3)=5, f(4)=14
Question summary
1. Error-prone
1. The stack and queue have the same logical structure, not a storage structure
Linear lists, stacks, and queues are all linear structures.
2. A linked stack has no head node and performs all its operations at the head of the list. The structure least suitable for a linked stack:
1. A singly circular linked list with only a head pointer and no tail pointer
2. Reason: after inserting a node the list must be closed back into a circle, which requires finding the tail node in O(n) time
3.Solution
1. If the head node is used: the top of the stack takes the first node, and the stack pointer takes the head node
2. If there is no leading node: the top of the stack takes the second node, and the stack pointer takes the first node.
3. Both keep the first node unchanged and do not need to find the tail element of the table.
3. Insert an x node into a chain stack (without the head node) whose top pointer is top
x->next=top; top=x
4. The push sequence is 1,2,...,n, and the pop sequence is P1, P2,...,Pn. If P2=3, the number of possible values of P3 is n-1.
1. Obviously, 4, 5,..., n after 3 can be obtained
2.P1 may be 1, and P3 may be 2.
3.P1 may be 2, and P3 may be 1
4.P1 may be 4, and P3 may be any number except 1, 3, and 4.
5. When rewriting a recursive program in a non-recursive way, the stack may not be applicable
1. Calculating the Fibonacci sequence iteration only requires one loop
6. The definitions of the pointers at the top of the stack, the head of the queue, and the tail of the queue are not unique. Be sure to review the question carefully.
2. Algorithm
1. Determine whether a linked list is centrally symmetrical
1. Push the first half of the elements onto the stack in sequence. When processing the second half, when accessing a linked list element, pop an element out of the stack to determine whether they are equal.
2. When the number of elements is even, no special handling is needed; when it is odd, skip the center node before comparing the second half
3. There is no need for a real stack, just use the array as the stack, and use the working pointer i as the top pointer of the stack.
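A sketch of the symmetric check, using a plain array as the stack as noted in point 3. The list is assumed to have a head node and a known length n; `Build` is a hypothetical test helper, not from the source:

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct LNode {
    int data;
    struct LNode *next;
} LNode;

/* Check whether a singly linked list (head node h, n data nodes)
   is centrally symmetric: time O(n), space O(n/2). */
bool IsSymmetric(LNode *h, int n) {
    int *stk = (int *)malloc((n / 2) * sizeof(int));
    int top = -1;                     /* working index doubles as stack top */
    LNode *p = h->next;
    for (int i = 0; i < n / 2; i++) { /* push the first half */
        stk[++top] = p->data;
        p = p->next;
    }
    if (n % 2 == 1)                   /* odd count: skip the center node */
        p = p->next;
    while (p != NULL) {               /* pop and compare the second half */
        if (stk[top--] != p->data) { free(stk); return false; }
        p = p->next;
    }
    free(stk);
    return true;
}

/* Hypothetical helper: build a list with a head node by tail insertion. */
LNode *Build(int a[], int n) {
    LNode *h = (LNode *)malloc(sizeof(LNode));
    LNode *tail = h;
    for (int i = 0; i < n; i++) {
        LNode *q = (LNode *)malloc(sizeof(LNode));
        q->data = a[i];
        q->next = NULL;
        tail->next = q;
        tail = q;
    }
    return h;
}
```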
2.Queue
1. Basic concept of queue
1. Definition: A linear table that only allows insertion at one end of the table and deletion at the other end.
2.Basic operations
2. Queue sequential storage structure
1. Queue sequential storage
1. Two pointers: front points to the head element of the queue, and rear points to the next position of the tail element of the queue.
2. Initial state (team empty): Q.front==Q.rear==0
3. Enqueue: first write the value at the position indicated by the rear pointer, then increment rear by 1
4. Dequeue: first read the value at the front pointer, then increment front by 1
5. False overflow exists
6.Type description
2. Circular queue
1. Initial state (team empty): Q.front==Q.rear==0
2. Advance the front pointer: Q.front = (Q.front + 1) % MaxSize
3. Advance the rear pointer: Q.rear = (Q.rear + 1) % MaxSize
4. Queue length: (Q.rear + MaxSize - Q.front) % MaxSize
5. Distinguish between empty and full teams
1. Sacrifice one storage unit, so at most MaxSize-1 elements are stored (the common convention)
Queue full: (Q.rear + 1) % MaxSize == Q.front
Queue empty: Q.front == Q.rear
Number of elements: (Q.rear + MaxSize - Q.front) % MaxSize
2. Add a data member indicating the number of elements: Q.size
3. Add a tag data member: if front==rear after a deletion (tag=0), the queue is empty; if front==rear after an insertion (tag=1), the queue is full
3. Operation of circular queue
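A sketch of the sacrifice-one-unit circular queue under the conventions above (a small `MaxSize` is used for illustration):

```c
#include <stdbool.h>

#define MaxSize 5
typedef int ElemType;

/* Circular queue that sacrifices one unit to tell empty from full. */
typedef struct {
    ElemType data[MaxSize];
    int front, rear;
} SqQueue;

void InitQueue(SqQueue *q) { q->front = q->rear = 0; }

bool QueueEmpty(SqQueue *q) { return q->front == q->rear; }

bool EnQueue(SqQueue *q, ElemType x) {
    if ((q->rear + 1) % MaxSize == q->front)   /* full: one slot unused */
        return false;
    q->data[q->rear] = x;
    q->rear = (q->rear + 1) % MaxSize;
    return true;
}

bool DeQueue(SqQueue *q, ElemType *x) {
    if (QueueEmpty(q))
        return false;
    *x = q->data[q->front];
    q->front = (q->front + 1) % MaxSize;
    return true;
}

int QueueLength(SqQueue *q) {
    return (q->rear + MaxSize - q->front) % MaxSize;
}
```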
3. Queue chain storage structure
1. Queue chain storage
1.Type description
2. Single linked list with head node: unified insertion and deletion operations
3. Suitable for situations where data elements change greatly. There is no queue overflow or unreasonable storage allocation in multiple queues.
2. Basic operations of chain queue
4. Double-ended queue
1. Definition: A queue that allows both ends (front-end and back-end) to enter and exit the queue.
2. Logical structure: linear structure
3. Input restricted deque
4. Output restricted deque
Question summary
1. Error-prone
1. When using a chained storage queue to perform a deletion operation, both the head and tail pointers may be modified.
Generally, only the head pointer of the queue needs to be modified. If there is only one element, the tail pointer needs to be modified.
2. In the chain queue, when the element pointed to by x enters the queue, perform the operation
1. rear->next = x; x->next = NULL; rear = x;
2. Because x is inserted at the tail of the queue, its next field must be set to NULL (the middle statement); including it is more rigorous
2. Question type
1. Determine the initial and full status of the circular queue
1. Array A[0..n-1], front points to the head element, rear points to the tail element, and the first enqueued element is stored in A[0]. What are the initial values?
1. On the first enqueue, front must stay put while rear+1 wraps around to point at A[0], so initially front = 0, rear = n-1
Note: When dealing with specific problems, you can use some special situations (draw simple sketches) to judge the empty and full situations, which is faster than thinking directly.
2. Determine the sequence that cannot be obtained by the input/output restricted double-ended queue
1. Enter directly according to the 4 options one by one, using the elimination method
2. Typical impossible pattern: an element pushed later cannot end up sandwiched between two elements pushed earlier than it (e.g., 2,3,1)
3. Application of stack and queue
1. Application of stack in bracket matching
1. Thoughts
1. Set up an empty stack and read parentheses sequentially
2. On reading ')': if the top of the stack is '(', pop it (they pair up); otherwise the sequence is illegal
3. On reading '(': push it onto the stack as a new, more urgent expectation
4. When the algorithm ends, the stack must be empty; otherwise the bracket sequence does not match
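A sketch of the matching procedure; handling [] and {} alongside () is an extension of the description above:

```c
#include <stdbool.h>
#include <string.h>

/* Bracket matching with a character array as the stack. */
bool BracketsMatch(const char *s) {
    int n = (int)strlen(s);
    char stk[256];
    int top = -1;
    for (int i = 0; i < n; i++) {
        char c = s[i];
        if (c == '(' || c == '[' || c == '{') {
            stk[++top] = c;               /* a new, more urgent expectation */
        } else if (c == ')' || c == ']' || c == '}') {
            if (top == -1) return false;  /* nothing left to pair with */
            char open = stk[top--];
            if ((c == ')' && open != '(') ||
                (c == ']' && open != '[') ||
                (c == '}' && open != '{'))
                return false;             /* mismatched pair */
        }
    }
    return top == -1;                     /* the stack must end empty */
}
```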
2. Application of stack in expression evaluation
1. Postfix expression: The operator is after the operand, without parentheses
2. The process of converting infix to suffix
1. Add parentheses to all operators and their operands according to operator precedence (no need to add parentheses if they exist originally)
2. Move the operator after the corresponding parentheses
3. Remove the brackets
Note: The prefix expression puts the operator in front of the parentheses, and the others are the same.
3. Infix to suffix algorithm idea
1. Numbers: directly added to the suffix expression
2. If it is (: push to stack
3. If it is ')': pop the operators on the stack into the postfix expression one by one until '(' is reached, then discard that '(' from the stack
4. If it is + - * /
1. If the stack is empty, push it
2. If the top of the stack is '(', push it
3. If its precedence is higher than that of the top of the stack, push it
4. Otherwise (lower than or equal to the top's precedence), pop operators into the postfix expression in turn until reaching an operator of lower precedence, a '(', or an empty stack; then push it
5. The traversal is completed. If the stack is not empty, pop up all elements in sequence.
4. Suffix calculation expression process
1. Scan each term of the expression sequentially
2. Operand: Push onto the stack
3. Operator <op>: pop two operands from the stack (the first popped is the right operand y, the second is the left operand x), compute x <op> y, and push the result back onto the stack
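A sketch of postfix evaluation; restricting operands to single digits is an assumption made here for simplicity:

```c
#include <ctype.h>

/* Evaluate a postfix expression over single-digit operands with + - * /,
   using an int array as the operand stack. */
int EvalPostfix(const char *expr) {
    int stk[64];
    int top = -1;
    for (const char *p = expr; *p; p++) {
        char c = *p;
        if (isdigit((unsigned char)c)) {
            stk[++top] = c - '0';         /* operand: push */
        } else if (c == '+' || c == '-' || c == '*' || c == '/') {
            int y = stk[top--];           /* first popped: right operand */
            int x = stk[top--];           /* second popped: left operand */
            int r = 0;
            switch (c) {
                case '+': r = x + y; break;
                case '-': r = x - y; break;
                case '*': r = x * y; break;
                case '/': r = x / y; break;
            }
            stk[++top] = r;               /* push the result back */
        }                                 /* other characters are skipped */
    }
    return stk[top];
}
```

For example, the infix expression (3+4)*5 becomes the postfix expression 34+5*, which evaluates to 35.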
3. Application of stack in recursion
1. Two conditions
1. Recursive expression (recursive body)
2. Boundary conditions (recursive exit)
2. The essence of recursion: whether the original problem can be reduced to a smaller problem of the same form
3. Disadvantages: There are many recursions, which can easily cause stack overflow; it contains a lot of repeated calculations and is not efficient.
4. Advantages: The code is simple and easy to understand
5. Conversion: Needs to be achieved with the help of stack
4. Application of queue in hierarchical traversal
5. Application of queues in computer systems
1. Solve the problem of speed mismatch between the host and external devices
1. Set up a print data buffer, pause output when full, do other things, and wait for the printer to send a request
2. Solve resource competition problems caused by multiple users
1. Arrange each request in a queue according to the time order, and allocate the CPU to the first user in the queue each time.
Question summary
4. Compressed storage of special matrices
1. Definition of array (logical structure)
2. Storage structure of array (sequential storage)
1. Two mapping methods: row-first, column-first
3. Compressed storage of matrices
1. Compressed storage: Multiple elements with the same value only allocate one space, 0 does not allocate space
2. Special matrices: many identical matrix elements or zero elements, distributed regularly
1. Symmetric matrix
2.Triangular matrix
1. Lower triangular matrix
2. Upper triangular matrix
3. Tridiagonal matrix
4. Sparse matrix
1. Storage: Configure non-zero elements and corresponding rows and columns into a triplet (row label, column label, value)
2. Disadvantages: Loss of random access characteristics
Chapter 4 Trees and Binary Trees
1.Basic concept of tree
1. Definition of tree
2.Basic terminology
1. Degree of node
The number of child nodes of a node
2. Degree of tree
The maximum degree of a node in a tree
3.Depth of node
Starting from the root node and accumulating layer by layer from top to bottom.
4.Height of node
Starting from leaf nodes and accumulating layer by layer from bottom to top.
5. Tree height (depth)
The maximum number of levels of nodes in the tree
6. The path between two nodes
The sequence of nodes passing between two nodes
7. Path length
The number of edges passed on the path
8.Attention
The branches in the tree are directed (parents point to children), the path is from top to bottom, and there is no path between two children.
3. Nature of trees
1. Number of nodes = sum of the degrees of all nodes + 1
2. The i-th level of a tree with degree m has at most m^(i-1) nodes.
3. An m-ary tree with height h has at most (m^h-1)/(m-1) nodes.
Sum of geometric series
2. Concept of binary tree
1. The definition of binary tree and its main characteristics
1. Definition of binary tree
1. Subtrees can be divided into left and right, and the order cannot be reversed arbitrarily.
2. The difference between a binary tree and an ordered tree of degree 2
1. An ordered tree of degree 2 has at least 3 nodes, while a binary tree can be empty
2. In an ordered tree of degree 2, the left/right order of a child is relative to the other child; a lone child need not be distinguished as left or right. In a binary tree the left/right order is fixed
2. Several special binary trees
1. Full binary tree
Each level in the tree contains the most nodes
2. Complete binary tree
1. Definition: its nodes correspond one-to-one to the same-numbered nodes of a full binary tree; each level need not hold the maximum number of nodes
2. Nature
1. Leaf nodes appear only on the two lowest levels
2. There is at most one node with degree 1, and if it exists it has only a left child
3. Binary sorting tree
1. The keys of all nodes in the left subtree are less than the root's key
2. The keys of all nodes in the right subtree are greater than the root's key
3. The left and right subtrees are each a binary sorting tree.
4.Binary balanced tree
The height difference between the left and right subtrees of any node does not exceed 1
3. Properties of binary trees
1. n₀ = n₂ + 1
Number of leaf nodes = number of degree-2 nodes + 1 (n = n₀+n₁+n₂ = n₁+2n₂+1)
2. The nth layer has at most 2^(n-1) nodes
3. A binary tree with height h has at most 2^h-1 nodes.
4. For complete binary trees
1. The parent of node i is ⌊i/2⌋
2. The left child of node i is 2i, the right child is 2i+1
3. The level of node i is ⌊log₂i⌋ + 1
5. The height of a complete binary tree with n nodes is ⌊log₂n⌋ + 1, equivalently ⌈log₂(n+1)⌉
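The index arithmetic above can be sketched in Python (array storage starting at index 1; function names are illustrative):

```python
# Sketch of the index arithmetic for a complete binary tree stored in an
# array starting at index 1 (index 0 is deliberately left unused).
import math

def parent(i): return i // 2              # ⌊i/2⌋
def left(i):   return 2 * i
def right(i):  return 2 * i + 1
def level(i):  return int(math.log2(i)) + 1   # ⌊log2 i⌋ + 1
def height(n): return int(math.log2(n)) + 1   # ⌊log2 n⌋ + 1
```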
2. Storage structure of binary tree
1.Sequential storage
1. Suitable for complete binary trees and full binary trees
2. Generally, empty slots are added for nodes that do not exist, padding the binary tree out to complete form
3. Note: the above properties hold only if storage starts at array index 1
This is easy to overlook when writing a program
4. Distinguish between the sequential storage of trees and binary trees
1. In a tree: the array index is the node number, and the stored content (the parent's position) expresses the relationship between nodes
2. In a binary tree: the index alone encodes both the node number and the relationships between nodes
2. Chain storage
1. A binary linked list node has three fields: data, lchild, rchild
2. A binary linked list with n nodes has n+1 empty link fields (of the 2n pointers, only n-1 point to children), which can be used to build a threaded linked list
3. Binary tree traversal and threaded binary trees
1. Binary tree traversal
1. Preorder traversal
2. In-order traversal
3. Post-order traversal
Order refers to when the root node is visited
4. Converting the recursive algorithm to a non-recursive one (in-order)
1. Thoughts
1. Starting from the (unvisited) root, push it and all nodes along its left-child chain onto the stack one by one
2. Pop a node from the stack (it has no left child, or its left subtree has been visited) and visit it
3. If it has a right child, push the right child onto the stack
4. Then push all nodes along the right child's left chain one by one
5. Continue in this way until the stack is empty
2. Algorithm implementation
3. The execution efficiency of non-recursive algorithms is higher than that of recursive algorithms
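The five steps above can be sketched as a non-recursive in-order traversal in Python (the `Node` class and names are illustrative, not from the source):

```python
# Non-recursive in-order traversal with an explicit stack.
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def inorder_iterative(root):
    result, stack, p = [], [], root
    while p or stack:
        if p:                      # steps 1/4: push the whole left chain
            stack.append(p)
            p = p.left
        else:                      # step 2: pop and visit
            p = stack.pop()
            result.append(p.val)
            p = p.right            # step 3: turn to the right child
    return result

#        2
#       / \
#      1   3
tree = Node(2, Node(1), Node(3))
```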
5. Level traversal (queue)
1. Thoughts
1. First put the root node into the queue, then dequeue it and access the node.
2. If there is a left subtree, add the root node of the left subtree to the queue
3. If there is a right subtree, add the root node of the right subtree to the queue.
4. Then dequeue and visit the dequeue node
5. Repeat until the queue is empty
2. Algorithm implementation
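The queue-based level traversal above can be sketched in Python (illustrative names, not the source's own code):

```python
# Level-order traversal of a binary tree using a queue.
from collections import deque

class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def level_order(root):
    if root is None:
        return []
    result, q = [], deque([root])          # 1. enqueue the root
    while q:                               # 5. repeat until the queue is empty
        node = q.popleft()                 # dequeue and visit
        result.append(node.val)
        if node.left:  q.append(node.left)     # 2. enqueue left child
        if node.right: q.append(node.right)    # 3. enqueue right child
    return result

tree = Node(1, Node(2, Node(4)), Node(3))
```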
6. Construct a binary tree from the traversal sequence
1. Preorder and inorder
1. In preorder: the first node is the root
2. In inorder: the root splits the sequence into two subsequences, the left subtree before it and the right subtree after it
3. Back in preorder: locate the two corresponding subsequences; the first node of each is again a root
2. Postorder and inorder
The last node in postorder plays the same role as the first node in preorder
3. Preorder together with postorder is not sufficient
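The preorder + inorder rule above can be sketched recursively in Python (trees as nested tuples; names are illustrative):

```python
# Rebuild a binary tree from its preorder and inorder sequences:
# the first preorder element is the root, and it splits the inorder
# sequence into left and right subsequences.

def build(preorder, inorder):
    if not preorder:
        return None
    root = preorder[0]                    # first in preorder = root
    k = inorder.index(root)               # split point in inorder
    left = build(preorder[1:k + 1], inorder[:k])
    right = build(preorder[k + 1:], inorder[k + 1:])
    return (root, left, right)            # tree as a nested tuple

def postorder(t):
    if t is None:
        return []
    root, left, right = t
    return postorder(left) + postorder(right) + [root]

t = build(['A', 'B', 'D', 'C'], ['D', 'B', 'A', 'C'])
```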
2. Threaded binary trees
1. Basic concepts
1. Purpose: to speed up finding a node's predecessor and successor
2. Conventions
1. If a node has no left subtree, let lchild point to its predecessor; if it has no right subtree, let rchild point to its successor
The predecessor and successor are determined by the chosen traversal order
2. Add two tag fields indicating whether each pointer refers to a predecessor/successor (1) or to a child (0)
3. Thread: a pointer to a predecessor or successor
4. Threading: the process of traversing a binary tree in some order and turning it into a threaded binary tree
2. Construction
1. The essence of threading
1. Traverse the binary tree once; whenever a node's left or right pointer field is empty, change it into a thread pointing to the predecessor or successor
2. In the inorder sequence: the first node is the leftmost node, the last node is the rightmost node
3. Predecessor node
1. If the left pointer is a thread, the node it points to is the predecessor
2. If the left pointer is a left child, the rightmost node of the left subtree is the predecessor
4. Successor node
1. If the right pointer is a thread, the node it points to is the successor
2. If the right pointer is a right child, the leftmost node of the right subtree is the successor
2. Implementation of inorder threading
3. Sometimes a head node is added to the threaded list, forming a doubly threaded linked list
1. Its lchild points to the root; its rchild points to the last node of the inorder traversal
2. The lchild of the first inorder node and the rchild of the last inorder node point back to the head node
3. Traversal
1. Non-recursive traversal of a binary tree can be implemented with a threaded binary tree
1. The first inorder node Firstnode (the leftmost node)
2. The inorder successor Nextnode
2. Algorithm implementation
4. Trees and forests
1. Tree storage structure
1. Expression of parents
1. Definition: nodes stored in contiguous space; each node holds a pseudo-pointer giving its parent's position in the array; the root has index 0 and pseudo-pointer -1
2. Features: the parent can be found quickly, but finding children requires scanning the whole structure
2. Child representation
1. Definition: Connect the children of each node into a linear structure using a single linked list
2. Features: It is convenient to ask for children, but inconvenient to ask for parents.
3. Child-sibling representation (left child, right sibling)
1. Definition: the left pointer points to the node's first child, the right pointer points to its next sibling; a binary linked list serves as the storage structure
2. Advantages: convenient for converting a tree into a binary tree; easy to find children
3. Disadvantage: finding the parent is troublesome; adding a parent pointer makes it convenient
2. Conversion of trees, forests and binary trees
1. Convert the tree to a binary tree
The left pointer points to the first child, the right pointer to the next sibling; the root has no sibling, so the binary tree has no right subtree
2. Convert forest to binary tree
The root of each subsequent binary tree becomes the right child of the root of the previous binary tree
3. Convert binary tree to forest (unique)
1. The root and left subtree of the binary tree are used as the binary tree form of the first tree, and then converted into a tree (the right child becomes a brother)
2. The right subtree of the root and its left child serve as the second tree, and the right child serves as the third tree, and so on.
3. Traversal of trees and forests
1. Preorder (root-first) traversal of the tree
Visit the root first, then traverse each subtree from left to right; same as the preorder traversal of the corresponding binary tree
2. Postorder (root-last) traversal of the tree
Traverse each subtree from left to right, then visit the root; same as the inorder traversal of the corresponding binary tree
3. Preorder traversal of the forest
Same as the preorder traversal of the corresponding binary tree
4. Inorder traversal of the forest
Same as the inorder traversal of the corresponding binary tree
4. Application of trees - the union-find set (disjoint sets)
1. Three operations
1.Union(S,Root1,Root2): Merge two sub-collections and connect root Root2 to root Root1
S[Root2]=Root1
2.Find(S,x): Find the root of the tree containing x until a negative number is found
while (S[x] >= 0) x = S[x]; return x;
3. Initial(S): initialize each element of S as a singleton subset (set every entry to -1)
2. Storage structure: the parent representation of a tree; the root's index names the subset, and the root's entry is a negative number whose absolute value is the number of nodes in the subset
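The three operations can be sketched in Python using the parent-array representation described above (negative entries mark roots and store minus the subset size):

```python
# Union-find over a parent array: S[x] >= 0 is a parent index,
# S[x] < 0 marks a root and holds minus the subset size.

def initial(n):
    return [-1] * n                       # each element a singleton set

def find(s, x):
    while s[x] >= 0:                      # follow parents until negative
        x = s[x]
    return x

def union(s, r1, r2):
    if r1 == r2:
        return
    s[r1] += s[r2]                        # accumulate the subset size
    s[r2] = r1                            # hang root r2 under root r1

s = initial(5)
union(s, 0, 1)
union(s, 0, 2)
```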
5. Application of trees and binary trees
1. Binary sort tree (binary search tree BST)
1.Definition
1. The keys of all nodes in the left subtree are less than the root's key
2. The keys of all nodes in the right subtree are greater than the root's key
3. The left and right subtrees are each a binary sorting tree.
An inorder traversal yields an increasing ordered sequence
2. Find
1. Non-recursive implementation, recursion is simple but inefficient
3.Insert
1. The newly inserted node always becomes a leaf (unless the tree is empty, in which case it becomes the root)
4.Construction
1. Inserting the same elements in different orders produces different BSTs
5.Delete
1. Leaf node: delete directly
2. If node i has only a left/right subtree: let that subtree take i's place as the child of i's parent
3.i has two subtrees on the left and right: replace i with the direct successor/predecessor of i, then delete the direct successor/predecessor, and convert to case 1,2
6. Search efficiency analysis
1. Average search length ASL = Σ(number of nodes on each level × level number) / total number of nodes
2. Worst case: similar to ordered singly linked list O(n)
3. Best case: balanced binary tree O(㏒₂n)
4. Search process: Similar to binary search, but the decision tree of binary search is unique
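BST insert and search can be sketched in Python (nested dicts stand in for linked nodes; names are illustrative). Note that every new key ends up in a leaf position, as stated above:

```python
# Binary sort tree (BST) insert and non-recursive search.

def insert(node, key):
    if node is None:                        # empty spot found: new leaf
        return {'key': key, 'l': None, 'r': None}
    if key < node['key']:
        node['l'] = insert(node['l'], key)
    elif key > node['key']:
        node['r'] = insert(node['r'], key)
    return node                             # equal keys are ignored

def search(node, key):
    while node is not None:                 # non-recursive, as recommended
        if key == node['key']:
            return True
        node = node['l'] if key < node['key'] else node['r']
    return False

def inorder(node):
    if node is None:
        return []
    return inorder(node['l']) + [node['key']] + inorder(node['r'])

root = None
for k in [50, 30, 70, 20, 40]:
    root = insert(root, k)
```

The inorder check below confirms the defining property: traversal yields an increasing sequence.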
2. Balanced binary tree (AVL tree)
1.Definition
1. Balanced binary tree: The absolute value of the height difference between the left and right subtrees of any node does not exceed 1
2. Balance factor: height of the left subtree minus height of the right subtree: -1, 0, or 1
2. Insert (insert first and then adjust)
1. The object of each adjustment is the minimum unbalanced subtree
2.LL balanced rotation (right single rotation)
0. A is the root of the minimum unbalanced subtree; be sure to identify A correctly
1. Cause: a new node (left or right) was inserted into the left subtree (L) of A's left child B, unbalancing the subtree rooted at A
2.Method
1. Rotate A’s left child B upward to the right and replace A as the root node.
2.A rotates downward to the right and becomes the root node of B’s right subtree.
3. The original right subtree of B is used as the left subtree of A
3.RR balanced rotation (left single rotation)
4.LR balanced rotation (double rotation first left and then right)
1. Cause: a new node (left or right) was inserted into the right subtree C of A's left child B, unbalancing the subtree rooted at A
2. Method
1. Rotate C, the root of the right subtree of A's left child B, up and to the left into B's position
2. Then rotate C up and to the right into A's position
5.RL balanced rotation (right and then left double rotation)
3. Find
1. The maximum depth of a balanced binary tree with n nodes is O(log₂n), so the average search length is O(log₂n)
3. Huffman tree and Huffman coding
1.Definition
1. The weighted path length (WPL) of the node
The product of the path length (number of edges) from the root node to any node and the weight of the node
2. Weighted path length of the tree
The sum of the weighted path lengths of all leaf nodes
3. Huffman tree (optimal binary tree)
Binary tree with minimum weighted path length
2.Construction
1.Construction process
1. Select the two nodes with the smallest weights to construct a new node. The weight is the sum of the two nodes.
2. Then select the remaining node with the smallest weight, construct it with the new node, and repeat the process
2.Features
1. The initial nodes all become leaves; the smaller a node's weight, the longer its path to the root
2. Create n-1 new nodes, a total of 2n-1 nodes
3. There is no node with degree 1
3. Huffman coding
1. Fixed length encoding
Use a binary representation of the same length for each character
2. Variable length encoding
Use binary bits of unequal length to represent different characters
3. Prefix encoding
No code is a prefix of any other code
4.Huffman coding process
The code is the sequence of labels on the path from the root to the character: 0 for a left branch, 1 for a right branch
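The construction and coding process can be sketched with Python's `heapq` (frequencies and names are illustrative; tie-breaking makes the exact codes vary, but the weighted path length is unique):

```python
# Huffman tree construction: repeatedly merge the two smallest-weight
# nodes, then label left branches 0 and right branches 1.
import heapq
from itertools import count

def huffman_codes(freq):
    tick = count()                # tie-breaker so heap tuples always compare
    heap = [(w, next(tick), ch) for ch, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, n1 = heapq.heappop(heap)   # two smallest weights
        w2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tick), (n1, n2)))
    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):       # internal node: (left, right)
            walk(node[0], code + '0')
            walk(node[1], code + '1')
        else:
            codes[node] = code
    walk(heap[0][2], '')
    return codes

freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
codes = huffman_codes(freq)
```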
Chapter 5 Graphs
1. Basic concepts of graphs
1. Definition of graph
0. G=(V,E); V(G): finite non-empty vertex set; E(G): edge set; |V|: number of vertices (the order of the graph); |E|: number of edges
0.1 Note: a linear list may be empty and a tree may be empty, but a graph may not be empty: the vertex set must be non-empty (the edge set may be empty)
1. Directed graph
<v,w>: arc from v to w (v is adjacent to w/w is adjacent to v) v: arc tail w: arc head
2. Undirected graph
(v,w)=(w,v)
3. Simple graph
There are no duplicate edges, and there are no edges from the vertex to itself.
4. Multigraph
Duplicate edges and self-loops are allowed
5. Complete graph (simple complete graph)
1. Undirected complete graph: every pair of vertices is joined by an edge; n(n-1)/2 edges
2. Directed complete graph: every pair of vertices is joined by two arcs in opposite directions; n(n-1) arcs
6. Subgraph
Spanning subgraph: a subgraph with the same vertex set
7. Connected, connected graph, connected components (undirected graph)
1. Maximal connected subgraph (connected component): a connected subgraph that cannot be enlarged; it contains all the edges among its vertices (the original graph need not be connected)
2. Minimal connected subgraph: a subgraph that stays connected while keeping as few edges as possible
Minimum, maximum relative to the side
8. Strongly connected graph, strongly connected component (directed graph)
9. Spanning tree, spanning forest
1. Spanning tree of a connected graph: a minimal connected subgraph containing all vertices (n vertices, n-1 edges)
2. In a non-connected graph, the spanning tree of connected components constitutes the spanning forest of the non-connected graph.
10. Vertex degree, in-degree, out-degree
1. Degree of a vertex: the number of edges with the vertex as an endpoint
2. In an undirected graph: the sum of the degrees of all vertices = 2 × the number of edges
3. In a directed graph: degree of a vertex = in-degree + out-degree; the sum of in-degrees = the sum of out-degrees = the number of edges
11. Edge weights and networks
1. Weighted graph (network): a graph whose edges carry weights
12.Dense graph, sparse graph
13. Path, path length, loop
1. Cycle (circuit): a path whose first and last vertices are the same
14. Simple path, simple loop
1. Simple path: a path in which vertices do not appear repeatedly
2. Simple loop: Except for the first vertex and the last vertex, the other vertices do not appear repeatedly.
15. Distance
1. The distance from v to w is the length of the shortest path between them if one exists; otherwise it is recorded as ∞
16. Directed tree
1. A directed graph in which exactly one vertex has in-degree 0 and every other vertex has in-degree 1
2. Graph storage and basic operations
1. Adjacency matrix method (sequential)
1.Definition
1. A one-dimensional array stores vertex information, and a two-dimensional array stores edge information.
2.Features
1. Undirected graph/directed graph: The number of non-zero elements in the i-th row is the degree/out-degree of the i-th vertex
2. It is easy to determine whether there are edges, but it is difficult to determine how many edges there are.
3. Dense graphs are suitable for using adjacency matrices
4.Aⁿ[i][j]: The number of paths of length n from vertex i to vertex j
2. Adjacency list method (link)
1.Definition
1. Combine sequential and linked storage to create a singly linked list for each vertex i
2. Vertex node: data field (data) + pointer to the first edge node (firstarc)
3. Edge node: adjacent-vertex field (adjvex) + pointer to the next edge node (nextarc)
2.Features
1. Undirected graph: storage space O(|V|+2|E|); directed graph: O(|V|+|E|)
2. Use adjacency list method to save space in sparse graphs
3. Cross linked list (directed graph)
1.Definition: A chain storage of directed graph
2. Structure
1. ArcNode: 5 domains
1. Tail domain (tailvex), head domain (headvex): indicates the arc tail and arc head
2. Chain domain hlink, tlink: indicates the next arc with the same arc head/arc tail
3.info field: arc related information
2. Vertex pointer (VNode): 3 fields
1.data field: vertex data information (vertex name)
2. firstin field, firstout field: point to the first arc having this vertex as head / as tail
3. Note: Vertex nodes are stored sequentially
3.Features
1. It is easy to find the arc with vi as the tail and the arc with vi as the head. It is easy to find the out-degree/in-degree of the vertex.
2. The representation is not unique, but each representation determines a unique graph
4. Adjacency multiple list (undirected graph)
1.Definition: Another chain storage of undirected graph
2. Structure
1. Each edge is represented by a node
1.mark flag field: whether this edge has been searched
2.ivex,jvex: the positions of the two vertices attached to the edge in the graph
3.ilink,jlink: points to the next edge attached to the vertex ivex/jvex
4.info: pointer field for various information
2. The vertex is represented by a node
1.data field: related information of vertices
2.firstedge domain: the first edge attached to the vertex
3.Features
1. Edges attached to the same vertex are connected in the same linked list, and each edge node is linked in two linked lists at the same time.
2. Each edge has only one node
5. Basic operations of graphs
1. Independent of the storage structure of the graph, different storage methods have different performance
2. Specific operations
3. Graph traversal
1. Breadth First Search (BFS)
1.Definition
A hierarchical traversal algorithm similar to binary-tree level order; the earliest discovered vertices are visited first
2. Thoughts
1. First visit the starting vertex v
2. Visit all unvisited adjacent vertices of v in sequence
3. Then, starting from these vertices, visit all of their unvisited adjacent vertices
3. Algorithm description
1. Hierarchical search that advances one batch of vertices at a time; unlike depth-first search there is no backtracking, and it is not recursive
2. Uses a queue plus an auxiliary visited array (marking whether each vertex has been visited)
3. Algorithm implementation
4.Performance analysis
1. Space complexity: O(|V|)
With the help of auxiliary queue, vertices are queued once
2. Time complexity
1. Adjacency list: O(|V|+|E|)
2. Adjacency matrix: O(|V|²)
The essence of graph traversal: the process of finding its adjacent points for each vertex
5.BFS solves the single-source shortest path problem
5. Breadth-first spanning tree
1. Definition: A traversal tree obtained during breadth traversal
2. Features: Unique in adjacency matrix, not unique in adjacency list
The same goes for DFS
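The BFS idea above can be sketched on an adjacency-list graph in Python (illustrative names). Recording `dist[]` also shows how BFS solves the unweighted single-source shortest path problem:

```python
# Breadth-first search with a queue; dist doubles as the visited marker
# and holds unweighted shortest-path lengths from the start vertex.
from collections import deque

def bfs(adj, start):
    dist = {start: 0}
    order = [start]
    q = deque([start])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if w not in dist:          # not yet visited
                dist[w] = dist[v] + 1
                order.append(w)
                q.append(w)
    return order, dist

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
order, dist = bfs(adj, 0)
```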
2. Depth First Search (DFS)
1.Definition
Similar to preorder traversal of a tree; the most recently discovered vertex is explored first
2. Thoughts
1. First visit the starting vertex v
2. Visit any unvisited adjacent vertex w of v
3. Then visit any unvisited adjacent vertex w2 of w
4. Repeat until no further vertex can be visited, then backtrack to the most recently visited vertices in turn
3. Algorithm description
1. Starting from a vertex, using recursive form
2. Algorithm implementation
4.Performance analysis
1. Space complexity: O(|V|)
Using a recursive work stack
2. Time complexity
1. Adjacency list: O(|V|+|E|)
2. Adjacency matrix: O(|V|²)
5. Depth-first spanning tree/forest (non-connected graph)
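The recursive DFS described above can be sketched in Python (illustrative names):

```python
# Depth-first search in recursive form with an explicit visited set.

def dfs(adj, start):
    visited, order = set(), []
    def visit(v):
        visited.add(v)
        order.append(v)
        for w in adj[v]:               # any unvisited adjacent vertex
            if w not in visited:
                visit(w)               # go deeper; backtracking is implicit
    visit(start)
    return order

adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
```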
3. Graph traversal and graph connectivity
1. Undirected graph
1. Connected: visit all vertices in one traversal
2. Non-connected: visit all vertices of connected components in one traversal
2. Directed graph
1. All vertices can be visited in one traversal only if there is a path from the initial vertex to every vertex
3. A second for loop is added to the algorithm to pick a new start vertex whenever unvisited vertices remain
The first for loop initializes visited[] to FALSE
4. Undirected graph: The number of calls to BFS or DFS = the number of connected components of the graph
4. Applications of graphs
1. Minimum Spanning Tree (MST)
1.Definition
The spanning tree with the smallest sum of edge weights
2.Features
1. The tree itself is not necessarily unique, but the minimum sum of edge weights is unique
2. Number of edges = number of vertices - 1
3. General Thoughts
While T is not yet a spanning tree, find a minimum-cost edge that creates no cycle when added to T
4. Two algorithms
1.Prim algorithm (Prim)
1. Thought (expanding from the vertex)
1. Initialization: choose any vertex as the initial tree
2. Loop (until all vertices are included): among the edges joining a tree vertex to a non-tree vertex, select the one with the smallest weight (it cannot form a cycle) and add its new endpoint to the tree
3. Repeat, always choosing the cheapest edge leaving the current tree
2.Features
1. Time complexity: O(|V|²), does not depend on |E|, suitable for graphs with dense edges
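Prim's algorithm can be sketched on an adjacency-matrix graph in Python (illustrative names; `lowcost` tracks the cheapest edge from the tree to each outside vertex):

```python
# Prim's algorithm: grow the tree from vertex 0, always taking the
# cheapest edge that leaves the current tree. O(|V|^2).
INF = float('inf')

def prim(cost):
    n = len(cost)
    in_tree = [False] * n
    lowcost = cost[0][:]          # cheapest known edge from tree to each vertex
    in_tree[0] = True
    total = 0
    for _ in range(n - 1):
        # pick the non-tree vertex with the cheapest connecting edge
        v = min((j for j in range(n) if not in_tree[j]),
                key=lambda j: lowcost[j])
        total += lowcost[v]
        in_tree[v] = True
        for j in range(n):        # update cheapest edges via v
            if not in_tree[j] and cost[v][j] < lowcost[j]:
                lowcost[j] = cost[v][j]
    return total

cost = [[INF, 1, 3, INF],
        [1, INF, 1, 4],
        [3, 1, INF, 2],
        [INF, 4, 2, INF]]
```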
2.Kruskal algorithm (Kruskal)
1. Thought (increasing weight)
1. Initialization: first include all vertices, no edges
2. Loop (until it becomes a tree): Select edges in order of increasing weight without forming a cycle until n-1 edges are included
2.Features
1. Using a heap to store the edge set gives time complexity O(|E|log|E|), suitable for sparse graphs with many vertices
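Kruskal's algorithm can be sketched in Python with a tiny union-find for cycle detection (sorting stands in for the heap; names are illustrative):

```python
# Kruskal's algorithm: take edges in increasing weight order, skipping
# any edge that would close a cycle, until n-1 edges are chosen.

def kruskal(n, edges):
    parent = list(range(n))
    def find(x):                           # union-find root lookup
        while parent[x] != x:
            x = parent[x]
        return x
    total, picked = 0, 0
    for w, u, v in sorted(edges):          # increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                       # no cycle: accept the edge
            parent[rv] = ru
            total += w
            picked += 1
            if picked == n - 1:            # spanning tree complete
                break
    return total

edges = [(1, 0, 1), (3, 0, 2), (1, 1, 2), (4, 1, 3), (2, 2, 3)]
```

On the same 4-vertex graph as the Prim sketch, both algorithms agree on the minimum total weight, as the theory guarantees.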
2. Shortest path
1. Dijkstra's algorithm for single-source shortest paths
1. Auxiliary variables
1. Set S: records the vertices whose shortest paths have been found; initially it contains only the source
2.dist[]: Record the current (continuously updated) shortest path length from the source point v0 to other vertices. The initial value is arcs[v0][i]
3.path[]: path[i] is the predecessor node of the shortest path from the source point to i, and its value can be traced back to the shortest path from v0 to vi.
2. Thoughts
1. Initialization: S = {0}; dist[] holds the distance from source 0 to each vertex (∞ where no edge exists); path[0] = -1 (unchanged throughout); path[i] = 0 for vertices adjacent to 0 and -1 for the rest
2. Select the vertex j not in S with the smallest dist[] and add it to S; for each k not in S, if dist[j] + arcs[j][k] < dist[k], update dist[k] and set path[k] = j
3. Repeat on the vertices not yet in S until S contains all vertices
3.Features
1. The time complexity of a single source is O(|V|²), and that of all node pairs is O(|V|³)
2. The algorithm does not apply when there are negative weights on the edges.
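The S / dist[] / path[] scheme above can be sketched in Python on an adjacency-matrix graph (v0 = 0 as the source; names mirror the outline's variables):

```python
# Dijkstra's single-source shortest paths, O(|V|^2).
INF = float('inf')

def dijkstra(arcs, v0=0):
    n = len(arcs)
    S = {v0}
    dist = arcs[v0][:]
    path = [v0 if arcs[v0][i] < INF else -1 for i in range(n)]
    path[v0] = -1
    dist[v0] = 0
    while len(S) < n:
        # vertex j not in S with the smallest current dist
        j = min((v for v in range(n) if v not in S), key=lambda v: dist[v])
        S.add(j)
        for k in range(n):
            if k not in S and dist[j] + arcs[j][k] < dist[k]:
                dist[k] = dist[j] + arcs[j][k]   # shorter path via j
                path[k] = j
    return dist, path

arcs = [[0, 10, INF, 5],
        [INF, 0, 1, 2],
        [INF, INF, 0, INF],
        [INF, 3, 9, 0]]
dist, path = dijkstra(arcs)
```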
2.Floyd’s algorithm finds the shortest path between vertices
1. Thoughts
1. Recursively generate a sequence of n-order matrices A⁽⁻¹⁾, A⁽⁰⁾, …, A⁽ⁿ⁻¹⁾
2.Initially: If there is an edge between any two vertices, the weight is regarded as the shortest path. If there is no edge, it is infinite.
3. Then add vertex k (k from 0 to n-1) as a permitted intermediate vertex; whenever going through k shortens a path, replace the original path
4. A⁽ᵏ⁾[i][j]: the length of the shortest path from i to j whose intermediate vertices are all numbered ≤ k
2.Features
1. Time complexity: O(|V|³)
2. Negative edge weights are allowed, but negative-weight cycles are not
3. Also applicable to weighted undirected graphs, viewed as directed graphs with a pair of opposite arcs per edge
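Floyd's recurrence above can be sketched in Python (the matrix is updated in place, so after round k it plays the role of A⁽ᵏ⁾; values are illustrative):

```python
# Floyd's all-pairs shortest paths: admit vertices 0..n-1 one at a time
# as permitted intermediates. O(|V|^3).
INF = float('inf')

def floyd(arcs):
    n = len(arcs)
    A = [row[:] for row in arcs]           # A^(-1): direct edges only
    for k in range(n):                     # admit vertex k as intermediate
        for i in range(n):
            for j in range(n):
                if A[i][k] + A[k][j] < A[i][j]:
                    A[i][j] = A[i][k] + A[k][j]
    return A

arcs = [[0, 6, 13],
        [10, 0, 4],
        [5, INF, 0]]
A = floyd(arcs)
```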
3.Topological sorting
1. Directed acyclic graph (DAG graph)
There are no cycles in directed graphs
2. The vertex represents the active network (AOV network)
The DAG represents a project; vertices represent activities, and a directed edge <vi,vj> means activity i must precede activity j
3. Topological sorting (in DAG graph)
1.Definition
1. Each vertex appears only once
2. If A is in front of B, there is no path from B to A.
2. Implementation method
1. Select a vertex without a predecessor from the DAG graph and output it
2. Delete the vertex and all directed edges starting from it
3. Repeat steps 1 and 2 until the graph is empty/there is no node without a predecessor (there must be a cycle in the graph)
3.Features
1. Time complexity O(|V|+|E|)
Output vertices while removing edges
4.Attention
1. The project can start from any vertex with in-degree 0
2. When a vertex has several direct successors, the result is usually not unique
3. If the adjacency matrix is triangular, a topological order exists, but not conversely; the vertices can, however, be renumbered according to a topological order so that the adjacency matrix becomes triangular
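The delete-a-source method above can be sketched in Python using in-degree counts and a queue (illustrative names; returning `None` signals a cycle):

```python
# Topological sort: repeatedly output a vertex with in-degree 0 and
# delete its outgoing edges. O(|V|+|E|).
from collections import deque

def topo_sort(adj, n):
    indeg = [0] * n
    for v in range(n):
        for w in adj.get(v, []):
            indeg[w] += 1
    q = deque(v for v in range(n) if indeg[v] == 0)
    order = []
    while q:
        v = q.popleft()                 # a vertex with no predecessor
        order.append(v)
        for w in adj.get(v, []):        # delete its outgoing edges
            indeg[w] -= 1
            if indeg[w] == 0:
                q.append(w)
    return order if len(order) == n else None   # None: the graph has a cycle

adj = {0: [1, 2], 1: [3], 2: [3]}
```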
4. Critical path
1. Use edges to represent active networks (AOE network)
1.Definition
Vertices represent events, directed edges represent activities, and weights represent costs (time)
2. Two properties
1. After the event represented by the vertex occurs, the activities represented by the directed edges starting from the vertex can begin.
2. Only when all activities represented by directed edges entering a certain vertex end, can the event represented by the vertex occur.
2. Several concepts
1. Start vertex (source point): There is only one vertex with in-degree 0, the beginning of the entire project
2. End vertex (sink): There is only one vertex with an out-degree of 0, which is the end of the entire project.
3. Critical path: The path with the maximum path length
4. Critical Activities: Activities on the critical path
5. Minimum time: length of critical path
3. Several parameters
1. The earliest occurrence time Ve(k) of event vk
1. Definition: the length of the longest path from the source vertex to Vk; it determines the earliest start time of all activities leaving Vk
2. Method: work forward from the source adding weights, taking the maximum over the different incoming paths
All activities must be completed before the event can start. Take the maximum value.
2. The latest occurrence time Vl(k) of event vk
1. Definition: the latest time the event can occur without delaying completion of the whole project
2. How to find it: Vl (sink point) = Ve (sink point), subtract the weights from the sink point forward, and take the minimum value of different paths
Take the minimum value to ensure that the entire project will not be delayed.
3. The earliest start time e(i) of activity ai
1. Definition: The earliest occurrence time of the event represented by the starting point of the activity
2. How to find: <Vk, Vj> represents activity ai, then e(i)=Ve(k)
4. The latest start time l(i) of activity ai
1. Definition: The difference between the latest occurrence time of the event represented by the end point of the activity and the time required for the activity
2. How to find: <Vk,Vj> represents activity ai,l(i)=Vl(j)-Weight(Vk,Vj)
5. The slack of activity ai: d(i) = l(i) - e(i)
4. Critical path
The activities whose slack d(i) = 0 constitute the critical path
Chapter 6 Search
1. Basic concepts of search
1. Lookup table (lookup structure): Data collection used for lookup
2. Suitable for static lookup tables: sequential search, half search, hash search
3. Suitable for dynamic lookup tables: Binary sorting trees, hash search, binary balanced trees and B-trees are all improvements of binary sorting trees.
4. Average search length (ASL)
2. Sequential search and binary search
1. Sequential search (linear search)
1. Sequential search of general linear tables
1. Algorithm characteristics
1. Introduce a sentinel (element 0 of the array) and search from back to front; no bounds check on the index is needed, improving efficiency
2. Advantages and Disadvantages
1. Disadvantages: When n is large, efficiency is low
2. Advantages: There are no requirements for the storage of data elements and no requirements for orderliness.
3.Performance
1. ASL(success) = (n+1)/2
2. ASL(failure) = n+1
with the sentinel
2. Sequential search in ordered list
1. Decision tree: circular nodes are success nodes, rectangular nodes are failure nodes (intervals); n success nodes imply n+1 failure nodes
2. The search length to a failure node = the level of the circular node above it
3. Only the failure ASL differs: (1+2+…+n+n)/(n+1)
The last two failure nodes lie on the same level, each costing n comparisons
2. Half search (binary search)
1. Algorithm characteristics
1. Loop condition: low ≤ high, so the case low == high is still checked; otherwise one candidate would be missed
2. mid = (low+high)/2, i.e. ⌊(low+high)/2⌋
2.Performance
1. Time complexity: O(log₂n)
2. ASL = Σ(level × number of nodes on that level) / total number of nodes; level n holds 2ⁿ⁻¹ nodes
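The algorithm can be sketched directly from the two points above (loop while low ≤ high, mid taken with the floor):

```python
# Binary search over a sorted list; returns the index of key or -1.

def binary_search(a, key):
    low, high = 0, len(a) - 1
    while low <= high:                 # low == high must still be checked
        mid = (low + high) // 2        # floor of the midpoint
        if a[mid] == key:
            return mid
        elif a[mid] < key:
            low = mid + 1
        else:
            high = mid - 1
    return -1

a = [7, 10, 13, 16, 19, 29, 32, 33, 37, 41, 43]
```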
3. Block search (index order search)
1. Algorithm characteristics
1. Combines the advantages of sequential and binary search; supports a dynamic structure and fast lookup
2. Thoughts
1. The table is divided into blocks: elements within a block may be unordered, but the blocks themselves are ordered
2. The largest key in the first block < every record in the second block, and so on
3. Create an index table, containing the largest keyword of each block and the address of the first element of each block, arranged in order by keyword
3. Search process
1. Determine the block in the index table and search sequentially/in half
2. Search sequentially within the block
4.Performance
ASL = index-table ASL + in-block ASL
Question summary
1. Error-prone
1. Binary search is generally faster, but in special cases sequential search can be faster
2. Time performance of binary search and binary sorting tree
1. Half search is measured by a binary decision tree, and ASL is always O(log₂n)
2. Binary sorting tree is related to the input order of data, and the worst case is O(n)
3. In binary search, the middle position is taken with the floor
2.Formula
1. The height of the binary decision tree: ⌈log₂(n+1)⌉ or ⌊log₂n⌋+1
1. Is it a balanced binary tree, with height differences at most 1, or a binary sorting tree?
2. The number of comparisons of unsuccessful nodes (the difference does not exceed 1)/the maximum number of comparisons
3. Total number of nodes of a full tree: n = 2^h - 1
2. The optimal block length for an indexed sequential table of n records:
With sequential search in both stages and block length b: index-table ASL = (n/b+1)/2, in-block ASL = (b+1)/2; minimizing the sum with the AM-GM inequality gives b = √n
3. Sequential search ASL = (n+1)/2; indexed search ASL = index-table ASL + in-block ASL
3. Question type
1. Determine whether a tree is a binary search decision tree
Take the last node to be compared (the middle position of the tree) and determine whether the upper and lower bounds are consistent (you can also determine the upper and lower bounds from the root node first)
2. Use an idea similar to block search directly in the ordered list: W252
The search length within a block is different for each segment
4.Knowledge points
1. k-way search: the time complexity of both success and failure is O(log_k n)
The depth of a k-ary tree with n nodes is ⌊log_k n⌋ + 1
2. When the search probabilities are different, the most effective way is to use sequential search
Sort in descending order of search probability
3. B-trees and B+ trees
1.B-tree and its basic operations
1.Definition
0. B-tree: an m-way balanced search tree (the balance factor of every node is 0); the maximum number of children is the order m of the tree
1. Each node has at most m subtrees (at most m-1 keywords)
2. If the root node is not a terminal node, there are at least two subtrees and 1 keyword
3. All non-leaf nodes except the root have at least ⌈m/2⌉ subtrees and at least ⌈m/2⌉−1 keywords.
4. All non-leaf node structures
1. Keywords are arranged in ascending order
2. Every key in the left subtree < the corresponding keyword; every key in the right subtree > the corresponding keyword
5. All leaf nodes are at the same level, without information (external nodes/failure nodes)
2. Height of the B-tree
1. Minimum height
Each node has at most m subtrees and m-1 keywords
n ≤ (m−1)(1+m+m²+...+m^(h−1)) = m^h − 1, hence h ≥ log_m(n+1)
2. Maximum height
Layer 1 has at least 1 node, layer 2 at least 2, layer 3 at least 2⌈m/2⌉, ..., layer h+1 at least 2⌈m/2⌉^(h−1) nodes.
Layer h+1 holds the leaf (failure) nodes; n keywords define n+1 search intervals, so there are n+1 failure nodes.
n + 1 ≥ 2⌈m/2⌉^(h−1)
3.B-tree search
Similar to a binary search tree
4.Insertion into B-tree
1. Positioning: must be inserted into a non-leaf node at the lowest level
2.Insert
1. Each non-failure node must keep its keyword count within [⌈m/2⌉−1, m−1]
2. If the count after insertion is ≤ m−1, insert directly; if it reaches m, split the node
3. Splitting method
1. After inserting the key, split the overfull node into two parts around the middle position ⌈m/2⌉
2. The key at position ⌈m/2⌉ moves up into the parent node; the two halves become its left and right children
3. If the parent in turn exceeds the limit, keep splitting upward; a split of the root increases the B-tree height by 1
5.Deletion of B-tree
1. At the terminal node
1. Delete keywords directly
Keyword count > ⌈m/2⌉−1
2. Brothers are enough
1. Keyword count = ⌈m/2⌉−1, and an adjacent right (left) sibling has ≥ ⌈m/2⌉ keywords
2. Parent-child transposition method: One of the parent nodes enters the deleted node, and one of the sibling nodes enters the parent node.
3. Not enough brothers to borrow (merger method)
1. Keyword count = ⌈m/2⌉−1, and the adjacent right (left) siblings also have only ⌈m/2⌉−1 keywords
2. Merge method: Merge one of the parent nodes with one of the sibling nodes to replace the deleted node. The parent node has one less keyword.
3. If the parent is the root and its count drops to 0, delete the root; the merged node becomes the new root.
4. If the parent is not the root and its count drops to ⌈m/2⌉−2, it in turn borrows from or merges with its own siblings (borrow first when a sibling can spare)
2. Not at the terminal node
1. If the left subtree has > ⌈m/2⌉−1 keywords, replace k with its predecessor (the rightmost key of the left subtree), then delete that predecessor recursively
2. If the right subtree has > ⌈m/2⌉−1 keywords, replace k with its successor (the leftmost key of the right subtree), then delete that successor recursively
3. If both subtrees have exactly ⌈m/2⌉−1 keywords, merge the two subtrees and delete k directly
2. Basic concepts of the B+ tree
1.Definition
1. Each branch node has at most m subtrees (child nodes)
2. The non-leaf root has at least two subtrees; all other non-leaf nodes have at least ⌈m/2⌉ subtrees
3. Number of subtrees of a node = number of keywords
(in a B-tree: number of subtrees = number of keywords + 1)
4. All leaf nodes contain all keywords (arranged in order) and pointers to corresponding records, and adjacent leaf nodes are linked to each other according to size.
5. All branch nodes (indexes that can be regarded as indexes) only contain the maximum value of the keywords in each of its child nodes and pointers to the child nodes.
2.Features
1. Two head pointers: one points to the root node, and the other points to the leaf node with the smallest keyword
2. Two search operations: search from the minimum keyword order/multi-way search from the root node
3. Note: a search does not stop when a non-leaf key equals the target; it always continues down to a leaf. Whether successful or not, every search is a path from the root to a leaf.
Question summary
1. Error-prone
1. If a node's keyword count ≠ its subtree count, the tree cannot be a B+ tree
2. All non-leaf nodes except the root have at least ⌈m/2⌉ subtrees and at least ⌈m/2⌉−1 keywords.
Pay special attention to the root node: there are at least two subtrees and 1 keyword
3. When a B-tree splits after insertion, the height increases by 1 only if the root itself splits
2.Formula
1. Minimum height: h ≥ log_m(n+1)
2. Maximum height: from n + 1 ≥ 2⌈m/2⌉^(h−1)
3.Knowledge points
1. For a third-order B-tree: when the number of nodes is the smallest, it is similar to a full binary tree; when the number of nodes is the largest, it is similar to a full ternary tree.
Total keywords in that case: n = m^h − 1
2. B-trees are used for file indexes; B+ trees for database indexes
4.Hash table
1. Basic concepts of hash tables
1. Hash function
The function that maps keywords to corresponding addresses is recorded as Hash(key)=addr (can be array subscript/index/memory address)
2.Conflict
Two or more different keywords map to the same address
3.Synonyms
Collision of different keywords
4.Hash table
A data structure that is accessed directly based on keywords, ideally O(1)
2.Construction method of hash function
0.Attention points
1. The definition domain must contain all keywords, and the value range depends on the size/address range of the hash table
2. Addresses should be equally likely and evenly distributed.
3. Make the function as simple as possible and calculate it in a short time
1. Direct addressing method
1. Function: H(key) = a*key + b
2. Features: No conflict, suitable for basically continuous distribution of keywords
2. Division with remainder method
1. Function: H(key)=key%p
2. Choice of p: a prime not larger than the table length m but closest (or equal) to m
3. Digital analysis method
1. Select a number of bits with a relatively even digital distribution as the hash address
2. Features: Suitable for collections of known keywords
4.Method of finding the middle of the square
1. The middle digits of the square value are used as the hash address
2. Applicable to situations where the values of each bit are not uniform enough/are less than the number of bits required for the hash address.
5. Folding method
1. Divide the keyword into several parts with the same number of digits, and take the superposition sum as the hash address.
2. Suitable for a large number of digits and the digits on each bit are roughly evenly distributed.
3. Ways to handle conflicts
1. Open addressing
0.Definition
1. Free addresses are open to both synonyms and non-synonyms
2. Recurrence: Hi = [H(key) + di] % m, where di is the increment sequence
1. Linear detection method
1.Definition: di=0,1,2,...,m-1
2. Features: Causes a large number of elements to gather at adjacent hash addresses, reducing search efficiency.
2. Square detection method (secondary detection method)
1. Definition: di=0²,1²,-1²,2²,-2²,...,k²,-k²
2.Features
1. Advantages: Can avoid accumulation problems
2. Disadvantage: Only half of the cells can be detected
3.Rehashing method (double hashing method)
1. Definition: di = i·Hash₂(key), i.e. Hi = [H(key) + i·Hash₂(key)] % m
i is the number of collisions so far
2.Features
1. Use the second hash function to calculate the address increment
2. All positions can be detected up to m-1 times
4. Pseudo-random sequence method
1.Definition: di=pseudorandom sequence
5.Attention
1. Existing elements cannot simply be physically deleted: that would break the probe chains of other elements sharing the same hash address.
2. Can be marked for deletion and logical deletion
3. Side effects: After multiple deletions, the hash table appears to be very full, but in fact there are still many unused locations and requires regular maintenance.
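The open-addressing points above (linear probing and logical deletion with a marker) can be sketched as follows; the table size and the tombstone object are illustrative choices:

```python
class LinearProbeHash:
    """Open addressing with linear probing (di = 0, 1, 2, ...) and lazy deletion."""
    DELETED = object()  # tombstone: keeps probe chains intact after deletion

    def __init__(self, m=13):
        self.m = m                     # toy size; real tables pick a prime near m
        self.slots = [None] * m

    def _probe(self, key):
        h = key % self.m
        for i in range(self.m):        # at most m probe positions
            yield (h + i) % self.m

    def insert(self, key):
        for pos in self._probe(key):
            if self.slots[pos] is None or self.slots[pos] is self.DELETED:
                self.slots[pos] = key  # tombstones may be reused
                return pos

    def search(self, key):
        for pos in self._probe(key):
            if self.slots[pos] is None:    # a truly empty slot ends the chain
                return -1
            if self.slots[pos] == key:
                return pos
        return -1

    def delete(self, key):
        pos = self.search(key)
        if pos != -1:
            self.slots[pos] = self.DELETED  # logical deletion only
        return pos
```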
2. Zipper method (link method)
1. Definition: All synonyms are stored in a linear linked list, uniquely identified by a hash address
2. Suitable for frequent insertion and deletion situations, and can be directly physically deleted
4. Hash search and performance analysis
1. Search process
0.Initialization: Addr=Hash(key)
1. Check whether there is a record on Addr
1. No record, search failed
2. If there is a record and it equals the key, the search succeeds; otherwise go to step 2
2. Compute the "next hash address" with the chosen collision-handling method, set Addr to it, and go back to step 1
2. Search efficiency
1. Depends on: hash function, method of handling collisions, loading factor
2. Filling factor (α): defines how full a table is
α = n (number of records in the table) / m (hash table length)
3. The average search length depends on α and does not directly depend on n or m.
Question summary
1. Error-prone
1. Filling K synonyms into the hash table with linear probing requires K(K+1)/2 probes.
1 + 2 + ... + K
2. The probability of conflict is proportional to the size of the loading factor
The more full it is, the more likely it is to conflict.
3. The hash function cannot be a random-number function: the same key must always map to the same address, or lookups would fail.
2. Knowledge points
1. Causes of accumulation
Probe sequences for synonym conflicts and different probe sequences for non-synonyms are intertwined
2. Average number of detections = average search length
3. Question type
1.ASL success
Number of comparisons = number of collisions + 1
2.ASL failed
From the hash function, determine every address a key could hash to; from each such address, probe with the collision-handling method until an empty slot is reached. The comparison against the empty slot also counts.
5. Strings
1. Definition of string
1. A blank string consists of one or more spaces; it is not the empty string
2. Linear tables use a single element as the operation object, and strings use substrings as the operation object.
2. String storage structure
1. Fixed-length sequential storage representation
1. Similar to linear table sequential storage, allocate fixed-length arrays
2. Truncation: String values exceeding the predefined length will be discarded.
2. Heap allocation storage representation
1. Use malloc() and free() to dynamically allocate and delete
3. Block-linked storage representation
1. Linked storage similar to a linear list, each node can hold one or more characters
3. Basic operations of strings
1. Minimum subset of operations (5): cannot be implemented using other string operations
1. String assignment: StrAssign
2.String comparison: StrCompare
3. String connection: Concat
4. Find the string length: StrLength
5. Find substring: SubString
4. String matching pattern
1.Definition: Positioning operation of substring
2. Thoughts
1. Starting from the pos character of the main string, compare it with the first character of the pattern string
2. If they are equal, continue to compare the subsequent characters one by one.
3. If not equal, start the comparison again from the next character of the main string.
3. Efficiency
Worst time complexity: O(mn)
5. Improved pattern matching algorithm-KMP algorithm
1. String prefixes and partial matching values
1. Prefix: All header substrings except the last character (can only start from the first position)
2. Suffix: All tail substrings except the first character (can only start from the last position)
3. Partial match value: the length of the longest equal prefix and suffix
2.Efficiency
1. Time complexity: O(m+n)
3. Solve the next array (manual)
1. next[i] = length of the longest equal prefix and suffix + 1
2. The string itself cannot be used as a prefix or suffix.
3. Usually next[1] = 0 (conventions vary: if the question uses −1, still compute with 0 and finally subtract 1 from every value)
4. next[i] considers only the first i−1 characters; the i-th character itself is excluded.
5. When the string is long, you can only count the first few to get the answer.
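Following the convention above (1-based values, next[1] = 0, next[i] = longest equal prefix/suffix of the first i−1 characters + 1), the next[] array can be computed mechanically; an illustrative sketch:

```python
def build_next(p):
    """Compute the KMP next[] array, textbook convention:
    next[1] = 0; next[i] = (longest equal proper prefix/suffix of p[:i-1]) + 1.
    Returned as a plain 0-indexed list of the 1-based values."""
    n = len(p)
    nxt = [0] * (n + 1)          # nxt[0] unused; nxt[1] = 0
    i, j = 1, 0
    while i < n:
        if j == 0 or p[i - 1] == p[j - 1]:
            i += 1
            j += 1
            nxt[i] = j           # prefix/suffix match extends by one
        else:
            j = nxt[j]           # fall back, just like the matching phase
    return nxt[1:]
```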
Question summary
1.Knowledge points
1.The biggest advantage of KMP algorithm
The main string does not backtrack, the i pointer does not move, j=next[j]
2. Question type
1. A quick way to compute next[]: suppose next[n] is being computed and next[n−1] = m
0.Default next[1]=0, next[2]=1
The subscript can also start from 0, and the numerical value can also start from -1.
1. Look at the first n−1 characters: if the first m characters equal the last m, then next[n] = m+1; go to 3
2. If they differ: when m = 1, next[n] = 1, go to 3; otherwise set m = m−1 and compare the first m−1 with the last m−1 characters; if equal go to 1, if different repeat 2.
3. End the operation
2.KMP matching process
When the value of next[] starts from 0, the subscript number starts from 1. When there is no match, the i pointer does not move, j=next[j]
3. Error-prone
1. During the exam, you must distinguish whether the next[] value starts from 0 or 1
Chapter 7 Sorting
1. Basic concepts of sorting
1. Definition of sorting
1. The stability of an algorithm cannot measure the quality of an algorithm.
2. Internal sorting: All elements are placed in memory
3. External sorting: constantly moving between memory and external memory
Question summary
1. Error-prone
1. The sorting algorithm implemented on the sequence list may not be implemented on the linked list.
2. If you use different sorting methods to sort the same linear table, the sorting results obtained may be different.
2.Formula
1. The minimum number of comparisons for sorting any n keywords is ⌈log₂(n!)⌉
2. Insertion sort
1. Direct insertion sort
1. Thoughts
1. Assume that the first element has been sorted
2. Starting from the second element, compare it backward with the sorted part; shift each larger sorted element one place to the right, then drop the element into the gap.
2.Features
1. Copy A[0] as a sentinel to avoid out-of-bounds subscripts, facilitate assignment, and eliminate the need to apply for temporary variables.
2. Compare and move at the same time
3.Performance analysis
1. Time efficiency
1. Best case: O(n)
Only needs to be compared once each time, no need to move
2. Worst case: O(n²)
2. Applicability
Both sequence and chain are suitable. Most algorithms are only suitable for sequence.
4. Algorithm implementation
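A sketch of direct insertion sort; instead of the sentinel copy A[0], a local variable holds the element being inserted:

```python
def insertion_sort(a):
    """Direct insertion sort: compare and shift in the same backward scan."""
    for i in range(1, len(a)):     # a[0] is treated as already sorted
        x = a[i]                   # plays the role of the sentinel copy
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]        # shift the larger sorted element back
            j -= 1
        a[j + 1] = x               # drop x into the gap
    return a
```

In the best case (already sorted input) the inner loop never runs, giving the O(n) bound noted above.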
2. Half-way insertion sort
1. Thoughts
1. First fold in half to find the position where the element is to be inserted.
2. Move all elements after the position to be inserted uniformly
2.Features
1. Only the number of comparison elements is reduced, the number of moves has not changed
2. The number of comparisons is independent of the initial state and depends only on n
3.Performance analysis
1. Time complexity: O(n²)
The bound comes from the number of moves (the two nested loops); the reduced number of comparisons does not change the order.
2. Stable
4. Algorithm implementation
3. Hill sorting (reducing incremental sorting)
1. Thoughts
1. First take the increment n/2, splitting the table into groups; sort each group by direct insertion.
2. Reduce the step size by half and continue until the step size is 1
2.Features
1. The best incremental sequence has not yet been found, and the time complexity cannot be calculated.
3.Performance analysis
1. Time efficiency
Worst case: O(n²)
2. Unstable
The same keywords may be divided into different sub-tables
4. Algorithm implementation
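A sketch of Hill (Shell) sort with the halving increment sequence n/2, n/4, ..., 1 described above:

```python
def shell_sort(a):
    """Shell sort: each gap pass is an insertion sort over the
    gap-separated subsequences; the gap halves until it reaches 1."""
    gap = len(a) // 2
    while gap > 0:
        for i in range(gap, len(a)):
            x = a[i]
            j = i - gap
            while j >= 0 and a[j] > x:   # insertion sort within one gap group
                a[j + gap] = a[j]
                j -= gap
            a[j + gap] = x
        gap //= 2
    return a
```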
Question summary
1. Error-prone
1. The time complexity of half insertion sort is O(n²)
3. Exchange sorting
1. Bubble sort
1. Thoughts
1. Starting from the last element, compare the two adjacent elements. If they are in reverse order, exchange them.
2. One bubbling pass swaps the smallest remaining element into the first unsorted position.
3. In the next pass, the element placed in the previous round no longer participates; the list to be sorted shrinks by one.
4. Each pass puts one minimum at its final position; at most n−1 passes are needed.
2.Features
1. The ordered subsequence generated by bubble sorting must be globally ordered, which is different from direct insertion sorting.
3.Performance analysis
1. Time efficiency
1. Best case: O(n)
After bubbling, the flag is still False, there is no element exchange, and the loop is jumped out directly.
4. Algorithm implementation
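A sketch of bubble sort with the early-exit flag mentioned above (best case O(n)):

```python
def bubble_sort(a):
    """Bubble the smallest remaining element to the front each pass;
    a pass without exchanges means the list is sorted, so stop early."""
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1, i, -1):     # compare neighbours from the tail forward
            if a[j] < a[j - 1]:
                a[j], a[j - 1] = a[j - 1], a[j]
                swapped = True
        if not swapped:                   # flag still False: already sorted
            break
    return a
```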
2. Quick sort (divide and conquer method)
1. Thoughts
1. Each time, the first element in the current table is taken as the base pivot (pivot value) to divide the table.
2.i points to the first element (base), j points to the last element
3. Start with j, and find the first element smaller than the baseline from back to front. j points to the position of this element, and replace the element pointed to by i with this element.
4. Starting from i, find the first element larger than the baseline from front to back. i points to the position of this element, and replace the element pointed to by j with this element.
5. Start from j again and repeat until i and j meet; place the pivot at the meeting position, splitting the list into a left part smaller and a right part larger than the pivot.
6. Take the first element of the two subsequences as the reference value and repeat the operation
2.Features
1. No ordered subsequence is generated, but an element will be placed at the final position after each sorting pass
2. The more ordered the input, the lower the efficiency; the more disordered, the higher.
3.Performance analysis
1. Space efficiency
Average case: O(㏒₂n)
The algorithm is recursive and requires a recursive work stack, the size of which is the depth of the recursive tree.
Worst case: O(n)
2. Time efficiency
1. Worst case: O(n²)
The sequence is basically in order or reverse order
2. Best case (average case): O(n㏒₂n)
4. Unstable
Exchange keyword exists
4. Ways to improve efficiency
1. When the subsequence obtained by recursive division is small, recursion is no longer required and direct insertion sorting is used for subsequent work.
2. Try to choose a benchmark that can divide the data.
1. Select three elements from head, tail and middle, and then take the middle value
2. Randomly select benchmarks to ensure that the worst case scenario does not occur
5. Algorithm implementation
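A sketch of quick sort using the first element as pivot and the hole-filling partition described in steps 2-5:

```python
def quick_sort(a, low=0, high=None):
    """Quick sort: first element as pivot, two-pointer hole-filling partition."""
    if high is None:
        high = len(a) - 1
    if low < high:
        i, j = low, high
        pivot = a[low]                      # the pivot leaves a "hole" at a[low]
        while i < j:
            while i < j and a[j] >= pivot:  # from the right, find one < pivot
                j -= 1
            a[i] = a[j]                     # fill the left hole
            while i < j and a[i] <= pivot:  # from the left, find one > pivot
                i += 1
            a[j] = a[i]                     # fill the right hole
        a[i] = pivot                        # pivot lands at its final position
        quick_sort(a, low, i - 1)           # recurse on the two sub-tables
        quick_sort(a, i + 1, high)
    return a
```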
Question summary
1. Error-prone
1. The fastest situation when a group of numbers is sorted using quick sort
Each selected benchmark divides the table into two sub-tables of similar length.
2. When looking for possible sequences, both orders must be considered (from large to small, small to large)
2. Knowledge points
1. Quick sort is most suitable for sequential storage.
3.Algorithmic thinking
1. Algorithm to move all odd numbers in front of all even numbers (least time/space)
1. First find an even number L(i) from front to back, then find an odd number L(j) from back to front, exchange the two, and repeat until i>j
2. Based on quick sort, one traversal, time O(n), space O(1)
2. Dutch flag problem: a sequence of blocks consisting only of red, white, and blue, so that the blocks are arranged in the order of red, white, and blue, and the time is O(n)
1. Scan sequentially, swap red to the front and blue to the back
2. Three pointers, one working pointer, one pointing to the head, one pointing to the tail, switch statement
Note: Positive numbers, negative numbers, and 0 are sorted in the same way.
3. Find the kth smallest element in the array L[1...n]
0. Alternatives: sort fully, O(n·log₂n), or use a small top heap, O(n + k·log₂n)
1. Better: based on quick-sort partitioning, average time O(n)
2. Choose a pivot and partition exactly as in quick sort into L[1...m−1] and L[m+1...n], with L(m) the pivot
3. Discuss the relationship between m and k
1.m=k, the benchmark is the element you are looking for
2. m < k: the target is in the right part; recursively find the (k−m)-th smallest there
3. m > k: the target is in the left part; recursively find the k-th smallest there
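The m-versus-k case analysis above can be written iteratively (quick-select); an illustrative sketch with 1-based k:

```python
def kth_smallest(a, k):
    """k-th smallest element (1-based k) via quick-sort style partitioning.
    Recurses into only one side per the case analysis, average O(n)."""
    low, high = 0, len(a) - 1
    while True:
        i, j = low, high
        pivot = a[low]
        while i < j:                        # same hole-filling partition as quick sort
            while i < j and a[j] >= pivot:
                j -= 1
            a[i] = a[j]
            while i < j and a[i] <= pivot:
                i += 1
            a[j] = a[i]
        a[i] = pivot                        # pivot is the (i+1)-th smallest overall
        if i + 1 == k:
            return pivot                    # m = k: the pivot is the answer
        elif i + 1 < k:
            low = i + 1                     # m < k: continue in the right part
        else:
            high = i - 1                    # m > k: continue in the left part
```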
4. n numbers constitute a set A, which is divided into two disjoint subsets A1 and A2. The numbers are n1 and n2, and the sum of the elements is s1 and s2. Satisfy |n1-n2| minimum and |s1-s2| maximum
1. Put the smallest ⌊n/2⌋ elements into A1 and the rest into A2: partition as in quick sort and handle the pivot position i separately
2. If i=n/2, grouping is completed
3. If i<n/2, put the base and all previous elements in A1, and continue to divide the elements after i
4. If i>n/2, put the base and all subsequent elements in A2, and continue to divide the elements before i
5. No need to sort all elements, average time O(n), space O(1)
4.Select sort
1. Simple selection sorting
1. Thoughts
1. Find the smallest element in the first pass and exchange this element with the first element
2. Find the smallest remaining element in the second pass and exchange it with the second element; repeat n−1 times
2.Features
1. Each sorting pass can determine the final position of an element.
3.Performance analysis
1. Time complexity is always O(n²)
The number of element moves is small, at most O(n), but the number of comparisons is independent of the initial state: always n(n−1)/2
2. Unstable
Direct exchange after finding the smallest element will destroy the original order.
4. Algorithm implementation
2. Heap sort
1. Thoughts
1. Definition of heap
Max-heap (a complete binary tree): the largest element is the root, and every non-root node's value is ≤ its parent's
1.1 Heap operations
1. Delete the top element of the heap
First exchange the last element with the heap top, remove the old top, then sift the new top downward until the heap property is restored.
2. Insert operation
First place the new node at the end of the heap, and then adjust it upward to the heap
2. Construct the initial heap (large root heap)
1. Start from the last non-leaf node (the ⌊n/2⌋-th) and work backward toward the root.
2. Only look at the subtree of the node (not the parent node). If there is a node in the subtree that is larger than the root node, find the larger one and exchange it with the root node.
3. If the next-level heap is destroyed after the exchange, continue to use the above method to adjust the next-level heap until the root node.
3. Output the top element (maximum value) of the heap, put the last element into the top of the heap, and adjust the top of the heap downward to become a large top heap.
4. Repeat the operation until there is only one element left in the heap
2.Performance analysis
1. The time complexity is always O(n㏒₂n)
The heap building time is O(n), and then there are n-1 downward adjustments, each time O(h)
2. Unstable
When building a heap, the original order will be disrupted.
3. Space complexity: O(1)
Use only a constant number of auxiliary units
3. Algorithm implementation
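A sketch of heap sort: build the max-heap from the last non-leaf node backward, then repeatedly output the top and sift down:

```python
def heap_sort(a):
    """Heap sort with a max-heap stored in the array itself (0-based)."""
    n = len(a)

    def sift_down(root, end):
        """Restore the max-heap property for the subtree at root, within a[:end]."""
        while 2 * root + 1 < end:
            child = 2 * root + 1
            if child + 1 < end and a[child + 1] > a[child]:
                child += 1                  # pick the larger child
            if a[root] >= a[child]:
                break                       # heap property holds
            a[root], a[child] = a[child], a[root]
            root = child

    for i in range(n // 2 - 1, -1, -1):     # build the initial heap, O(n)
        sift_down(i, n)
    for end in range(n - 1, 0, -1):         # n-1 output-and-adjust rounds
        a[0], a[end] = a[end], a[0]         # max goes to its final position
        sift_down(0, end)
    return a
```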
Selection sorting is unstable
Question summary
1. Classic questions
1. Each element has two data items k1 and k2. Sort: look at k1 first, with the smaller one first; k1 has the same value, then look at k2, with the smaller one first.
1. First determine the sorting order: sort by k2 first, then by k1, so that k1 is the primary key.
2. There are no stability requirements for k2 sorting. Try to choose an algorithm with high efficiency.
3. A stable algorithm must be used to sort k1, otherwise it may disrupt the sorted k2
2. Question type
1. Number of comparisons when inserting/deleting elements from the heap
2. Just want to get the partial sorted sequence before the kth (k≥5) smallest element
1. Bubble/simple selection/heap sort available
2. The first two methods take about k·n time
3. Heap sort: building the initial heap takes at most 4n comparisons, and extracting k elements takes k·log₂n, for 4n + k·log₂n in total; optimal when k ≥ 5
3. Determine whether a data sequence is a small root heap
1. Two situations need to be divided, the sequence is odd number and even number
2. Odd n: there is no single-branch node; compare each parent directly with its two children
3. Even n: there is one single-branch node; check it separately, the other nodes as usual
5. Merge sort and radix sort
1. Merge sort
1. Thoughts
1. First divide into n sub-tables with length 1, and merge them in pairs.
2. Obtain the sublists with length 2 and merge them two by two again, and repeat
2.Performance analysis
1. Space complexity: O(n)
The merged data needs to be copied to the auxiliary array, the length is n
2. Time complexity: O(n㏒₂n)
Each merge pass is O(n) and ⌈log₂n⌉ passes are made; the division into subsequences is independent of the initial state.
3. Algorithm implementation
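A bottom-up sketch of merge sort matching the description: runs of length 1, 2, 4, ... merged pairwise through an auxiliary list:

```python
def merge_sort(a):
    """Bottom-up merge sort: sublists of width 1, 2, 4, ... merged pairwise."""
    n = len(a)
    width = 1
    while width < n:
        aux = []                                # auxiliary array of length n
        for lo in range(0, n, 2 * width):
            left = a[lo:lo + width]
            right = a[lo + width:lo + 2 * width]
            i = j = 0
            while i < len(left) and j < len(right):
                if left[i] <= right[j]:         # <= keeps equal keys stable
                    aux.append(left[i]); i += 1
                else:
                    aux.append(right[j]); j += 1
            aux.extend(left[i:])                # copy any leftover tail
            aux.extend(right[j:])
        a = aux
        width *= 2
    return a
```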
2. Radix sort
1. Thoughts
1. Least Significant First (LSD)
1. Distribute the sequence into 10 queues (0-9, enqueue at the rear, dequeue at the front) by the ones digit.
2. Dequeue all elements from the first queue in sequence until the 10th queue
3. Repeat the distribute-and-collect rounds by the tens digit, then the hundreds digit, and so on.
4. After the round on the most significant digit, the sequence is sorted.
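The LSD rounds above can be sketched with ten FIFO queues (illustrative; d is the number of decimal digits):

```python
from collections import deque

def radix_sort_lsd(a, d):
    """LSD radix sort for non-negative integers with at most d decimal digits:
    d rounds of distribution into 10 FIFO queues, then collection in order 0..9."""
    queues = [deque() for _ in range(10)]       # r = 10 queues, one per digit value
    for k in range(d):
        for x in a:                             # distribution pass: O(n)
            queues[(x // 10 ** k) % 10].append(x)
        a = []
        for q in queues:                        # collection pass: O(r) queue visits
            while q:
                a.append(q.popleft())           # FIFO order keeps the sort stable
    return a
```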
2. Most significant bit first (MSD)
1. First distribute the sequence into 10 queues (0-9) by the most significant digit.
2. Recursively sort every queue holding more than one number, distributing again by the next-highest digit.
3. When every queue holds at most one number, the recursion ends; collect each queue and return to the previous level
2.Features
1. Not based on comparison, adopt the idea of multi-keyword sorting
3. Performance Analysis (LSD)
1. Space complexity: O(r)
r is the radix (the base of the numbers)
2. Time complexity: O(d(n+r))
d rounds of distribution and collection are made; one distribution is O(n), one collection is O(r). Independent of the initial state.
3. Stable
Both allocation and collection are done sequentially
Question summary
1. Error-prone
1. What method should be used to sort 10TB data files?
Merge sort, the file is too large, internal sorting cannot be used, only external sorting can be used, usually merge sorting is used
2. Knowledge points
1. Merge two ordered lists with n elements each into one ordered list
1. The minimum number of comparisons is: n
2. The maximum number of comparisons is: 2n-1
3. Question type
1. How many sorting code comparisons are performed in radix sorting?
Did several allocations and collections
6. Comparison and application of various internal sorting algorithms
1. Comparison of internal sorting algorithms
2. Application of internal sorting algorithm (summary)
1. Selection of algorithm
1.n is smaller
1. The record itself is small
Direct insertion (fewer comparisons)/simple selection (fewer exchanges)
2. The record itself is large
Simple choice
1.1 Medium scale (n<1000)
Hill sort is a good choice
2. n is very large (choose the one with less time)
1. Unstable
1. Quick sort: The best method, with the shortest average time when randomly distributed
2. Heap sort: requires less auxiliary space than quick sort
2. Stable
Merge sort: but has the highest space complexity
3.n is very large, the number of keywords is small and can be decomposed
Radix sort is better
4. The sequence is basically in order
Direct insertion sort (highest efficiency/least number of comparisons)/bubble sort
The best case time complexity is O(n)
5.The result after several sortings
1. Place one element at the final position at a time
Simple selection/heap sort, bubble sort/quick sort (exchange)
Only quick sort's placed element need not be an extreme value; the other three place a minimum or maximum (first/last) each pass
2. The first few elements are in order, but not the final position
insert directly
3. Maybe all elements are not in their final position before the last trip starts.
insert directly
4. There is no guarantee that there will be an element at the final position
Direct insertion/Hill sort/Merge sort
6. Several of the most
1. Currently the best internal sorting algorithm in terms of average performance
Quick sort
2. Algorithms that take up the most space
Merge sort O(n)
Quick sort's stack space is O(n) in the worst case
3. What is the fastest when it is basically in order?
direct insertion sort
4. In the best case, linear time can be achieved
direct insertion/bubble sort
7.0 Algorithm time is independent of the initial sequence
Selection sort (simple selection/heap sort)/merge sort/radix sort
7.1 The number of sorting passes has nothing to do with the initial sequence
Direct insertion/simple selection are all n-1 times, and radix sorting is fixed at d times.
7.2 The number of moves has nothing to do with the initial sequence
Radix sort
Radix sort has nothing to do with the initial sequence
8. Find the least efficient data structure
Heap: designed for sorting; for searching it is effectively unordered, so lookup is least efficient.
9. If sequential storage is replaced by chain storage, time efficiency will be reduced.
Hill sorting/heap sorting (taking advantage of the random access characteristics of sequential tables)
2. Improvements to merge sort
Combined with direct insertion sorting: first use direct insertion to obtain longer ordered subsequences, and then merge them in pairs, still stable
3. Algorithm based on comparison
The comparison decision process can be described by a binary tree, so it takes at least O(n㏒₂n) time
4. The record itself contains a large amount of information
Using linked lists as storage structures can avoid spending a lot of time moving records
7.External sorting
1. Basic concepts of external sorting
1. Reduce the number of external memory reads and writes
Increase the number of merge ways / reduce the number of initial runs
2. Increase the number of merge paths (m)
loser tree
After use, the number of comparisons has nothing to do with m. You can increase m to reduce the height of the merge tree.
3. Reduce the number of merged segments
permutation-selection sort
Increase the length of merge segments to reduce the number of merge segments
4. Organize the merging sequence of merging segments with different lengths
best merge tree
Huffman tree generalized to m-ary tree
2. External sorting method (merge sort)
1. Two relatively independent stages
1. Get the merged segment/serial string
According to the memory buffer size, split the n records into segments, read each segment into memory in turn, sort it with an internal sort, and write it back to external storage.
2. Merge the merged segments one by one, so that the merged segments increase from small to large until they are completed.
2. Number of merge passes S = height of the merge tree = ⌈㏒m(n)⌉
n — number of initial merge segments; m — number of merge ways; ⌈ ⌉ — round up
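The pass-count formula above can be checked by simulating the passes directly; integer ceiling division avoids floating-point log (the function name is illustrative):

```python
def merge_passes(n_segments, m):
    # Each pass of m-way merging reduces the segment count to
    # ceil(n_segments / m); count passes until one segment remains.
    passes = 0
    while n_segments > 1:
        n_segments = -(-n_segments // m)   # ceiling division
        passes += 1
    return passes

# 16 segments: 4-way merging needs 2 passes, 2-way merging needs 4,
# matching S = ⌈㏒m(n)⌉.
```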
3. Multi-way balanced merging and loser tree
1. Internal merging time grows as m grows
1. This offsets the benefit of fewer external memory accesses gained by increasing m.
2. So ordinary internal merging, which compares all m candidates sequentially, cannot be used.
2. Loser tree (complete binary tree)
1. Leaf node
Records currently participating in the comparison
2. Internal nodes
Each internal node records the index of the loser between its two subtrees; the winner continues comparing upward to the root.
3. Root node
The index of the current minimum/maximum record (the winner), not the value itself
3. After the loser tree yields the index of the minimum, output that record, insert the next key from the same input segment at its leaf, and replay the comparisons along the path to the root to rebuild the tree.
4. With a loser tree, the number of comparisons per selection is independent of m, so m can be increased to reduce the height of the merge tree.
5. A larger m is not always better
As m increases, more input buffers are needed, each buffer's capacity shrinks, and data is exchanged between internal and external memory more often.
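The loser-tree selection described above can be sketched as a k-way merge of sorted in-memory lists (file I/O omitted; `kway_merge` and the array layout are illustrative, not the textbook's exact pseudocode):

```python
import math

def kway_merge(runs):
    # k-way merge using a loser tree. Internal nodes ls[1..k-1] store
    # the *loser* of each match; ls[0] holds the overall winner's index.
    k = len(runs)
    iters = [iter(r) for r in runs]
    # keys[i] is the current head of run i; +inf marks an exhausted run.
    # keys[k] = -inf is a sentinel that loses every initial match.
    keys = [next(it, math.inf) for it in iters] + [-math.inf]
    ls = [k] * k

    def adjust(s):
        # Replay matches from leaf s up to the root.
        t = (s + k) // 2
        while t > 0:
            if keys[s] > keys[ls[t]]:      # s loses: it stays at node t,
                s, ls[t] = ls[t], s        # the old loser advances
            t //= 2
        ls[0] = s                          # overall winner

    for i in range(k - 1, -1, -1):         # build the tree
        adjust(i)

    out = []
    while keys[ls[0]] != math.inf:
        w = ls[0]
        out.append(keys[w])
        keys[w] = next(iters[w], math.inf) # next record from winner's run
        adjust(w)                          # one leaf-to-root path: ≤ ⌈㏒₂k⌉ comparisons
    return out
```

Note how each selection replays only one leaf-to-root path, which is why the per-selection comparison count depends on the tree height rather than on m.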
4. Replacement-selection sort (generates longer initial merge segments)
1. Idea
1. First output the smallest key a in the work area.
2. Then repeatedly output the smallest key among those not smaller than a into the current merge segment; keys smaller than the last output are set aside for the next merge segment.
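The two steps above can be sketched with a min-heap standing in for the work area, tagging each key with the run it belongs to; the heap formulation and names are illustrative assumptions, not the source's pseudocode:

```python
import heapq

def replacement_selection(records, capacity):
    # Split `records` into initial runs using replacement selection.
    # `capacity` models the work-area size; on random input the runs
    # average about 2 * capacity records.
    it = iter(records)
    heap = []
    for _ in range(capacity):              # fill the work area
        try:
            heap.append((0, next(it)))     # (run tag, key)
        except StopIteration:
            break
    heapq.heapify(heap)
    runs, run, current = [], [], 0
    while heap:
        r, key = heapq.heappop(heap)
        if r != current:                   # only next-run keys remain
            runs.append(run)
            run, current = [], r
        run.append(key)                    # smallest eligible key goes out
        try:
            nxt = next(it)
        except StopIteration:
            continue
        # A new key smaller than the last output must wait for the
        # next run; otherwise it can still join the current run.
        heapq.heappush(heap, (r if nxt >= key else r + 1, nxt))
    if run:
        runs.append(run)
    return runs
```

On already-sorted input every new key joins the current run, so a single run covers the whole file, which is why replacement selection pairs well with the best merge tree below.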
5. Best merge tree
1. Definition (generalized m-ary Huffman tree)
1. Leaf node
An initial merge segment that participates in the merge
2. Weight of leaf node
The number of records in the initial merge segment
3. Path length from leaf node to root node
Number of merge passes
4. Non-leaf nodes
New merged segments generated by merging
5. Weighted path length of merge tree
Total number of read records
= Σ over all leaves of (number of edges on the leaf's path to the root × its weight)
2. Idea
Merge the segments with fewer records first and those with more records last.
3. The optimal merge tree is a strict m-ary tree
1. If the initial segments cannot form a strict m-ary tree, add "virtual segments" of length 0 and let them take part in the earliest merges.
2. Number of virtual segments = m - u - 1 (needed only when u ≠ 0)
u = (n0 - 1) % (m - 1)
n0 — number of degree-0 nodes (initial merge segments); m — order of the merge tree
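A minimal check of this formula, using n0 and m as defined above (the function name is illustrative):

```python
def virtual_segments(n0, m):
    # Zero-length segments to add so that n0 initial segments form a
    # strict m-ary merge tree, i.e. (leaves - 1) % (m - 1) == 0.
    u = (n0 - 1) % (m - 1)
    return 0 if u == 0 else m - u - 1

# 9 segments, 3-way: (9-1) % 2 = 0, no virtual segments needed;
# 8 segments, 3-way: u = 1, add 3 - 1 - 1 = 1 virtual segment.
```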
Problem summary
1. Typical problems
1. There are n records in total, the work area holds a records, and m-way balanced merging is performed.
1. Number of initial merge segments r = ⌈n/a⌉; number of merge passes s = ⌈㏒m(r)⌉
2. 5 initial merge segments of 20 records each, merged with 5-way balanced merging: number of comparisons without and with a loser tree
1. Without a loser tree: selecting the smallest of 5 candidates takes 4 comparisons; of the 100 records, 99 selections require full comparison, so 4 × 99 = 396 comparisons.
2. With a loser tree: the tree height is h = ⌈㏒₂5⌉ = 3, each selection takes at most h comparisons, so 100 selections take at most 300 comparisons.
3. Number of input and output buffers for m-way balanced merging
1. Serial operation: m input buffers and 1 output buffer
2. Overlapped (parallel) input/output: 2m input buffers and 2 output buffers
4. If the total number of input/output files open at the same time cannot exceed 15
then at most 14-way merging is possible (one file must be reserved for output)