Features
Features:

Product Tour >

Edraw AI >

Paid Plans:

Individuals >

Business >

Eduaction >
Resources
Blog

History

How-tos & Tips

Discovery

Biography

Business Analysis

Examples

AI concept Map

Free AI Mind Map Generator

Onenote Mind Map

Bcg Matrix Examples

Nike Marketing Strategy

Unilever SWOT Analysis

Make Mind Maps in Google Docs

Guide

FAQs

What's New

Resource Center
Templates
All Templates

Brain Storming Templates

Strategy and Planning Templates

Project Management Templates

Product Management Templates

Human Resources Templates

Agile Workflow Templates

Marketing Templates

Education Templates

Fun and Games Templates

User Gallery
Download
Pricing
Enterprise

MindMap Gallery Data Structure Chapter 7 - Search

Data Structure Chapter 7 - Search

Chapter 7 of "Data Structure" - Searching knowledge, including basic concepts of search, sequential search and binary search, tree search, hash table, B-tree and B-tree, etc.

Edited at 2022-11-23 16:07:51

PlotWizard

Recent works View more works>>

Data Structure Chapter 7 - Search

PlotWizard

Recent works View more works>>

Recommended to you
Outline

Data structure basic algorithms
- 9
PlotWizard
Data structure course framework
- 6
PlotWizard

Find

Basic concepts of search

1) Find

The process of finding data elements that meet certain conditions in a data collection

2) Lookup table (lookup structure)

Data collection used for lookup

static lookup table

Only used for queries

dynamic lookup table

Support dynamic insertion and deletion

3) Keywords

The value of a data item in a data element that uniquely identifies the element

4) Average search length

n: length of lookup table

Pi: The probability of finding the i-th element (usually 1/n)

Ci: the number of comparisons required to find the i-th element

Sequential search and binary search

sequential search

Sequential search of general linear tables

Find average successful length: (n 1)/2

Average length of search failures: n 1

Sequential search in ordered list

Find average successful length: (n 1)/2

Average length of failed search: n/2 n/(n 1)

Search by half (two points)

Basic idea

Only applicable to ordered sequential lists, the key is compared with the middle position element, and then the middle position element in the first half or the second half is compared recursively according to the size until the search succeeds or fails.

Implement code

decision tree

Used to describe the process of half search

Maximum branch height

Since the decision tree is a balanced binary tree, the maximum height difference between branch trees is 1, that is, the maximum number of comparisons - the minimum number of comparisons = 1

The decision tree is also a binary sorting tree (i.e. left subtree keywords < root node < right subtree, or vice versa)

time complexity:

Block search

Basic idea

Also known as index sequence search. Divide the lookup table into several blocks. The keywords of the previous block are smaller than those of the next block, but the elements in the block can be disordered. Create an index table. Each item indicates the maximum keyword of each block and the address of the first element. The index table is organized by key. words in order

When searching, first determine the block where the record to be searched is located in the index table (use sequential or half-search index table); then search sequentially within the block.

Average search length = index search within block search

All search in order

Divide the lookup table of length n into b blocks, each block has s records.

The index table uses half search and sequential search within the block.

tree search

Binary sort tree (BST)

definition

A binary sorting tree (also called a binary search tree) is either an empty tree or a binary tree with the following properties:

1) If the left subtree is not empty, the values of all nodes on the left subtree are less than the value of the root node.

2) If the right subtree is not empty, then the values of all nodes on the right subtree are greater than the value of the root node.

3) The left and right subtrees are also a binary sorting tree respectively.

Perform in-order traversal on a binary sorted tree to obtain an increasing ordered sequence.

Find

insert

1) If the keyword k is less than the root node value, insert it into the left subtree; otherwise, insert it into the right subtree

2) The inserted node must be a newly added leaf node and the left or right child of the last node visited on the search path when the search fails.

3) Delete and insert the same node in binary sorting. If the node is originally a leaf node, the previous and subsequent binary trees will remain unchanged, and vice versa. For a balanced binary tree, no matter whether the original node deleted is a leaf node or not, Nodes, AVL before and after may change (or not)

structure

construction process

algorithm

delete

①If the deleted node is a leaf node, just delete it directly

②If the deleted node z has only one left subtree or right subtree, make its subtree become the subtree of z’s parent node

③If z has two subtrees, left and right, let its direct successor (calculated according to the in-order sequence) or direct predecessor replace z, and then delete the successor node to convert to the previous situation

efficiency analysis

1) The height difference between the left and right subtrees does not exceed 1, that is, a balanced binary tree

Average search length:

2) A single tree with only right (left) child

Average search length:

Comparison with binary search

1) The search performance of a binary sort tree is related to the input order of the data. Only in the best case, the search performance is the same as the binary search.

2) In terms of maintaining the orderliness of the table, a binary tree does not need to move nodes. It only needs to modify the pointer to complete the insertion and deletion operations. The time complexity is O(log2n), while binary search requires O(n); when there is When the sequence table is a static lookup table, it is appropriate to use the sequence table and binary search. When it is dynamic, it is appropriate to use a binary sorting tree as the logical structure.

balanced binary tree

definition

Define the height difference between the left and right subtrees of a node as the balance factor (left high - right high), then the balance factor of a balanced binary tree node can only be 0, 1, -1, that is, the height difference between the left and right subtrees of any node no more than 1

insert

1) Inserting nodes similar to binary sorting tree

2) Once the inserted node destroys the balance, find the smallest unbalanced subtree and rotate it.

①LL balanced rotation (right single rotation)

Insert a new node on the left subtree of the left child of node A; it needs to be rotated to the right. Rotate A's left child B upward to become the root node. A serves as the root node of B's right subtree, and B's original right child tree as the left subtree of A

②RR balanced rotation (left single rotation)

Insert a new node into the right subtree of the right child of node A; it needs to be rotated to the left. Rotate A's right child B upward to become the root node. A becomes the root node of B's left subtree. B The original left subtree of A serves as the right subtree of A

③LR balanced rotation (double rotation first left and then right)

To insert a new node into the right subtree of A's left child, you need to first rotate the root node C of the right subtree of A's left child B to the left and up to the position of B, and then rotate C to the right and up to A. s position

④RL balanced rotation (double left rotation in succession)

delete

1) Use the binary sorting tree method to perform the deletion operation on node w.

2) Starting from node w, trace upward to find the first unbalanced node z (i.e., the smallest unbalanced subtree); y is the child node with the highest height of node z; x is the height of node y The highest child node.

3) Then balance the subtree with z as the root, where there are 4 possible positions of x, y and z:

① y is the left child of z, x is the left child of y (LL, right single rotation);

②y is the left child of z, and x is the right child of y (LR, double rotation first left and then right);

③y is the right child of z, x is the right child of y (RR, left single rotation);

④y is the right child of z, x is the left child of y (RL, double rotation first right and then left)

Find

The search process is the same as that of a binary sorting tree, and the average search length is O(log2n)

Number of nodes

nh is the minimum number of nodes of a balanced binary tree with height h

red black tree

definition

A red-black tree is a binary sorted tree that satisfies the following red-black properties:

①Each node is either red or black.

②The root node is black.

③Leaf nodes (fictitious external nodes, NULL nodes) are all black.

④ There are no two adjacent red nodes (that is, the parent node and child node of the red node are both black).

⑤ For each node, the simple path from the node to any leaf node contains the same number of black nodes.

The total number of black nodes on any simple path from a node (excluding this node) to a leaf node is called the black height of the node (recorded as bh). The black height of the root node is the red-black tree. The black height

nature

1. The longest path from the root node to the leaf node is no more than twice the shortest path

2. Height of a red-black tree with n internal nodes

insert

Insert using the binary search tree insertion method and color the new node z in red

If z is the root node, color it black and end

If z is not the root node

If the parent node of z is black, no adjustment is needed

If the parent node of z is red

①The uncle node y of z is black, and z is a right child

LR (if z.p is the right child, it is RL), rotate left once, and become ②

②The uncle node y of z is black, and z is a left child

LL (or RR if z.p is the right child), rotate right once, and swap z.p and z.p.p and colors

③If the uncle node y of x is red

Color both z.p and y black, color z.p.p red, and then use z.p.p as the new node z to repeat the cycle until ①, ② or z is the root node.

delete

1) The deletion process first executes the deletion method of the binary search tree. If the node to be deleted has two children, it is exchanged with the successor node in the order (that is, the smallest node in the right subtree), and then converted to delete This successor node; the successor node has at most one child, so it is converted to the situation where the node to be deleted has no children or only one child.

2) If the node to be deleted has only a right child or a left child, it can be seen from its properties that it must be a black node and its child must be a red node.

3) If the node to be deleted has no children

①If the node is red, delete it directly

② If the node to be deleted has no children and is red, record x as the node used to replace y (x is a black null node). In order to keep the black height unchanged, x is positioned as a double black node, and it needs to be Revert to normal node

1. The sibling node w of x is red

At this time, w must have black left and right children and parent nodes; exchange the colors of w and x.p, and then rotate x.p left, converting situation 1 to 2, 3, and 4

2. The sibling node w of x is black, the left child of w is red, and the right child is black.

RL, that is, the red node is the left child of the right child of its parent node. Swap the colors of w and its left child, right-rotate w, and convert to case 3

3. The sibling node w of x is black, and the right child of w is red.

RR. Swap the colors of w and x.p, color the right child of w black, and rotate x.p left to change x back to single black, and end

4. The sibling node w of x is black, and the two child nodes of w are both black.

Remove a layer of black from both x and w (that is, w becomes red). As compensation, add an additional layer of black to x.p to maintain the local black height.

If x.p is originally red, change it to black and end

If x.p is originally black, use it as a new node x to enter the loop

hash table

basic concept

Hash function: A function that maps a keyword in a lookup table to the address corresponding to the keyword, recorded as Hash(key)=Addr (the address here can be an array subscript, index, or memory address, etc.).

The hash function may map two or more different keywords to the same address, which is called a collision. These different keywords that collide are called synonyms. On the one hand, a well-designed hash function should minimize such conflicts; on the other hand, since such conflicts are always inevitable, a method of handling them must be designed.

Hash table: A data structure that is directly accessed based on keywords. In other words, the hash table establishes a direct mapping relationship between keywords and storage addresses.

How to construct a hash function

Require

1) The domain of the hash function must include all the keywords that need to be stored, and the range of the value domain depends on the size or address range of the hash table.

2) The addresses calculated by the hash function should be evenly distributed in the entire address space with equal probability, thereby reducing the occurrence of conflicts.

3) The hash function should be as simple as possible and can calculate the hash address corresponding to any keyword in a short time.

Commonly used hash functions

direct addressing method

Basic idea

Directly take the value of a linear function of the keyword as the hash address

hash function

H(key)=key or H(key)=a×key b

Features

It is simple and does not cause conflicts, and is suitable for continuous distribution of keywords, otherwise it will waste storage space.

division leaving remainder method

Basic idea

Let the length of the hash table be m, take a prime number p that is less than and closest to or equal to m, and use the formula to convert the keyword into a hash address

hash function

H(key)=key%p

Features

The most common and simple method

digital analytics

Basic idea

For r digits of an r-base number, select a number of bits with a relatively even distribution of digits as the hash address.

Features

Suitable for known keyword sets. If the keywords are changed, a new hash function needs to be reconstructed.

Square-Medium Method

Basic idea

Take the middle digits of the squared value of the keyword as the hash address

Features

The values of each bit used only for keywords are not uniform enough or are smaller than the number of bits required for the hash address.

How to handle conflicts

open addressing method

definition

The free address where a new entry can be stored is open to both its synonym entries and its non-synonym entries.

recursion formula

Hi represents the hash address obtained by the i-th detection in the conflict processing, m represents the length of the hash table, and di is the increment sequence.

incremental sequence

Linear detection method

Basic idea

di=0, 1,...,m-1, that is, when a conflict occurs, check the next unit in the table sequentially until a free unit is found or the entire table is searched.

Features

It is easy to cause a large number of elements to "aggregate" at adjacent hash addresses, which greatly reduces the search efficiency.

With linear probing, conflicts can occur even if they are not synonyms

square detection method

Basic idea

di=0^2, 1^2, -1^2...,k^2,-k^2, where k≤m/2, the hash table length m must be a prime number that can be expressed as 4K 3, also known as secondary detection method

Features

Avoids "stack-up" problems, but cannot detect all cells on the hash table (but can detect at least half of the cells)

double hashing

Basic idea

di=Hash2(key), that is, two hash functions are used. When the address obtained by the first hash function H(key) conflicts, the second hash function Hash2(key) is used to calculate the key. address increment

hash function

pseudorandom sequence method

di = pseudo-random number sequence

Zipper method (chaining)

Store all synonyms in a linear linked list that is uniquely identified by its hash address; suitable for situations where insertions and deletions are frequent

Hash lookup and performance analysis

Search process

① Check whether there is a record at the address of Addr in the lookup table. If there is no record, return the search failure; if there is a record, compare it with the value of the key. If they are equal, return the search success flag, otherwise proceed to step ②.

② Use the given conflict handling method to calculate the "next hash address", set Addr to this address, and go to step ①.

Average search length ASL=2.5

Classification

Search successful

Find the length of each known element

The number of successful comparisons = the number of conflicts 1

Search failed

The search length of each search position determined by the hash function

Find factors that affect efficiency

hash function

How to handle conflicts

Filling factor α = number of records in the table n/hash table length m

B-tree and B-tree

B-tree and its basic operations

definition

B-tree, also known as multi-way balanced search tree, the maximum number of children of all its nodes is called the order of the B-tree, denoted as m. An m-order B-tree is either an empty tree or a tree that satisfies the following properties m-ary tree:

1) Each node in the tree has at most m subtrees, that is, it contains at most m- 1 keywords

2) If the root node is not a terminal node, there are at least two subtrees

3) All non-leaf nodes except the root node have at least "m/2] subtrees, that is, they contain at least "m/2]-1 keywords.

4) The structure of all non-leaf nodes is as follows:

Ki is the keyword of the node, sorted in ascending order

Pi is a pointer to the root node of the subtree. The keys of all nodes in the subtree pointed to by Pi-1 are less than Ki.

n is the number of keywords

5) All leaf nodes appear at the same level and carry no information (i.e. external nodes, pointers to these nodes are empty)

Properties (assuming m=5)

1) The number of children of a node is equal to the number of keywords in the node plus 1.

2) If the root node has no keywords, there will be no subtrees, and the B-tree is empty at this time; if the root node has keywords, its subtrees must be greater than or equal to two, because the number of subtrees is equal to the number of keywords plus 1.

3) All non-terminal nodes except the root node have at least "m/2]-1=3 subtrees and at most 5 subtrees (that is, at most 4 keywords).

4) The keywords in the node are in increasing order from left to right. There are pointers to the subtree on both sides of the keyword. All the keywords of the subtree pointed to by the left pointer are smaller than the keyword. The subtree pointed by the right pointer is smaller than the keyword. All keywords are greater than this keyword.

5) All leaf nodes are on the 4th level, representing the location where the search failed.

The height of the B-tree (number of disk accesses)

1) Each node of the B-tree has at most m subtrees and m-1 keywords. Therefore, the number of keywords in an m-order B-tree with a height of h should satisfy n≤(m-1)(1 m m^2 …… m^(h-1))=m^h-1

2) The root node has at least 2 subtrees, the non-root node has at least "m/2]-1 keywords, and the h 1st level has at least 2("m/2])^(h-1) nodes points, and the number of leaf nodes is n 1

B-tree search

①Find nodes in B-tree

Performed on the disk, find the corresponding node based on keyword comparison

If a leaf node is found (the corresponding pointer is a null pointer), it means that there is no corresponding keyword in the tree and the search fails.

②Find keywords in nodes

Performed in memory, using sequential or binary search methods within the node

Insertion into B-tree

1) If the number of node keywords after insertion is less than m, you can insert it directly

2) Otherwise, take the node at the middle position ("m/2]) and insert it into the parent node of the original node, and the original node will be split into two pieces; if this causes the number of keywords of the parent node to overflow, continue Split until it reaches the root node, which in turn causes the height of the B-tree to be increased by 1

Deletion of B-tree

1) If the deleted keyword k is not in the terminal node (the lowest level non-leaf node), replace k with k's predecessor (or successor) k', and then delete k' in the corresponding node (k' must be falls in a terminal node)

2) If the deleted keyword k is in the terminal node

①Delete keywords directly. The condition is that the number of keywords in the node is ≥ "m/2]. After deleting the keyword, it still meets the definition of B-tree.

②Brothers can borrow enough. If the number of keywords of the node = "m/2]-1, and the number of keywords of the sibling node ≥ "m/2], then adjust the node, right (left) sibling node and its parents Node (father-son transposition method) to achieve a new balance

③There are not enough brothers to borrow. If the number of keywords of the node and its left and right sibling nodes are both "m/2]-1, delete the keywords and merge them with the keywords of the left (right) sibling node and parent node.

Basic concepts of B-trees

B-tree is a deformed tree of B-tree that meets the following conditions:

1) Each branch node has at most m subtrees (child nodes).

2) The non-leaf root node has at least two subtrees, and each other branch node has at least "m/2] subtrees.

3) The number of subtrees of a node is equal to the number of keywords.

4) All leaf nodes contain all keywords and pointers to corresponding records. The keywords are arranged in size order in the leaf nodes, and adjacent leaf nodes are linked to each other in size order.

5) All branch nodes (indexes that can be regarded as indexes) only contain the maximum value of the keyword in each of its child nodes (that is, the index block of the next level) and the pointers to its child nodes.

The difference between B-tree and B-tree:

1) In the B-tree, a node with n keywords contains only n subtrees, that is, each keyword corresponds to one subtree; while in the B-tree, a node with n keywords contains n 1 subtrees .

2) In the B tree, the range of the number n of keywords for each node (non-root internal node) is "m/2]<n<m (root node: 2<n<m); in B In the tree, the range of the number n of keywords for each node (non-root internal node) is "m/2]-1<n<m-1 (root node: 1≤n≤m-1).

3) In the B-tree, leaf nodes contain information, and all non-leaf nodes only serve as indexes. Each index item in a non-leaf node only contains the maximum keyword of the corresponding subtree and a pointer to the subtree. The storage address of the record corresponding to this keyword is not included.

4) In the B-tree, leaf nodes contain all keywords, that is, keywords that appear in non-leaf nodes will also appear in leaf nodes; in B-tree, leaf nodes (the outermost inner node) and the keywords contained in other nodes are not repeated.

Find

① There are two head pointers in the B tree: one points to the root node, and the other points to the leaf node with the smallest keyword; therefore, two search operations can be performed on the B tree: one is a sequential search starting from the smallest keyword, and the other is The first is a multi-way search starting from the root node

②When searching in the B-tree, no matter whether it is successful or not, each search is a path from the root node to the leaf node.