Features
Features:

Product Tour >

Edraw AI >

Paid Plans:

Individuals >

Business >

Eduaction >
Resources
Blog

History

How-tos & Tips

Discovery

Biography

Business Analysis

Examples

AI concept Map

Free AI Mind Map Generator

Onenote Mind Map

Bcg Matrix Examples

Nike Marketing Strategy

Unilever SWOT Analysis

Make Mind Maps in Google Docs

Guide

FAQs

What's New

Resource Center
Templates
All Templates

Brain Storming Templates

Strategy and Planning Templates

Project Management Templates

Product Management Templates

Human Resources Templates

Agile Workflow Templates

Marketing Templates

Education Templates

Fun and Games Templates

User Gallery
Download
Pricing
Enterprise

MindMap Gallery Redis data structure knowledge framework learning

Redis data structure knowledge framework learning

Because there are many redis data types, and different data types have the same metadata to record, redis will use a RedisObject structure to uniformly record these metadata. When saving the Long type, the RedisObject pointer is directly assigned to an integer.

Edited at 2022-11-14 13:53:00

PlotWizard

Recent works View more works>>

Redis data structure knowledge framework learning

PlotWizard

Recent works View more works>>

Recommended to you
Outline

Redis
- 14
PlotWizard

Redis data structure knowledge framework learning

Cache consistency issues

Update the database first, then delete the cache

(1) The cache just expired (2) Request A to query the database and get an old value (3) Request B to write the new value to the database (4) Request B to delete cache (5) Request A to write the found old value into the cache

The above will cause dirty data

But only if 3 among 2 and 3 is faster than 2, B will delete the cache first, and A will write the cache with the old value.

In fact, writing to the database is much slower than reading, so avoid slow SQL.

Delete cache first and then write to database

(1) Eliminate cache first (2) Write the database again (these two steps are the same as before) (3) Sleep for 1 second and eliminate cache again

Use asynchronous double delete

Basic operations

Common operations

Check the number and length

Whether it exists, contains, appends, increases or decreases the key or a certain field under the key

Delete, delete specified location, delete and get

free theme

type of data

Key

Delete, exist, view, match by pattern (large collections will have performance issues),

Scan iterates all keys,

Sort List, set

Migrate from one redis instance to another redis instance

Set, cancel the expiration time, and get the remaining expiration time of the key

How Redis eliminates expired keys There are two ways to expire Redis keys: passive and active. When some clients try to access it, the key will be discovered and actively expired. Of course, this is not enough, because some expired keys will never be accessed. These keys should expire anyway, so a timed random test sets the expiration time of the keys. All these expired keys will be deleted from the key space. Specifically, this is what Redis does 10 times per second: Test 20 random keys for related expiration detection. Delete all expired keys. If more than 25% of keys expire, repeat step 1. This is a trivial probabilistic algorithm. The basic assumption is that our sample is the key control, and we keep repeating expiration detection until the percentage of expired keys is less than 25%, which means that at any given time At a certain time, at most 1/4 of the expired keys will be cleared.

Set

Sdiff gets elements that do not exist in the set SDiff a b is equivalent to a-b

Sdiff Store stores the difference set into a key

Find the union of several sets, which can be stored in a key

SInter takes the intersection SInterStore writes the intersection into a key

List

BLOCK operation, blocking a value from the head and tail of the queue

You can specify to get values from several queues

When the queue is empty, the client will be blocked until there is a write. You can specify a timeout and return nil after timeout.

Blocking operations ensure fairness

Redis 2.4 and 2.6 are different. When multiple values are written at the same time, such as a, b, c, the client whose key is blocked will receive a in 2.4 and c in 2.6.

After deleting the value from the server, if the client hangs up before the value is processed, the value will be lost. RabbitMQ's message confirmation mechanism can solve this problem.

You can use List's blocking operation combined with atomic operations of other data types in the same transaction to provide blocking behavior for the client.

Is it possible to achieve fair distributed locks?

Similar to the operations provided by List in Java, LTrim is added to intercept values in a certain range.

Remove the last element from the list and put it into another list

free theme

ordered set

You can perform reduce operations on the union of several ordered sets, supporting SUM, MIN, and MAX.

Returns the ranking of a key in an ordered set

You can obtain a certain range of values in an ordered set; you can also specify the minimum and maximum scores to query the set in this range in pages.

Iterate over a collection

Transaction

watch

The WATCH command can monitor one or more keys. Once one of the keys is modified (or deleted), subsequent transactions will not be executed. Monitoring continues until the EXEC command. As long as the value is modified before exec is executed, the transaction will not be executed. This is equivalent to redis checking during exec.

Since the function of the WATCH command is only to prevent the execution of a subsequent transaction when the monitored key value is modified, it cannot guarantee that other clients will not modify the key value. Therefore, in general, we need to re-execute the EXEC execution after it fails. the entire function. Guaranteed with a loop

After executing the EXEC command, monitoring of all keys will be cancelled. If you do not want to execute the commands in the transaction, you can also use the UNWATCH command to cancel monitoring. In some business scenarios, after watch, we may not necessarily perform transaction operations.

The Watch command is an optimistic locking technology. As long as this key value is modified, subsequent valid modifications will not take effect. This method can usually be used to reduce race conditions. When exec returns an empty collection, the operation is considered fail

Exec actually executes all instructions in the transaction and clears the watch command.

The Exec command reply is an array, consistent with the execution order of the command.

Even if one/some commands in the transaction fail to execute, other commands in the transaction queue will still continue to be executed - Redis will not stop executing the commands in the transaction.

Before the transaction executes EXEC, errors may occur in enqueued commands. For example, the command may produce syntax errors (wrong number of parameters, wrong parameter names, etc. In this case, the client will stop executing the transaction

All commands executed from the beginning of the Mutil command to the exec command will be entered into the transaction queue.

Hash

Whether it exists, add, delete, get all keys, values, iterate hash collection

Increment the hash value of a key. If the key does not exist, set it to 0 incre value.

free theme

Data structure pitfalls

When the string size needs to be expanded, if it is less than 1M, it will be doubled. If it is greater than 1M, it will only be expanded by 1M each time. The maximum length is 512 M

If the self-increment value exceeds the maximum value, an error will be reported.

For container-type data structures, if the container does not exist, create it; if the element no longer exists, delete the container.

Distributed lock

Single node solution (non-red lock)

Get lock

hash to a cluster node

If it does not exist, set the lock path. Set the expiration time.

Locking is value setting client id

Set up a watchdog and update the lock regularly. This can be implemented using a delay queue.

If the lock is not acquired, consider blocking and waiting for acquisition.

Unlock

If the expiration time is not set, it will cause a deadlock problem.

a) Once a master-slave switch occurs in redis, some locks may be lost.

Compare with zk

The distributed lock implemented by Zookeeper actually has a disadvantage, that is, the performance may not be as high as the cache service. Because every time during the process of creating and releasing a lock, transient nodes must be dynamically created and destroyed to implement the lock function. Creating and deleting nodes in ZK can only be performed through the Leader server, and then the data is not shared with all Follower machines. Concurrency issues may include network jitters, and the session connection between the client and the ZK cluster is broken. The ZK cluster thinks the client is down and will delete the temporary node. At this time, other clients can obtain the distributed lock.

Redis is difficult to achieve fairness

redlock

In the distributed version of the algorithm, we assume that we have N Redis hosts. The nodes are completely independent, so we don't use replication or any other implicit coordination system. We've described how to safely acquire and release locks within a single instance. We take it for granted that the algorithm will use this method to acquire and release locks in a single instance. In our example, we set N = 5, which is a reasonable value, so we need to run 5 Redis master servers on different computers or virtual machines to ensure that they fail in a mostly independent manner. To acquire the lock, the client performs the following operations: 1. It gets the current time in milliseconds. 2. It attempts to acquire the lock in all N instances in order, using the same key name and random value in all instances. During step 2, when the lock is set in each instance, the client uses a small timeout compared to the total lock autorelease time to acquire it. For example, if the autorelease time is 10 seconds, the timeout can be in the ~5-50ms range. This prevents clients from remaining blocked for long periods of time, trying and failing to talk to a Redis node: if an instance is unavailable, we should try to talk to the next instance as quickly as possible. 3. The client calculates the time required to acquire the lock by subtracting the timestamp obtained in step 1 from the current time. A lock is considered acquired if and only if the client is able to acquire the lock in a majority of instances (at least 3) and the total time elapsed to acquire the lock is less than the lock validity time. 4. If a lock is acquired, its validity time is considered to be the initial validity time minus the elapsed time, as calculated in step 3. 5. If the client fails to acquire the lock for some reason (unable to lock N/2 1 instances or negative validity time), it will try to unlock all instances (even if it thinks it is not able to lock).

When a node is hung up, there is no risk of the lock being preempted.

cache

Invalid

Set a random value for the failure time to avoid collective failure

Data structure implementation

Application scenarios

zset

Record user post id list. Quick display

Record hot list post ID list. Overall hot list, category hot list

Used to record one-to-many, and the party with many cannot be repeated strictly.

Record a certain collection, which may be related to the user or the overall system.

hash

Record the number of likes, comments, and clicks on a post.

Record the post title, abstract, author, cover information, etc.

Cache recent hot post content. (The post content is very large and not suitable for fetching from db)

list

Record the list of related articles of the post. Recommend related posts based on the content.

bitmap

Used as a bool array or a custom bit array, mainly to save memory

HyperLogLog

Used to roughly calculate statistical values after deduplication.

For example, website UV needs to filter repeated visits

You can only add and get the total number, but you cannot get whether a certain value exists or all elements.

bloom filter

Used to remove duplicates. Get whether an element exists in the list

If it exists, it doesn’t necessarily really exist If it doesn't exist, it definitely doesn't exist

Consider recommending unvisited resources to users

Consider not reusing resources multiple times.

A certain failure rate must be tolerated, that is, the resource may not be accessed, but it is regarded as having been accessed.

Use multiple Hash algorithms and set the bits of the key mapping to 1.

If it doesn’t exist, it definitely doesn’t exist!!!!

There are only add and exist (multiple can be checked) commands

You can use initial parameters to set the filter. error_rate is the error rate. The lower the value, the greater the capacity. initial_size is the estimated total amount. The total amount should be set according to the actual scenario.

Redis-cell

If the project is not large and the maintenance cost is not high, you can use redsi-cell directly. Otherwise, you can consider fine-grained control to each service node to limit the flow, and implement it with the corresponding load balancing strategy.

Use zset and score to circle the time window and count the number of times the same behavior occurs for the same user within the time window. If the number of current limits is too large, it may not be applicable, such as 1,000 times per second.

CL.THROTTLE test 100 400 60 3

Test key capacity 100 (maximum concurrency) up to 400 times within 60 seconds. This time, 3 capacities are applied for

1: Whether it is successful, 0: Success, 1: Reject 2: The capacity of the token bucket, the size is the initial value 1 3: Tokens available in the current token bucket 4: If the request is rejected, this value indicates how long it will take before the token is re-added to the funnel bucket. Unit: seconds, which can be used as the retry time. 5: Indicates how long it will take for the tokens in the token bucket to be full.

funnel algorithm

special data structure

Compressed linked list

It is composed of compressed linked lists or arrays, in order to avoid the additional memory overhead of ordinary linked lists.

A doubly linked list designed to save memory as much as possible

Store string or integer

Save memory in details, using variable length encoding for value storage

Each item will have a separate number of digits to mark the data length and type of the item.

Implementation of underlying data structures of hash, list, and zset

Features

1) The internal representation is a continuous memory array in which data is compactly arranged.

2) You can simulate a doubly linked list structure and enter and dequeue with O(1) time complexity.

3) The new deletion operation involves memory reallocation or release, which increases the complexity of the operation.

4) Read and write operations involve complex pointer movements, and the worst-case time complexity is O(n2).

5) Suitable for storing small objects and data of limited length.

1) When using ziplist for scenarios with high performance requirements, it is recommended that the length should not exceed 1000 and the size of each element should be controlled within 512 bytes.

jump table

The skip list supports O(logN) average search and worst-case O(N) complexity search.

Why not use trees or balanced trees instead of jump tables?

The implementation of jump table is very simple and can reach O(logN) level.

fast linked list

The compressed linked list is linked using pointers to become a linkedlist

The default value of each ziplist is 8k. (Configurable)

Operation and maintenance

monitor

2) Use the info Commandstats command to obtain the average command time, including the number of calls for each command. , the total time taken, the average time taken, in microseconds.

Master-slave

The replication function of Redis is divided into two operations: synchronization (sync) and command propagate (command propagate):

Synchronization operations are used to retrieve the database state from the server Update to the current database state of the main server;

The command propagation operation is used to modify the database state on the main server. When the database status of the master-slave server is inconsistent, let the master-slave server The server's database is returned to a consistent state.

Read and write separation

The separation of read and write is suitable for large access (so large that a single redis machine feels very slow), and the write operation is much smaller than the read operation.

If the number of read requests far exceeds the write requests, the cluster data copy cost will be much less than the read request cost. At the same time, if we can accept data inconsistency to a certain extent, we can separate reading and writing.

redis cluster

characteristic

1. All redis nodes are interconnected with each other (PING-PONG mechanism), and a binary protocol is used internally to optimize transmission speed and bandwidth.

2. The failure of a node takes effect only when more than half of the nodes in the cluster detect failures.

3. The client is directly connected to the redis node, without the need for an intermediate proxy layer. The client does not need to connect to all nodes in the cluster, just connect to any available node in the cluster.

4. redis-cluster maps all physical nodes to [0-16383] slot (not necessarily evenly distributed), and cluster is responsible for maintaining node<->slot<->value.

Each Redis node needs to execute a command and declare the slot it is responsible for.

cluster addslots {slot_index1} {slot_index 2} {slot_index 3}

5. The Redis cluster is pre-divided into 16384 buckets. When a key-value needs to be placed in the Redis cluster, it is decided according to the value of CRC16(key) mod 16384 which bucket a key should be placed in.

Each Redis instance is aware of the existence of other nodes

Strong consistency cannot be guaranteed

1. Your client writes to the main server node B 2. The main server node B replies to your client to confirm. 3. Master server node B propagates the write to its slave servers B1, B2 and B3.

If after step 2, no data is sent from the slave server and B hangs up, the key will be lost (the key will definitely be lost during the failure)

fault tolerance

The election process involves the participation of all masters in the cluster. If more than half of the master nodes communicate with the failed node for more than (cluster-node-timeout), the node is considered to be faulty and the failover operation is automatically triggered.

(2): When does the entire cluster become unavailable (cluster_state:fail)? a: If any master in the cluster dies, and the current master has no slave. The cluster enters the fail state, which can also be understood as entering the fail state when the cluster's slot mapping [0-16383] is not completed. b: If more than half of the masters in the cluster die, no matter whether there is a slave cluster entering the fail state.

When the cluster is unavailable, all operations on the cluster are unavailable and an ((error) CLUSTERDOWN The cluster is down) error is received.

failover

1. All slave nodes of the offline master node will be elected to elect a new master node. 2. The selected slave node will execute the slave no one command and become the new master node. 3. The new master node will revoke all slot assignments to the offline master node and assign these slots to itself. 4. The new master node broadcasts a pong message to the cluster. This pong message allows other nodes in the cluster to immediately know that the node has changed from a slave node to a master node, and that the master node has taken over the server that was originally offline. The slot that the node handles. 5. The new master node begins to accept command requests related to the slot it is responsible for processing, and the failover operation is completed.

master-slave election

1. When the slave node finds that the master node it replicates has gone offline, the slave node (there may be multiple slave nodes making requests here) will broadcast a cluster_type_failover_auth_request message to the cluster, requiring voting rights (responsible for processing slots) The master node votes to this node. 2. The master node that receives the cluster_type_failover_auth_request message will judge whether it agrees with the slave node to become the new master node based on its own conditions (the current epoch of the initiating voting node is not lower than the current epoch of the voting node). If it agrees, it will return a cluster_type_failover_auth_ack message. 3. When the cluster_type_failover_auth_ack message is received from the node, the number of votes will be increased by 1. 4. If the votes of a slave node are greater than or equal to half of the master nodes in the cluster (greater than or equal to N/2 1), this node will become the new master node. If no slave node receives enough votes during a configuration cycle, the cluster will enter a new configuration cycle, and elections will be held here until a new master node is elected.

All slave nodes may seek opinions on whether they can become the master (those who vote will only vote for their own larger nodes), and more than half of them can be (n 1)/2

Maybe I can't choose

limitations

1. Currently only batch operations of keys on the same slot are supported; 2. Currently only key transactions on the same slot are supported; 3. Only database 0 can be used (each redis instance has 16 databases, which can be switched through the select {index} command); 4. A large key (such as hash, list) cannot be mapped to different nodes; 5. Currently, cluster master-slave replication only supports one level and does not support nested tree structure;

When expanding

step

1. Send to the target node cluster setslot {slot_index} importing {source_node_id} 2. Send to the source node cluster setslot {slot_index} migrating {target_node_id} 3. Source node loop execution cluster getkeysinslot {slot_index} {count(number of keys)} 4. The source node executes and migrates the key to the target node through the pipeline. migrate {target_ip} {target_port} "" 0 {timeout} keys {key1} {key2} {key3} 5. Repeat steps 3 and 4 6. Send notifications to all master nodes in the cluster cluster setslot {slot_index} node {target_nodeid}

Each node knows the cluster node corresponding to each slot.

When the node receives the command request, it queries whether it can handle it by itself. If so, it handles it. If not, it returns a move error. The move error carries the correct node IP and port number and returns it to the client to guide it to execute. And every subsequent operation of the client Once the key is executed, it will go to the node provided by the moved error.

Codis

Access layer: The access method can be VIP or call jodis through java code, and then connect and call different codis-proxy addresses to achieve high-availability LVS and HA functions.

Proxy layer: Then the middle layer uses codis-proxy and zookeeper to process the data direction and distribution. Through the crc32 algorithm, the keys are evenly distributed in a certain slot of different redis. To achieve striping similar to raid0, in the old version of codis, Slots need to be allocated manually. After codis3.2, slots will be allocated automatically, which is quite convenient.

Data layer: Finally, codis-proxy stores the data on the real redis-server main server. Since the author of codis, Huang Dongxu, attaches great importance to data consistency and does not allow data inconsistencies caused by data delays, the architecture was not considered from the beginning. Master-slave reading and writing are separated. The slave server is only used as a redundant architecture for failover, and zookeeper calls redis-sentinel to implement the failover function.

In Codis, Codis will divide all keys into 1024 slots. These 1024 slots correspond to the Redis cluster. In Codis, the mapping relationship between these 1024 slots and the Redis instance will be maintained in memory. This slot is configurable and can be set to 2048 or 4096. It depends on how many nodes your Redis has. If it is too many, you can set more slots.

When the Codis Dashbord of Codis changes the slot information, other Codis nodes will monitor the ZooKeeper slot changes and synchronize them in time. As shown in the picture:

zk is responsible for synchronizing slot information.

priority queue

sorted set

list

Use multiple queues to implement priority queues. Different priority tasks enter different queues

At the same time, when consumers fetch data from the queue, they support fetching data from multiple queues, with priority order.

blocked

Starting multiple consumers means starting multiple clients to retrieve data.

message queue

pubsub

use

Subscribers can subscribe to one, multiple, matching subscription topics

The publisher publishes a certain topic, and value

Published topics will be immediately forwarded to consumers who subscribe to the topic. If there are no consumers, the message will be discarded.

risk

There will be a risk of message loss (when the machine is down, the network is disconnected, or the network is interrupted and messages are lost)

Due to this feature, simple pubsub runs the risk of losing responses.

Data reliability cannot be guaranteed

There is no guarantee that at least once

The expansion is not flexible and there is no way to speed up consumption by adding more consumers.

You can use multiple channels and listen multiple times.

list

1. When there is no element in the given list to be popped, the connection will be blocked by the BRPOP command until the wait times out or a popable element is found. 2. When multiple key parameters are given, each list is checked in sequence according to the parameter key, and the tail element of the first non-empty list pops up. In addition, BRPOP behaves the same as BLPOP except for the position of the pop-up element.

If there are no tasks in the list, the connection will be blocked There is a timeout for connection blocking. When the timeout is set to 0, you can wait wirelessly until a message pops up.

Using pubsub, you can also notify consumers that they can consume from the list.

It is suitable for subscription and publishing between two businesses between A and B. It is more difficult when multiple business lines mean different consumers.

ack implementation

Maintain two queues: pending queue and doing table (hash table).

workers are defined as ThreadPool. After being queued by the pending queue, workers allocate a thread (single worker) to process the message - append a current timestamp and current thread name to the target message, write it to the doing table, and then the worker consumes the message. After completion Erase information in the doing table by yourself.

Enable a scheduled task, scan the doing queue at regular intervals, and check the timestamp of every element. If it times out, the worker's ThreadPoolExecutor will check whether the thread exists. If it exists, the current task execution will be canceled and the transaction will be rolled back. Finally, pop the task out of the doing queue and push it back into the pending queue.

You can use zset for sorting.

Avoid overuse of Redis. Only use Redis to do the things you are best at, and do the things you are not good at. The more you do, the more you will discover. The more pitfalls there are, the harder it is to give up in the end. Wrong design in the early stage leads to high failure rate in the later stage. Poor stability and high transformation costs. etc. rabbitmq is not very complicated. The operation and maintenance is also very simple and can be mixed with the business system.