Alibaba Cloud Tablestore
Edited at 2024-01-12 17:44:04
Product introduction
Tablestore provides serverless table storage for massive structured data, along with a one-stop IoTstore solution that is deeply optimized for IoT scenarios. It is suitable for structured data storage in scenarios such as massive bills, IM messages, Internet of Things, Internet of Vehicles, risk control, and recommendations. It provides low-cost storage of massive data, millisecond-level online data query and retrieval, and flexible data analysis capabilities.
Basic concepts
Region: A physical data center. Tablestore is deployed in multiple Alibaba Cloud regions, and you can choose the region that suits your business needs. For more information, see Regions in which Tablestore has been activated.
Read and write throughput: Read throughput and write throughput are measured in read capacity units and write capacity units. A capacity unit (CU) is the minimum billing unit for data read and write operations. For more information, see Read and Write Throughput.
Instance: The entity through which you use and manage Tablestore. Each instance is equivalent to a database. Access control and resource metering are performed at the instance level. For more information, see Instance.
Endpoint: Each instance has a service address (endpoint), which the application must specify when it operates on tables and data. For more information, see Endpoint.
Time to live (TTL): An attribute of a data table that specifies how long data is retained, in seconds. Tablestore cleans up data that has exceeded its TTL in the background to reduce storage space and storage costs.
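The TTL rule above can be sketched in a few lines. This is a minimal in-memory illustration, not Tablestore's API: the function names, the millisecond timestamps, and the use of -1 to mean "never expires" are assumptions made for the example.

```python
import time

NEVER_EXPIRE = -1  # assumption: sentinel meaning "keep data forever"


def is_expired(written_at_ms, ttl_seconds, now_ms=None):
    """Return True if data written at written_at_ms has outlived its TTL."""
    if ttl_seconds == NEVER_EXPIRE:
        return False
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # TTL is given in seconds; timestamps here are in milliseconds.
    return now_ms - written_at_ms > ttl_seconds * 1000


def clean_up(rows, ttl_seconds, now_ms):
    """Drop expired rows, mimicking a background cleanup pass.

    rows maps a primary key to a (written_at_ms, value) pair.
    """
    return {pk: (ts, value) for pk, (ts, value) in rows.items()
            if not is_expired(ts, ttl_seconds, now_ms)}
```

With a TTL of 86400 seconds (one day), a value written at time 0 is kept up to and including the one-day mark and removed by the first cleanup pass after it.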
Data storage model
Wide table model: A model similar to Bigtable and HBase that applies to scenarios such as metadata and big data. It supports data versions, TTL, auto-increment primary key columns, conditional updates, local transactions, atomic counters, filters, and other features. For more information, see Wide Table Model.
Time series model: A model designed around the characteristics of time series data. It applies to scenarios such as IoT device monitoring, device data collection, and machine monitoring, and supports automatic construction of time series metadata indexes and rich time series query capabilities. For more information, see Time Series Model.
Message model: A model designed for message data. It applies to message scenarios such as IM and feed streams, and meets their requirements for message order preservation, massive message storage, and real-time synchronization. It also supports full-text search and multi-dimensional combined queries. For more information, see Message Model.
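To make the wide table model more concrete, here is a minimal in-memory sketch of a row that keeps several versions per attribute column. The class name `WideRow`, its methods, and the `max_versions` parameter are hypothetical names for illustration and do not come from the Tablestore SDK.

```python
from collections import defaultdict


class WideRow:
    """One row: a primary key plus attribute columns that keep several versions."""

    def __init__(self, primary_key, max_versions=1):
        self.primary_key = primary_key
        self.max_versions = max_versions
        # column name -> list of (timestamp_ms, value), newest first
        self.columns = defaultdict(list)

    def put(self, column, value, timestamp_ms):
        """Write a new version of a column and trim old versions."""
        versions = self.columns[column]
        versions.append((timestamp_ms, value))
        versions.sort(key=lambda v: v[0], reverse=True)
        del versions[self.max_versions:]  # discard versions beyond the limit

    def get(self, column):
        """Return the newest value of a column, or None if it is absent."""
        versions = self.columns.get(column)
        return versions[0][1] if versions else None
```

Writing three versions of the same column with `max_versions=2` leaves only the two newest versions, which is the kind of space saving that the data version feature is meant to provide.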
Calculation and analysis
Tablestore supports computing and analysis through MaxCompute, Spark, Hive or HadoopMR, Function Compute, Flink, and Tablestore SQL query. Select the analysis tool that fits your scenario.
Each entry below lists the analysis tool, the applicable models in parentheses, a description, and the related operation document.
MaxCompute (wide table model): Use the MaxCompute client to create an external table for a Tablestore data table, and access the data in Tablestore through it. See Using MaxCompute.
Spark (wide table model): With the Spark computing engine, you can access Tablestore through E-MapReduce SQL or through the DataFrame programming interface. See Using the Spark computing engine.
Hive or HadoopMR (wide table model): Use Hive or HadoopMR to access data in Tablestore. See Use Hive or HadoopMR.
Function Compute (wide table model): Access Tablestore through Function Compute and perform real-time computing on incremental Tablestore data. See Using Function Compute.
Flink (wide table model, time series model): Through Realtime Compute, Flink accesses source tables, dimension tables, or result tables in Tablestore for real-time computing and analysis of big data. Currently, data tables can serve as source tables, dimension tables, or result tables, while time series tables can serve only as result tables. See Using Flink.
Presto (wide table model): After connecting Presto to Tablestore, you can use SQL to query and analyze data in Tablestore, write data to Tablestore, and import data into Tablestore. See Using Tablestore with PrestoDB.
Search index (wide table model): Search index is built on inverted indexes and column storage and addresses multi-dimensional queries and statistical analysis over big data. When your business has multi-dimensional query needs such as queries on non-primary key columns, multi-column combined queries, and fuzzy queries, or data analysis needs such as finding the maximum value, counting rows, and grouping data, you can add these attributes as fields in a search index and query and analyze data through it. See Search index.
SQL query (wide table model, time series model): SQL query provides a unified access interface for multiple data engines. Through SQL query, you can run complex queries and efficient analysis on data in Tablestore. See Use SQL query.
Features
Wide table model
The wide table model supports the following features. Each entry lists the feature, a description, and the related documents.
Table operations: List all data tables in an instance, create a data table, query or update the configuration of a data table, and delete a data table. Related documents: Table operations.
Basic data operations: Tablestore provides the single-row operations PutRow, GetRow, UpdateRow, and DeleteRow, and the multi-row operations BatchWriteRow, BatchGetRow, and GetRange. You can read and write table data through either kind of operation. Related documents: Write data, Read data, Delete data.
Data versions and TTL: Use data versions and TTL to manage data effectively, reduce storage space, and lower storage costs. Related documents: Data versions and TTL.
Auto-increment primary key column: After you set a primary key column that is not the partition key as an auto-increment column, you do not need to specify its value when writing data. Tablestore generates the value automatically. The value is unique and strictly increasing within a partition key. Related documents: Auto-increment primary key column.
Conditional update: Data in the table is updated only when the specified condition is met. Otherwise, the update fails. Related documents: Conditional update.
Local transaction: Create a local transaction whose data range lies within a single partition key value. After reading and writing data in the transaction, you can commit or abort it as needed. Related documents: Local transaction.
Atomic counter: Use a column as an atomic counter and perform atomic counting operations on it. This provides real-time statistics for online applications, such as counting the page views (PV) of a post. Related documents: Atomic counter.
Filter: Read results are filtered again on the server side, and only rows that match the filter conditions are returned. In most scenarios this reduces the amount of data transferred over the network and shortens the response time. Related documents: Filter.
Secondary index: By creating one or more index tables and querying on their primary key columns, you extend the primary key query capability of the data table to other columns. Secondary indexes include global secondary indexes and local secondary indexes. A global secondary index synchronizes the indexed columns and primary key columns of the data table to the index table asynchronously, and under normal conditions the synchronization delay is at the millisecond level. A local secondary index synchronizes them synchronously, so data can be queried from the index table as soon as it is written to the data table. Related documents: Secondary index, Global secondary index, Local secondary index.
Search index: Search index is built on inverted indexes and column storage and addresses complex queries over big data, including queries on non-primary key columns, full-text search, prefix queries, fuzzy queries, multi-condition combined queries, nested queries, geographical location queries, statistical aggregation (max, min, count, sum, avg, distinct_count, group_by), and concurrent data export. Related documents: Search index, Use the console, Use command line tools, Use SDK.
SQL query: SQL query provides a unified access interface for multiple data engines. Through SQL query, you can run complex queries and efficient analysis on data in Tablestore. When querying with SQL, you can also use indexes to optimize the query. Related documents: SQL query, Use the console, Use SDK, Using JDBC (JDBC connection to Tablestore, Use through Hibernate, Use through MyBatis), Using the Go language driver.
Tunnel Service: Tablestore provides three types of distributed real-time data consumption tunnels: incremental, full, and full plus incremental. They let you consume and process both the historical data and the newly added data in a table. Related documents: Tunnel Service, Quick start, Use SDK.
Data security: Tablestore allows access from any network by default. You can bind a VPC to an instance and change the instance network type so that Tablestore resources are used within a private network, which secures network access. To protect table data, Tablestore also provides data-at-rest encryption, which you can configure when you create a data table. Related documents: Network security management, Data encryption.
Data Lake Delivery: Tablestore Data Lake Delivery can back up data in full or deliver data in real time to the OSS data lake, which meets the needs of lower-cost historical data storage and larger-scale offline and near-real-time data analysis. Related documents: Data Lake Delivery, Quick start, Use SDK.
Data visualization: Tablestore integrates with the data visualization tools DataV and Grafana, through which data in Tablestore can be displayed visually. Related documents: Connect with Grafana, Connect to DataV.
Monitoring and alerting: By viewing the monitoring information of Tablestore resources, you can understand resource usage. By setting alert rules on important monitoring metrics, you can learn about metric anomalies promptly and handle them quickly. Related documents: View monitoring data through the Tablestore console, Configure alerts for monitoring metrics.
Backup and recovery: Use Hybrid Backup Recovery (HBR) to back up the data in a Tablestore instance regularly and restore it promptly when data is lost or damaged. HBR supports full and incremental backup as well as a data redundancy mechanism, which improves the data reliability of the backup repository. Related documents: Back up Tablestore data, Restore Tablestore data.
HBase support: Java applications written against the open source HBase API can access Tablestore directly through Tablestore HBase Client. Related documents: HBase support, Quick start.
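The conditional update and atomic counter features described above can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions, not the Tablestore SDK: the `Table` class and the method names `update_row_if` and `increment` are hypothetical, and a lock is used here only to imitate the atomicity that the service provides on the server side.

```python
import threading


class Table:
    """In-memory stand-in for a data table keyed by primary key."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def put_row(self, pk, columns):
        with self._lock:
            self._rows[pk] = dict(columns)

    def get_row(self, pk):
        with self._lock:
            row = self._rows.get(pk)
            return dict(row) if row is not None else None

    def update_row_if(self, pk, updates, condition):
        """Conditional update: apply updates only if condition(row) holds."""
        with self._lock:
            row = self._rows.get(pk)
            if row is None or not condition(row):
                return False  # condition not met, so the update fails
            row.update(updates)
            return True

    def increment(self, pk, column, delta=1):
        """Atomic counter: add delta to a column and return the new value."""
        with self._lock:
            row = self._rows.setdefault(pk, {})
            row[column] = row.get(column, 0) + delta
            return row[column]
```

For example, publishing a post only while its status is still "draft" succeeds once and fails on a second attempt, and concurrent page-view increments from several threads do not lose updates.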
Time series model
The time series model supports the following features. Each entry lists the feature, a description, and the related documents.
Table operations: List all time series tables in an instance, create a time series table, query or update the configuration of a time series table, and delete a time series table. Related documents: Use the console, Use command line tools, Use SDK.
Read and write time series data: Write time series data into a time series table in batches. After the data is written, you can query the data of a timeline within a given time range by specifying the timeline ID.
Timeline search: Search for timelines in a time series table. Search conditions can be combined in multiple ways. After retrieving a timeline, you can query the data in it by calling the query interface.
SQL query and analysis: Time series tables can be queried through SQL. SQL can filter timelines by their metadata conditions and aggregate data along different dimensions through statistical aggregation. SQL can also query only the metadata of timelines, which makes it convenient to manage timeline metadata. Related documents: Query time series data using SQL.
Connect with Grafana: After Tablestore data is connected to Grafana, Grafana can generate dashboards from the table data and display the data in real time to the users who need it. Related documents: Connect with Grafana.
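The read and write pattern for time series data (batch write, then query one timeline within a time range) can be sketched as follows. This is an in-memory illustration only: `TimeSeriesTable`, its methods, and the plain-string timeline IDs are hypothetical and do not reflect the actual Tablestore interfaces.

```python
import bisect
from collections import defaultdict


class TimeSeriesTable:
    """Keeps the points of each timeline sorted by timestamp."""

    def __init__(self):
        # timeline id -> sorted list of (timestamp, value)
        self._points = defaultdict(list)

    def write_batch(self, points):
        """points: iterable of (timeline_id, timestamp, value) tuples."""
        for timeline_id, ts, value in points:
            bisect.insort(self._points[timeline_id], (ts, value))

    def query(self, timeline_id, start_ts, end_ts):
        """Return points of one timeline with start_ts <= timestamp < end_ts."""
        series = self._points.get(timeline_id, [])
        # A one-element tuple sorts before any (timestamp, value) pair that
        # shares the same timestamp, so the start is inclusive and the end
        # is exclusive.
        lo = bisect.bisect_left(series, (start_ts,))
        hi = bisect.bisect_left(series, (end_ts,))
        return series[lo:hi]
```

Keeping each timeline sorted means a range query only needs two binary searches and a slice, regardless of how many other timelines the table holds.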
Message model
The message model supports the following features. Each entry lists the feature, a description, and the related documents.
Table operations: Create or delete Meta tables and their indexes. Create or delete Timeline tables and their indexes. Related documents: Table operations.
Meta management: Provides interfaces for adding, deleting, and modifying Meta, reading a single row, and running multi-condition combined queries. Related documents: Meta management.
Timeline management: Provides interfaces for fuzzy message queries and multi-condition combined queries. Related documents: Timeline management.
Queue management: A Queue is the management instance of the message queue that corresponds to a single Identifier within a single store. Its main interfaces are synchronous write, asynchronous write, batch write, delete, synchronous update, asynchronous update, single-row read, and range read. Related documents: Queue management.
Product architecture
System architecture
The architecture of Tablestore is shown in the figure below.
Business scenarios
Tablestore is suitable for building systems in scenarios such as metadata, message data, spatiotemporal data, and big data.
Data access
Provides multiple data access methods such as SDK, DataWorks, and IoT rule engines to support the storage of structured data of different business types such as application data, message data, and IoT data.
Tablestore
Multi-model data storage
Three data storage models are provided for structured data of different business types: wide table (WideColumn) model, time series (TimeSeries) model, and message (Timeline) model.
Wide table model: A model similar to Bigtable and HBase that applies to scenarios such as metadata and big data. It supports data versions, TTL, auto-increment primary key columns, conditional updates, local transactions, atomic counters, filters, and other features. For more information, see Wide Table Model.
Time series model: A model designed around the characteristics of time series data. It applies to scenarios such as IoT device monitoring, device data collection, and machine monitoring, and supports automatic construction of time series metadata indexes and rich time series query capabilities. For more information, see Time Series Model.
Message model: A model designed for message data. It applies to message scenarios such as IM and feed streams, and meets their requirements for message order preservation, massive message storage, and real-time synchronization. It also supports full-text search and multi-dimensional combined queries. For more information, see Message Model.
Diverse data indexes
In addition to the primary key of the data table, Tablestore supports secondary indexes and search indexes, providing powerful data query capabilities.
Data table primary key: A data table is similar to a huge Map, and its query capability is similar to that of a Map: data can be queried only through the primary key.
Secondary index: By creating one or more index tables and querying on their primary key columns, you extend the primary key query capability of the data table to other columns.
Search index: Uses structures such as inverted indexes, BKD trees, and column storage, and offers rich query capabilities such as conditional queries on non-primary key columns, multi-condition combined queries, geographical location queries, full-text search, fuzzy queries, nested structure queries, and statistical aggregation.
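The difference between primary key lookup and a secondary index can be shown with plain Python data structures. In this sketch the data table is a dictionary keyed by primary key, and the index table is a sorted list keyed by the indexed column followed by the primary key. The table contents and the function names `build_index` and `query_by_index` are made up for illustration.

```python
import bisect

# Data table: primary key (order_id) -> attribute columns.
data_table = {
    "o1": {"user": "alice", "amount": 30},
    "o2": {"user": "bob", "amount": 15},
    "o3": {"user": "alice", "amount": 5},
}


def build_index(table, indexed_column):
    """Build an index table whose key is (indexed column value, primary key)."""
    return sorted((row[indexed_column], pk) for pk, row in table.items())


def query_by_index(index, value):
    """Return the primary keys of rows whose indexed column equals value."""
    lo = bisect.bisect_left(index, (value,))  # first entry with this value
    pks = []
    for indexed_value, pk in index[lo:]:
        if indexed_value != value:
            break
        pks.append(pk)
    return pks
```

Looking up `data_table["o2"]` is the Map-like primary key query, while `query_by_index(build_index(data_table, "user"), "alice")` answers a query on a non-primary key column without scanning the whole data table.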
Hot and cold tiered storage
Data storage supports automatic hot and cold tiering. Tablestore offers two instance types, high-performance instances and capacity instances, to meet the data storage needs of different businesses.
High-performance instance: Suitable for scenarios that demand very high read and write performance and concurrency, such as games, financial risk control, social applications, and recommendation systems.
Capacity instance: Suitable for businesses that are less sensitive to read performance but more sensitive to cost, such as log monitoring data, Internet of Vehicles data, device data, time series data, logistics data, and public opinion monitoring.
Data Lake Delivery
Back up table data in full or deliver it in real time to the OSS data lake for storage. The delivered data follows open source ecosystem standards: it is stored in the Parquet columnar format and is compatible with the Hive naming convention. You can use E-MapReduce to analyze the data delivered to OSS directly through external tables.
Computing ecosystem integration
Integrates with mainstream open source stream and batch computing engines, including Flink, Spark, and Presto.
Integrates well with the components of Alibaba's big data platform, including DataWorks, DataHub, and MaxCompute.
Typical application architecture
Internet application architecture
Internet application architecture includes the database layered architecture and the distributed structured data storage architecture. It is mainly used in scenarios such as e-commerce orders, live-stream bullet comments, file metadata in cloud drives, and instant messaging in social networks.
Database layered architecture
In the database layered architecture, Tablestore works together with MySQL to meet the business requirements of the application system. MySQL's transaction capabilities handle write operations and the read operations that have strong transactional requirements, while Tablestore's data retrieval capabilities and big data storage handle data storage, query, and analysis.
Distributed structured data storage architecture
In the distributed structured data storage architecture, Tablestore is directly connected to the application system to implement simple transaction processing and highly concurrent data reading and writing.
Data Lake Architecture
The data lake architecture is mainly used in data middle platforms, recommendation systems, risk control systems and other scenarios.
Tablestore can be used as a source table, result table, or dimension table connected to stream and batch computing engines to implement big data computing and analysis.
IoT architecture
The IoT architecture is mainly used in scenarios such as Internet of Vehicles, smart home appliances, industrial Internet of Things, and logistics.
Tablestore serves as the unified data storage platform in the IoT infrastructure. It stores the time series data, metadata, message data, and other data related to the IoT platform, and provides rich data analysis and processing capabilities.
Product advantages
Multi-model data storage
Tablestore supports multiple data storage models, including the wide table (WideColumn) model, the time series (TimeSeries) model, and the message (Timeline) model, and can store multiple types of data in one place.
Wide table model: The classic model. Most semi-structured and structured data is currently stored using the wide table model.
Time series model: Suitable for core data scenarios such as time series data and spatiotemporal data.
Diverse data indexes
In addition to the primary key of the data table, Tablestore supports secondary indexes and search indexes, providing powerful data query capabilities.
Secondary index: Equivalent to providing another sort order for the data table. The data distribution is designed in advance for the query conditions, which speeds up data queries.
Search index: Built on inverted indexes and columnar storage. It supports free combination queries across multiple fields, fuzzy queries, geographical location queries, full-text search, and more, and addresses complex queries over big data.
Access to multiple computing ecosystems
Supports access to the open source ecosystem and Alibaba's self-developed ecosystem.
Integrates with batch computing engines such as MaxCompute and Spark, and with Flink stream computing through real-time data channels.
Access security
Provides a variety of permission management mechanisms and performs identity authentication and authorization on every request to prevent unauthorized data access and keep data access secure.
Supports data access permission management, including login permissions, table creation permissions, read and write permissions, and whitelist control permissions.
Seamless scaling
Seamless storage scaling is achieved through data sharding and load balancing. As the amount of table data keeps growing, Tablestore adjusts the data partitions to allocate more storage to the table. Tablestore supports no less than 10 PB of data storage, and a single table supports no less than 1 PB of data or 1 trillion records.
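The source does not describe how partitions are adjusted, so the following is only a generic sketch of range partitioning with automatic splits, offered to illustrate the idea of sharding data as a table grows. The class `RangePartitionedTable`, the row-count threshold, and the split-at-the-median rule are all assumptions for the example.

```python
import bisect


class RangePartitionedTable:
    """Routes keys to range partitions and splits a partition that grows too large."""

    def __init__(self, max_rows_per_partition):
        self.max_rows = max_rows_per_partition
        self.split_points = []  # sorted lower bounds of partitions 1..n
        self.partitions = [[]]  # each partition holds its keys in sorted order

    def _locate(self, key):
        """Return the index of the partition whose key range contains key."""
        return bisect.bisect_right(self.split_points, key)

    def put(self, key):
        idx = self._locate(key)
        part = self.partitions[idx]
        pos = bisect.bisect_left(part, key)
        if pos < len(part) and part[pos] == key:
            return  # key already stored; this sketch keeps keys unique
        part.insert(pos, key)
        if len(part) > self.max_rows:
            self._split(idx)

    def _split(self, idx):
        """Split one partition at its median key into two adjacent partitions."""
        part = self.partitions[idx]
        mid = len(part) // 2
        self.split_points.insert(idx, part[mid])
        self.partitions[idx:idx + 1] = [part[:mid], part[mid:]]
```

Each split only touches one partition, so the rest of the table keeps serving requests, and the new partitions could then be placed on different machines by a load balancer.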
High reliability
Stores multiple replicas of data on different machines in different racks and recovers quickly when a replica fails, providing 99.99999999% (10 nines) reliability.
Strong data consistency
Ensures strong consistency for data writes: all three replicas of the data are written to disk and kept consistent. Once a write operation returns successfully, the application can immediately read the latest data.
High-concurrency reads and writes
Supports tens of millions of concurrent read and write operations.
Convenient operation and maintenance
With Tablestore, you only need to focus on business development, without worrying about software and hardware provisioning, configuration, failures, cluster scaling, security, and similar issues. While ensuring high service availability, it greatly reduces management and operation and maintenance costs.
Application scenarios
Internet application
Historical order data scenario
Order systems are very common and exist in all walks of life, such as e-commerce orders, bank statements, and carrier phone bills. With the growth of the Internet and the importance that enterprises place on data, the number of orders that must be stored and persisted keeps increasing. Traditional relational databases can serve online business that requires strongly consistent transactions, but they cannot hold the full volume of massive order data, so tiered data storage is needed.
IM scenario
IM (Instant Messaging) has become a basic component of Internet businesses and is widely used in social networking, games, live streaming, and other scenarios. It is characterized by large data volume, high real-time requirements, and rapid data growth, so it needs storage, synchronization, and retrieval of massive numbers of messages.
Feed stream scenario
Feed streams have become a standard form of information delivery in social networking, media, news, and other fields, and have produced mainstream products such as Moments, Weibo, and Toutiao. Since the read-write ratio in feed stream scenarios is generally 100:1 and push mode is often used, the storage must support high-concurrency message writes with an auto-increment primary key.
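The push-mode write path described above can be sketched as follows: each message is written into every follower's inbox under a key made of the receiver ID and an auto-increment sequence number, so each receiver reads its messages in write order. The `Inbox` class and `fan_out` function are hypothetical names, and the consecutive per-receiver counter is a simplification, since the source only states that auto-increment values are unique and strictly increasing within a partition key.

```python
from collections import defaultdict
from itertools import count


class Inbox:
    """Stores feed messages keyed by (receiver_id, auto-increment sequence)."""

    def __init__(self):
        # One independent counter per receiver (the partition key).
        self._next_seq = defaultdict(lambda: count(1))
        self._messages = defaultdict(list)

    def push(self, receiver_id, message):
        """Append a message to one receiver's inbox and return its sequence."""
        seq = next(self._next_seq[receiver_id])
        self._messages[receiver_id].append((seq, message))
        return seq

    def read_after(self, receiver_id, last_seq):
        """Return messages newer than last_seq, in write order."""
        return [(s, m) for s, m in self._messages[receiver_id] if s > last_seq]


def fan_out(inbox, follower_ids, message):
    """Push mode: write one copy of the message into every follower's inbox."""
    return {fid: inbox.push(fid, message) for fid in follower_ids}
```

Because the sequence increases within each receiver, a client only has to remember the last sequence number it has seen in order to fetch new messages.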
Big Data
Recommendation system
As the main means of fine-grained operation for today's businesses, recommendation systems are widely used in e-commerce, short videos, news, and other scenarios. They are characterized by large data volume, real-time updates, and personalized recommendations, so they need massive message storage together with real-time and offline analysis.
Public opinion and risk control analysis (data crawler) scenario
Analyzing and monitoring public opinion information makes it possible to analyze and gain insight into the market effectively. Collecting and analyzing reviews, news, comments, and similar information requires high-concurrency writes of rich, multi-type data and convenient data flow into computing and analysis.
Internet of Things
In Internet of Things (IoT) scenarios, operation and maintenance monitoring of systems and monitoring of the environment and people support fact-based understanding and decision-making. The storage therefore needs to support high-concurrency writes from many devices and systems, data storage, and decision analysis.