A Materialized View (MV) is a database object that stores the query results of a view. Unlike a regular view, an MV stores precomputed data. By saving the results of time-consuming operations such as aggregation and join, the system can directly reuse the saved results when executing queries, thus avoiding repeated execution of these time-consuming and resource-intensive operations and accelerating queries. In data warehouses and decision support systems, the use of materialized views can significantly reduce computation time and improve query efficiency.
Characteristics of materialized views
Refresh methods of materialized views
When you create a materialized view, you can choose different refresh methods to ensure that the data of the materialized view is synchronized with the base table. The refresh method you choose will directly affect the system performance and the timeliness of query results.
Complete refresh and incremental refresh
Complete refresh: Each time the system refreshes the materialized view, it re-executes the query statement and overwrites the original view results with the new calculation results. A complete refresh is suitable for scenarios with low latency requirements, infrequently updated base table data, complex query statements, or small data volumes.
Incremental refresh: Incremental refresh, also known as fast refresh, only refreshes the incremental changes. It is suitable for large-scale datasets. For incremental refresh, the system requires a materialized view log (mlog). The system imposes certain requirements on query statements for incremental refresh. The system supports query statements for single-table aggregation, multi-table join, and multi-table join aggregation. The system also imposes some requirements on aggregation functions and join methods. For more information, see Refresh materialized views in MySQL compatible mode and Refresh materialized views in Oracle compatible mode. Incremental refresh is suitable for scenarios with high latency requirements, large data volumes, and frequent changes.
Automatic refresh and manual refresh
- Automatic refresh: When you create a materialized view, you can specify the refresh interval of the materialized view. The system automatically schedules the refresh tasks of the materialized view according to the configured refresh time rules.
- Manual refresh: If the materialized view does not have automatic refresh enabled or the refresh interval is large, you can manually execute the refresh command of the materialized view to synchronize the data of the materialized view with the data of the base table.
Real-time materialized views
Real-time materialized views (RTMV) allow you to obtain real-time data by querying the materialized view. The query results are the same as those obtained by directly querying the base table. The system can use the pre-computed results of the materialized view to accelerate the query. It captures and processes changes in the underlying base table by using mlogs to ensure that the data of the materialized view can be timely updated to the latest status. To define a materialized view as an RTMV, the query statements of the materialized view must meet the requirements for incremental refresh. In other words, the mlogs must be available for the materialized view.
For more information about how to create an RTMV, see Create a materialized view in MySQL compatible mode and Create a materialized view in Oracle compatible mode.
Nested materialized views
Nested materialized views refer to a scenario where one materialized view depends on another. This approach is particularly useful during the Extract, Transform, Load (ETL) process. In an ETL workflow, nested materialized views can aggregate and transform data across different stages, storing the results as independent views. This approach avoids multiple redundant computations and enhances the overall efficiency of the ETL process. However, nested materialized views do not support automatic cascading refreshes. Before using nested materialized views, it's essential to understand the refresh considerations for nested materialized views to ensure that queries for the upper-level materialized views yield the expected results.
For more information about creating nested materialized views, see Create a materialized view in MySQL mode and Create a materialized view in Oracle mode.
Query rewrite for materialized views
If you want to leverage materialized views to speed up queries without modifying the original query statements, you can utilize the query rewrite capability of materialized views. The system can automatically match query statements with the definitions of materialized views. If a matching materialized view is found, the system automatically rewrites the query to use the materialized view. This way, you can greatly enhance query performance and efficiency without changing your business operations.
For more information about query rewrite for materialized views, see Query rewrite for materialized views in MySQL mode and Query rewrite for materialized views in Oracle mode.
Ways to speed up queries for materialized views
Materialized views are an important tool for enhancing query performance in databases. By precomputing and storing query results, they reduce the performance overhead associated with real-time calculations. To further optimize the query performance of materialized views, you can consider the following methods:
Storage formats for materialized view data
You can choose between rowstore and columnstore materialized views based on your actual application scenarios. This can help speed up queries for these views.
- Rowstore materialized views: When the data in the materialized view has been aggregated and the queries primarily involve accessing entire data rows.
- Columnstore materialized views: For wide tables with a large volume of data and many columns, leveraging columnstore materialized views can facilitate more analysis and aggregation operations.
Use of primary key materialized views
Primary keys ensure the uniqueness of data in the materialized view and support more efficient data access and updates.
Create indexes on materialized views
Creating indexes on materialized views can significantly improve query performance. With indexes, the system can quickly locate the required data, reducing the need for full-table scans.
While materialized views are crucial for boosting query performance, they may still have limitations when dealing with massive amounts of data. To optimize the query performance of materialized views, you should focus on performance optimization to further speed up queries.
Create a materialized view
Notice
The following examples are run in MySQL mode.
Create the source table
First, we need to create a source table to store the original data. In this example, we create two tables named sales and items, which contain sales data and item information respectively.
Create a sales data table named
sales.CREATE TABLE sales ( order_id INT PRIMARY KEY, user_id INT, item_id INT, item_count INT, region VARCHAR(100) );Create the
itemstable.CREATE TABLE items ( order_id INT, product_id INT, quantity INT, price_per_item DECIMAL(10, 2) NOT NULL, pic_url VARCHAR(1000), PRIMARY KEY (order_id, product_id) );
Create a full refresh materialized view
We will create a materialized view based on the sales table. This materialized view will summarize the sales volume by product and region to accelerate queries.
Create a materialized view named mv_sales_summary for summarizing the sales volume by product and region.
CREATE MATERIALIZED VIEW mv_sales_summary(PRIMARY KEY(item_id))
REFRESH COMPLETE
START WITH sysdate()
NEXT sysdate() + interval 1 hour
AS SELECT item_id, region, SUM(item_count) AS total_count
FROM sales
GROUP BY item_id, region;
In the preceding example, we specified the following characteristics when we created the materialized view:
- PRIMARY KEY: A primary key is specified for the materialized view.
REFRESH COMPLETE: indicates a full refresh.START WITH sysdate() NEXT sysdate() + interval 1 hour: The task is scheduled to run at an interval of 1 hour.
Create an incremental refresh materialized view
For scenarios with frequent data changes, incremental refresh can be used to improve refresh efficiency. Before creating an incrementally refreshed materialized view, you must create a materialized view log (mlog) based on the base table of the materialized view, and then you can successfully create an incrementally refreshed materialized view.
For more information about materialized view logs, see Materialized view logs in MySQL mode and Materialized view logs in Oracle mode.
The following examples show three query scenarios that are supported for incremental refresh materialized views:
Single-table aggregation
Create an mlog on the
salestable.CREATE MATERIALIZED VIEW LOG ON sales WITH PRIMARY KEY (item_id, item_count, region) INCLUDING NEW VALUES;Create an incrementally refreshed materialized view named
mv_sales_summary_fastfor single-table aggregation.CREATE MATERIALIZED VIEW mv_sales_summary_fast REFRESH FAST START WITH sysdate() NEXT sysdate() + interval 1 hour AS SELECT item_id, region, SUM(item_count) AS total_count, count(*) as c, count(item_count) as count FROM sales GROUP BY item_id, region;
Multi-table join
Create an mlog on the
salestable.(Optional) Drop the mlog from the
salestable.If you have not created an mlog on the
salestable, skip this step.DROP MATERIALIZED VIEW LOG ON sales;Create an mlog on the
salestable.CREATE MATERIALIZED VIEW LOG ON sales WITH PRIMARY KEY (user_id, item_id, item_count, region) INCLUDING NEW VALUES;
Create an mlog on the
itemstable.CREATE MATERIALIZED VIEW LOG ON items WITH PRIMARY KEY (price_per_item, pic_url) INCLUDING NEW VALUES;Create a multi-table join incremental refresh materialized view named
mv_sales_items_join.CREATE MATERIALIZED VIEW mv_sales_items_join PARTITION BY HASH(order_id) PARTITIONS 10 REFRESH FAST START WITH sysdate() NEXT sysdate() + interval 1 hour AS SELECT s.order_id AS order_id, s.user_id AS customer_id, s.item_id AS item_id, s.item_count AS quantity, s.region AS region, i.order_id AS i_id, i.product_id AS i_item_id, i.price_per_item AS price_per_item, i.pic_url AS pic_url FROM sales s JOIN items i ON s.order_id = i.order_id;
Multi-table join aggregation
Create an mlog on the
salestable.(Optional) Drop the mlog from the
salestable.If you have not created an mlog on the
salestable, skip this step.DROP MATERIALIZED VIEW LOG ON sales;Create an mlog on the
salestable.CREATE MATERIALIZED VIEW LOG ON sales WITH PRIMARY KEY (item_id, item_count, region) INCLUDING NEW VALUES;
Create an mlog on the
itemstable.(Optional) Drop the mlog from the
itemstable.If you have not created an mlog on the
itemstable, skip this step.DROP MATERIALIZED VIEW LOG ON items;Create an mlog on the
itemstable.CREATE MATERIALIZED VIEW LOG ON items WITH PRIMARY KEY (price_per_item) INCLUDING NEW VALUES;
Create a multi-table join aggregation incremental refresh materialized view named
mv_sales_item_join_group.CREATE MATERIALIZED VIEW mv_sales_item_join_group REFRESH FAST START WITH sysdate() NEXT sysdate() + interval 1 hour AS SELECT s.item_id AS item_id, s.region AS region, SUM(s.item_count * i.price_per_item) AS sum_price, count(*) AS c, count(s.item_count * i.price_per_item) AS count FROM sales s JOIN items i ON s.order_id = i.order_id GROUP BY item_id, region;
Create a real-time materialized view
A real-time materialized view ensures that the query result of the materialized view is synchronized with the base table when the data changes. Since a real-time materialized view depends on a materialized view log, you must create a materialized view log before creating a real-time materialized view, just like an incrementally refreshed materialized view.
Create a materialized view log on the
salestable.(Optional) Drop the materialized view log on the
salestable.If no materialized view log has been created on the
salestable, skip this step.DROP MATERIALIZED VIEW LOG ON sales;Create a materialized view log on the
salestable.CREATE MATERIALIZED VIEW LOG ON sales WITH PRIMARY KEY (item_id, item_count, region) INCLUDING NEW VALUES;
Create a real-time materialized view named
mv_sales_summary_com.CREATE MATERIALIZED VIEW mv_sales_summary_com REFRESH FORCE START WITH sysdate() NEXT sysdate() + interval 1 hour ENABLE ON QUERY COMPUTATION AS SELECT item_id, region, SUM(item_count) AS total_count, count(*) as c, count(item_count) as count FROM sales GROUP BY item_id, region;
In this example, we enable ENABLE ON QUERY COMPUTATION to enable real-time updates of the materialized view during queries. This ensures that the materialized view provides the latest data. The query statement of the materialized view meets the requirements for creating an incrementally refreshed materialized view.
Create nested materialized views
During ETL, nested materialized views combine multiple materialized views into a more complex data processing workflow. Here, we will create two materialized views: one to associate sales information with item information, and another to perform further aggregation based on the first materialized view.
Create a materialized view log on the
salestable.(Optional) Drop the materialized view log on the
salestable.If no materialized view log has been created on the
salestable, skip this step.DROP MATERIALIZED VIEW LOG ON sales;Create a materialized view log on the
salestable.CREATE MATERIALIZED VIEW LOG ON sales WITH PRIMARY KEY (user_id, item_id, item_count, region) INCLUDING NEW VALUES;
Create a materialized view log on the
itemstable.(Optional) Drop the materialized view log on the
itemstable.If no materialized view log has been created on the
itemstable, skip this step.DROP MATERIALIZED VIEW LOG ON items;Create a materialized view log on the
itemstable.CREATE MATERIALIZED VIEW LOG ON items WITH PRIMARY KEY (price_per_item,pic_url) INCLUDING NEW VALUES;
Create a materialized view named
mv1_sales_items_jointo associate sales information with item information.CREATE MATERIALIZED VIEW mv1_sales_items_join REFRESH FAST START WITH sysdate() NEXT sysdate() + interval 1 hour AS SELECT s.order_id AS order_id, s.user_id AS customer_id, s.item_id AS item_id, s.item_count AS quantity, s.region AS region, i.order_id AS i_id, i.product_id AS i_item_id, i.price_per_item AS price_per_item, i.pic_url FROM sales s JOIN items i ON s.order_id = i.order_id;Create a materialized view log on the materialized view
mv1_sales_items_join.CREATE MATERIALIZED VIEW LOG ON mv1_sales_items_join WITH PRIMARY KEY (region,quantity,price_per_item) INCLUDING NEW VALUES;Create a materialized view named
mv2_join_sumbased on the materialized viewmv1_sales_items_join.CREATE MATERIALIZED VIEW mv2_join_sum REFRESH FAST START WITH sysdate() NEXT sysdate() + interval 1 hour AS SELECT region, sum(quantity * price_per_item) AS sum_price, count(*) as c, count(quantity * price_per_item) as count FROM mv1_sales_items_join GROUP BY region;
Create a columnar materialized view
Assume that you want to improve query efficiency in scenarios involving large volumes of data. In this case, you can create a columnar materialized view. By using columnar storage, the system reads only the required columns during queries, significantly reducing disk I/O.
Create a materialized view log on the
salestable.(Optional) Drop the materialized view log on the
salestable.If no materialized view log has been created on the
salestable, skip this step.DROP MATERIALIZED VIEW LOG ON sales;Create a materialized view log on the
salestable.CREATE MATERIALIZED VIEW LOG ON sales WITH PRIMARY KEY (user_id, item_id, item_count, region) INCLUDING NEW VALUES;
Create a materialized view log on the
itemstable.(Optional) Drop the materialized view log on the
itemstable.If no materialized view log has been created on the
itemstable, skip this step.DROP MATERIALIZED VIEW LOG ON items;Create a materialized view log on the
itemstable.CREATE MATERIALIZED VIEW LOG ON items WITH PRIMARY KEY (price_per_item,pic_url) INCLUDING NEW VALUES;
Create a columnar materialized view named
wide_sales_column.CREATE MATERIALIZED VIEW wide_sales_column WITH COLUMN GROUP(each column) REFRESH FAST START WITH sysdate() NEXT sysdate() + interval 1 hour AS SELECT s.order_id AS order_id, s.user_id AS customer_id, s.item_id AS item_id, s.item_count AS quantity, s.region AS region, i.order_id AS i_id, i.product_id AS i_item_id, i.price_per_item AS price_per_item, i.pic_url FROM sales s JOIN items i ON s.order_id = i.order_id;
In this example, we specified WITH COLUMN GROUP(each column), which indicates that the materialized view should use the columnar storage format. This is particularly useful in OLAP scenarios, especially for querying wide tables with large volumes of data.
Create a materialized view for query rewriting
In this example, we created a fully refreshed materialized view that supports query rewriting by specifying ENABLE QUERY REWRITE.
CREATE MATERIALIZED VIEW mv_sales_summary_select
REFRESH COMPLETE
START WITH sysdate()
NEXT sysdate() + interval 1 hour
ENABLE QUERY REWRITE
AS SELECT item_id, region, SUM(item_count) AS total_count
FROM sales
GROUP BY item_id, region;
In this example, we created a materialized view for query rewriting by specifying both ENABLE QUERY REWRITE and ENABLE ON QUERY COMPUTATION.
Create a materialized view log on the
salestable.(Optional) Drop the materialized view log on the
salestable.If no materialized view log has been created on the
salestable, skip this step.DROP MATERIALIZED VIEW LOG ON sales;Create a materialized view log on the
salestable.CREATE MATERIALIZED VIEW LOG ON sales WITH PRIMARY KEY (item_id, item_count, region) INCLUDING NEW VALUES;
Create a materialized view for query rewriting named
mv_sales_summary_com_select.CREATE MATERIALIZED VIEW mv_sales_summary_com_select REFRESH FAST START WITH sysdate() NEXT sysdate() + interval 1 hour ENABLE ON QUERY COMPUTATION ENABLE QUERY REWRITE AS SELECT item_id, region, SUM(item_count) AS total_sales, count(*) as c, count(item_count) as count FROM sales GROUP BY item_id, region;
Create an index on a materialized view
To further optimize query performance, you can create an index on a materialized view. An index on a materialized view helps accelerate queries on the materialized view, especially when large volumes of data are involved.
(Optional) Create a materialized view named
mv_sales_summarythat summarizes the sales volume by product and region.If you have already created the materialized view
mv_sales_summaryas shown in the Create a full-refresh materialized view section, skip this step.CREATE MATERIALIZED VIEW mv_sales_summary(PRIMARY KEY(item_id)) REFRESH COMPLETE START WITH sysdate() NEXT sysdate() + interval 1 hour AS SELECT item_id, region, SUM(item_count) AS total_count FROM sales GROUP BY item_id, region;Create an index named
idx_mv_sales_summaryon theregioncolumn of themv_sales_summarymaterialized view.CREATE INDEX idx_mv_sales_summary ON mv_sales_summary (region);This index can accelerate queries on the
mv_sales_summarymaterialized view, especially those involving theregioncolumn.
References
- For more information about materialized views, see Overview (MySQL mode) and Overview (Oracle mode).
- For more information about refreshing materialized views, see Refresh materialized views (MySQL mode) and Refresh materialized views (Oracle mode).
- For more information about deleting materialized view logs, see Materialized view logs (MySQL mode) and Materialized view logs (Oracle mode).
- For more information about deleting materialized views, see Delete materialized views (MySQL mode) and Delete materialized views (Oracle mode).
