Analytic functions and aggregate functions both calculate an aggregate value based on a group of rows. The difference is that aggregate functions return only one value (one row) for each group, whereas analytic functions return multiple rows for each group. This group of rows is also called a window, and is defined in analytic_clause. The window size is subject to the actual number of rows or logical interval such as the time. Each row in the group is the result of a window-based logical calculation.
To trigger an analytic function, you need to use the OVER keyword to specify the window. A window consists of three components:
The partition specification for splitting the input rows into different partitions. This process is similar to the splitting process of the
GROUP BYclause.The ordering specification that determines the order in which the analytic function processes the input rows.
The window boundary for data computation. Default value:
RANGE UNBOUNDED PRECEDING. This boundary contains all data from the start to the current row in the current partition.
The analytic function is the last set of operations to be executed in a query, except for the ORDER clause in the end. Before processing an analytic function, the database must complete all JOIN operations and WHERE, GROUP BY and HAVING clauses. Therefore, an analytic function can only appear in a select list or ORDER BY clause.
Syntax
analytic_function
analytic_function specifies the name of an analytic function. Syntax:
analytic_function([ arguments ]) OVER (analytic_clause)
An analytic function uses 0 to 3 arguments. The arguments can be of any numeric data type or any non-numeric data type that can be implicitly converted into a numeric data type. OceanBase Database determines the argument with the highest numeric precedence based on the data type priorities, and then implicitly converts arguments of other data types into the data type with the highest numeric precedence. Unless otherwise specified, the return type is also the data type with the highest numeric precedence.
analytic_clause
An analytic clause (analytic_clause) uses OVER analytic_clause to instruct the function to operate on the query result set. This clause is performed after execution of the FROM, WHERE, GROUP BY and HAVING clauses. You can use this clause to specify the analytic function in the select list or ORDER BY clause. If you want to filter the query results by using an analytic function, nest the function in the parent query and then filter the results of the nested subquery. Syntax:
[ query_partition_clause ] [ order_by_clause [ windowing_clause ] ]
Notice
- You cannot nest an analytic function by specifying the analytic function in an analytic clause. However, you can specify an analytic function in a subquery and execute another analytic function on it.
- You can specify
analytic_clausewith user-defined parsing functions and built-in parsing functions.
query_partition_clause
The partitioning clause (query_partition_clause) uses the PARTITION BY clause to divide the query result set into one or more value_expr groups. If this clause is omitted, the function treats all rows of the query result set as a single group.
You can specify multiple analytic functions in one query, with each function having the same or different partitions based on the keys. If you have specified an analytic function in query_partition_clause and the queried objects have parallel attributes, the function executes computation in parallel.
Valid values of value_expr include constants, columns, non-analytic functions, function expressions, or expressions involving any of them.
Syntax:
PARTITION BY { expr[, expr ]... | ( expr[, expr ]... ) }
order_by_clause
Use the sorting clause order_by_clause to specify how data is sorted within the partition. For all analytic functions, you can sort values in partitions on multiple keys. Each key is defined in value_expr, restricted by the sorting sequence.
In each function, you can specify multiple sorting expressions. This is useful when you use functions to sort values.
When order_by_clause generates the same value for multiple rows, the function complies with the following rules:
CUME_DIST,DENSE_RANK,NTILE,PERCENT_RANK, andRANKreturn the same result for each row.Even given a value based on
order_by_clause,ROW_NUMBERassigns a different value to each row. The value is based on the row processing order. IfORDER BYfails to implement sorting for all rows, this order may be uncertain.For other analytic functions, its results depend on the window rules. If you specify a logical window with a
RANGEkeyword, the function returns the same result for each row. If you specify a physical window with aROWSkeyword, the result is uncertain.
Syntax:
ORDER [ SIBLINGS ] BY{ expr | position | c_alias } [ ASC | DESC ] [ NULLS FIRST | NULLS LAST ] [, { expr | position | c_alias } [ ASC | DESC ][ NULLS FIRST | NULLS LAST ]]...
ASC or DESC keyword
These keywords specify the sorting order, where ASC specifies to sort in ascending order and DESC specifies to sort in descending order. By default, ASC is used.
NULLS FIRST or NULLS LAST keyword
In order_by_clause, nulls first and nulls last are keywords that specify how a null is treated. nulls first indicates that nulls are treated as minimum values during sorting, whereas nulls last indicates that nulls are treated as maximum values during sorting.
Limitations on the ORDER BY clause
An ORDER BY clause is subject to the following limits:
In an analytic function,
order_by_clausemust be an expression (expr). TheSIBLINGSkeyword is invalid (only relevant in hierarchical queries). The location keyword (position) and column alias (c_alias) are also invalid. Otherwise,order_by_clauseis the same as the sorting command for the entire query or subquery.An analytic function using the
RANGEkeyword may use multiple collation keys in theORDERclause of the function. You need to specify the following windows:RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, orRANGE UNBOUNDED PRECEDINGfor shortRANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWINGRANGE BETWEEN CURRENT ROW AND CURRENT ROWRANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
In addition to the preceding four windows, the window boundary can only have one collation key in the ORDER clause of the analytic function. This limit does not apply to window boundaries specified by the ROW keyword.
windowing_clause
Some analytic functions can use windowing_clause. Syntax:
{ ROWS | RANGE } { BETWEEN { UNBOUNDED PRECEDING | CURRENT ROW | value_expr { PRECEDING | FOLLOWING } } AND{ UNBOUNDED FOLLOWING | CURRENT ROW | value_expr { PRECEDING | FOLLOWING } } | { UNBOUNDED PRECEDING | CURRENT ROW| value_expr PRECEDING}}
ROWS or RANGE keyword
These keywords define a window for each row to calculate the result of the function and then apply the function to all rows in the window. The window moves from top to bottom through the query result set or partition. This window is also called FRAME. OceanBase Database supports the following window statements:
ROWS: specifies the window in physical units (rows).RANGE: specifies the window as a logical offset. Default value:RANGE UNBOUNDED PRECEDING.
You can use windowing_clause in analytic functions. To use this clause, you must add order_by_clause. In windowing_clause, if the window boundary is specified by the RANGE clause, you can specify only one expression in order_by_clause. For more information, see Limits on the ORDER BY clause.
The value returned by an analytic function with a logical offset is always certain. However, the value returned by an analytical function with a physical offset may produce uncertain results. A sorting expression must return a unique order so that the analytical function with a physical offset returns a unique value. Therefore, you must specify a unique order by specifying multiple columns in order_by_clause.
BETWEEN ... AND keyword
Use the BETWEEN ... AND clause to specify the starting and end points of the window. The first expression (before AND) defines the start point, and the second expression (after AND after) defines the end point. If BETWEEN is omitted and only an end point is specified, OceanBase Database regards it as the start point, and takes the current row as the end point by default.
UNBOUNDED PRECEDING keyword
UNBOUNDED PRECEDING instructs the window to start from the first row of the partition. This keyword is used for the start point and cannot be used for the end point.
UNBOUNDED FOLLOWING keyword
UNBOUNDED FOLLOWING specifies that the window ends in the last row of the partition. This keyword is used for the end point and cannot be used for the start point.
CURRENT ROW keyword
As a start point, CURRENT ROW specifies that the window starts from the current row or current value, depending on whether ROW or RANGE is specified. In this case, the start point cannot be value_expr PRECEDING. As an end point, CURRENT ROW specifies that the window ends at the current row or current value, depending on whether ROW or RANGE is specified. In this case, the start point cannot be value_expr FOLLOWING.
value_expr PRECEDING or value_expr FOLLOWING keyword
If
value_expr FOLLOWINGis the start point, the end point must bevalue_expr FOLLOWING.If
value_expr PRECEDINGis the end point, the start point must bevalue_expr PRECEDING.
If you want to define a logical window with a time interval of a numeric format, you may need to use a conversion function.
If you specify ROWS,
value_expris the physical offset. It must be a constant or expression and must be evaluated as a positive number.If
value_expris a part of the start point, it takes the part between the start point and the end point as a row.
If you specify RANGE,
value_expris the logical offset. It must be a constant or expression evaluated as a positive numeric value or an interval literal.You can specify only on expression in
order_by_clause.If
value_expris a numeric value,ORDER BY exprmust be of the numeric orDATEdata type.If
value_expris an interval value,ORDER BY exprmust be of theDATEdata type. If you omitwindowing_clause, the default value isRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.