Analytic functions and aggregate functions both perform aggregation calculations on a set of rows (a group of rows). The difference is that an aggregate function returns only one value (one row) per group, while an analytic function can return multiple values (multiple rows) per group. This group of rows is also known as a window, defined by the analytic_clause. The size of the window depends on the actual number of rows or logical intervals (such as time). Each row within the group is based on the logical calculation results of the window.
To trigger an analytic function, you need to use the special keyword OVER to specify the window. A window consists of the following three components:
Partitioning specification: This is used to split the input rows into different partitions. This process is similar to the splitting process of the
GROUP BYclause.Ordering specification: This determines the order in which the input data rows are processed by the analytic function.
Window boundaries: These specify the boundaries of the window for data calculation. The default value is
RANGE UNBOUNDED PRECEDING. This boundary includes all data from the beginning of the current partition up to the current row.
Analytic functions are the last set of operations executed in a query, except for the final ORDER BY clause. Before processing an analytic function, all JOIN operations and WHERE, GROUP BY, and HAVING clauses must be completed. Therefore, analytic functions can only appear in the select list or in the ORDER BY clause.
Syntax
analytic_function
analytic_function specifies the name of the analytic function. The syntax is as follows:
analytic_function([ arguments ]) OVER (analytic_clause)
Analytic functions can take 0 to 3 arguments. The arguments can be any numeric data type or any non-numeric data type that can be implicitly converted to a numeric data type. OceanBase Database determines the argument with the highest numeric precedence based on data type precedence and implicitly converts the other arguments to the data type of this argument. The return type is the data type of the argument with the highest numeric precedence, unless otherwise specified for the function.
analytic_clause
The analytic_clause clause, specified as OVER analytic_clause, indicates that the function operates on the query result set. This clause is evaluated after the FROM, WHERE, GROUP BY, and HAVING clauses. You can use this clause in the select list or in the ORDER BY clause to specify an analytic function. If you want to filter the results of a query that uses an analytic function, you can nest the function in a parent query and then filter the results of the nested subquery. The syntax is as follows:
[ query_partition_clause ] [ order_by_clause [ windowing_clause ] ]
Notice
- You cannot nest analytic functions by specifying them in the analytic_clause clause. However, you can specify an analytic function in a subquery and then compute another analytic function on it.
- You can use user-defined analytic functions and built-in analytic functions to specify
analytic_clause.
query_partition_clause
The query_partition_clause clause, specified as PARTITION BY, divides the query result set into groups based on one or more value_expr expressions. If you omit this clause, the function treats all rows in the query result set as a single group.
You can specify multiple analytic functions in the same query, each with its own key or a different key. If you specify an analytic function using query_partition_clause and the queried object has parallel attributes, the function computation is also parallelized.
The valid values of value_expr are constants, columns, non-analytic functions, function expressions, or expressions involving any of these.
The syntax is as follows:
PARTITION BY { expr[, expr ]... | ( expr[, expr ]... ) }
order_by_clause
The order_by_clause clause specifies how to sort data within each partition. For all analytic functions, you can sort values on multiple keys, each defined by a value_expr expression, and limited by the sort sequence.
In each function, you can specify multiple sort expressions. This is particularly useful when using functions that sort values.
When order_by_clause produces the same value for multiple rows, the function behaves as follows:
CUME_DIST,DENSE_RANK,NTILE,PERCENT_RANK, andRANKreturn the same result for each row.ROW_NUMBERassigns a different value to each row, even if one value is based onorder_by_clause. The value is based on the order in which rows are processed, which may be indeterminate ifORDER BYcannot achieve a total sort.For other analytic functions, the result depends on the window rules. If you specify a logical window with the
RANGEkeyword, the function returns the same result for each row. If you specify a physical window with theROWSkeyword, the result is indeterminate.
The syntax is as follows:
ORDER [ SIBLINGS ] BY{ expr | position | c_alias } [ ASC | DESC ] [ NULLS FIRST | NULLS LAST ] [, { expr | position | c_alias } [ ASC | DESC ][ NULLS FIRST | NULLS LAST ]]...
ASC or DESC keyword
Specify the sort sequence. ASC indicates ascending order, and DESC indicates descending order. The default is ascending (ASC).
NULLS FIRST or NULLS LAST keyword
In the order_by_clause, nulls first and nulls last specify how to handle NULL values during sorting. nulls first treats NULL values as the smallest values during sorting, while nulls last treats NULL values as the largest values during sorting.
Limitations on the ORDER BY clause
The following limitations apply when using the ORDER BY clause:
In analytic functions,
order_by_clausemust use an expression (expr). TheSIBLINGSkeyword is invalid (only applicable in hierarchical queries). Position (position) and column alias (c_alias) are also invalid. Otherwise, theorder_by_clauseis the same as the sorting command for the entire query or subquery.Analytic functions using the
RANGEkeyword can specify multiple sort keys in theirORDER BYclause. You need to specify the following windows:RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW(also known asRANGE UNBOUNDED PRECEDING)RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWINGRANGE BETWEEN CURRENT ROW AND CURRENT ROWRANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
Except for the above four windows, window boundaries can only have one sort key in the ORDER BY clause of analytic functions. This restriction does not apply to window boundaries specified with the ROW keyword.
windowing_clause
Some analytic functions can use windowing_clause. The syntax is as follows:
{ ROWS | RANGE } { BETWEEN { UNBOUNDED PRECEDING | CURRENT ROW | value_expr { PRECEDING | FOLLOWING } } AND{ UNBOUNDED FOLLOWING | CURRENT ROW | value_expr { PRECEDING | FOLLOWING } } | { UNBOUNDED PRECEDING | CURRENT ROW| value_expr PRECEDING}}
ROWS or RANGE keyword
These keywords define a window for each row to calculate the function result, and then apply the function to all rows in the window. The window moves down through the query result set or partition. The window is also called FRAME, and OceanBase Database supports the following window statements:
ROWS: specifies the window in terms of physical units (rows).RANGE: specifies the window in terms of logical offsets. The default isRANGE UNBOUNDED PRECEDING.
If you use windowing_clause in an analytic function, you must add order_by_clause. If windowing_clause is defined by the RANGE clause, you can specify only one expression in order_by_clause. For more information, see Limitations of ORDER BY clause.
Analytic functions with logical offsets always return deterministic values. However, analytic functions with physical offsets may return nondeterministic values. To make the analytic functions with physical offsets return deterministic values, the sorting expressions must return unique values. Therefore, you must specify multiple columns in order_by_clause to achieve unique sorting.
BETWEEN ... AND keyword
Use the BETWEEN ... AND clause to specify the start and end of the window. The first expression (before AND) defines the start, and the second expression (after AND) defines the end. If you omit BETWEEN and specify only one end, OceanBase Database treats it as the start, and the end defaults to the current row.
UNBOUNDED PRECEDING keyword
UNBOUNDED PRECEDING indicates that the window starts from the first row of the partition. This is used for the start, not the end.
UNBOUNDED FOLLOWING keyword
UNBOUNDED FOLLOWING indicates that the window ends at the last row of the partition. This is used for the end, not the start.
CURRENT ROW keyword
As the start, CURRENT ROW specifies that the window starts from the current row or value (depending on whether ROW or RANGE is specified). In this case, the end cannot be value_expr PRECEDING. As the end, CURRENT ROW specifies that the window ends at the current row or value (depending on whether ROW or RANGE is specified). In this case, the start cannot be value_expr FOLLOWING.
value_expr PRECEDING or value_expr FOLLOWING keyword
If
value_expr FOLLOWINGis the start, the end must also bevalue_expr FOLLOWING.If
value_expr PRECEDINGis the end, the start must also bevalue_expr PRECEDING.
If you want to define a logical window based on a time interval, you may need to use conversion functions.
If you specify ROWS:
value_expris a physical offset. It must be a constant or expression that evaluates to a positive number.If
value_expris part of the start, it must be calculated as a single row before the end.
If you specify RANGE:
value_expris a logical offset. It must be a constant or expression that evaluates to a positive number or interval literal.You can specify only one expression in
order_by_clause.If
value_expris a number,ORDER BY exprmust be of theNUMBERorDATEdata type.If
value_expris an interval value,ORDER BY exprmust be of theDATEdata type. If you omitwindowing_clause, the default isRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
