Analytic functions

2024-03-05 01:54:27  Updated

Analytic functions and aggregate functions both calculate an aggregate value based on a group of rows. The difference is that aggregate functions return only one value (one row) for each group, whereas analytic functions return multiple rows for each group. This group of rows is also called a window, and is defined in analytic_clause. The window size is subject to the actual number of rows or logical interval such as the time. Each row in the group is the result of a window-based logical calculation.

To trigger an analytic function, you need to use the OVER keyword to specify the window. A window consists of three components:

  • The partition specification for splitting the input rows into different partitions. This process is similar to the splitting process of the GROUP BY clause.

  • The ordering specification that determines the order in which the analytic function processes the input rows.

  • The window boundary for data computation. The default value is RANGE UNBOUNDED PRECEDING. This boundary contains all data from the start to the current row in the current partition.

The analytic function is the last set of operations to be executed in a query, except for the ORDER BY clause in the end. Before processing an analytic function, the database must complete all JOIN operations and WHERE, GROUP BY and HAVING clauses. Therefore, an analytic function can only appear in a select list or ORDER BY clause.

Syntax

analytic_function

analytic_function specifies the name of an analytic function. The syntax is as follows:

analytic_function([ arguments ]) OVER (analytic_clause)

An analytic function uses 0 to 3 arguments. The arguments can be of any numeric data type or any non-numeric data type that can be implicitly converted into a numeric data type. OceanBase Database determines the argument with the highest numeric precedence based on the data type priorities, and then implicitly converts arguments of other data types into the data type with the highest numeric precedence. Unless otherwise specified, the return type is also the data type with the highest numeric precedence.

analytic_clause

An analytic clause (analytic_clause) uses OVER analytic_clause to instruct the function to operate on the query result set. This clause is performed after execution of the FROM, WHERE, GROUP BY and HAVING clauses. You can use this clause to specify the analytic function in the select list or ORDER BY clause. If you want to filter the query results by using an analytic function, nest the function in the parent query and then filter the results of the nested subquery. The syntax is as follows:

[ query_partition_clause ] [ order_by_clause [ windowing_clause ] ]

Notice

  • You cannot nest an analytic function by specifying the analytic function in an analytic clause. However, you can specify an analytic function in a subquery and execute another analytic function on it.
  • You can specify analytic_clause with user-defined parsing functions and built-in parsing functions.

query_partition_clause

The partitioning clause (query_partition_clause) uses the PARTITION BY clause to divide the query result set into one or more value_expr groups. If this clause is omitted, the function treats all rows of the query result set as a single group.

You can specify multiple analytic functions in one query, with each function having the same or different partitions based on the keys. If you have specified an analytic function in query_partition_clause and the queried objects have parallel attributes, the function executes computation in parallel.

Valid values of value_expr include constants, columns, non-analytic functions, function expressions, or expressions involving any of them.

The syntax is as follows:

PARTITION BY { expr[, expr ]... | ( expr[, expr ]... ) }

order_by_clause

Use the sorting clause order_by_clause to specify how data is sorted within the partition. For all analytic functions, you can sort values in partitions on multiple keys. Each key is defined in value_expr, restricted by the sorting sequence.

In each function, you can specify multiple sorting expressions. This is useful when you use functions to sort values.

When order_by_clause generates the same value for multiple rows, the function complies with the following rules:

  • CUME_DIST, DENSE_RANK, NTILE, PERCENT_RANK, and RANK return the same result for each row.

  • Even given a value based on order_by_clause, ROW_NUMBER assigns a different value to each row. The value is based on the row processing order. If ORDER BY fails to implement sorting for all rows, this order may be uncertain.

  • For other analytic functions, its results depend on the window rules. If you specify a logical window with a RANGE keyword, the function returns the same result for each row. If you specify a physical window with a ROWS keyword, the result is uncertain.

The syntax is as follows:

ORDER [ SIBLINGS ] BY{ expr | position | c_alias } [ ASC | DESC ] [ NULLS FIRST | NULLS LAST ] [, { expr | position | c_alias } [ ASC | DESC ][ NULLS FIRST | NULLS LAST ]]...

ASC or DESC keyword

These keywords specify the sorting order, where ASC specifies to sort in ascending order and DESC specifies to sort in descending order. By default, ASC is used.

NULLS FIRST or NULLS LAST keyword

In order_by_clause, NULLS FIRST and NULLS LAST are keywords that specify how NULLs are treated. NULLS FIRST indicates that NULLs are treated as minimum values during sorting, whereas NULLS LAST indicates that NULLs are treated as maximum values during sorting.

Limitations on the ORDER BY clause

An ORDER BY clause is subject to the following limitations:

  • In an analytic function, order_by_clause must be an expression (expr). The SIBLINGS keyword is invalid (only relevant in hierarchical queries). The location keyword (position) and column alias (c_alias) are also invalid. Otherwise, order_by_clause is the same as the sorting command for the entire query or subquery.

  • An analytic function using the RANGE keyword may use multiple collation keys in the ORDER BY clause of the function. You need to specify the following windows:

    • RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, or RANGE UNBOUNDED PRECEDING for short
    • RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    • RANGE BETWEEN CURRENT ROW AND CURRENT ROW
    • RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

In addition to the preceding four windows, the window boundary can only have one collation key in the ORDER BY clause of the analytic function. This limitation does not apply to window boundaries specified by the ROW keyword.

windowing_clause

Some analytic functions can use windowing_clause. The syntax is as follows:

{ ROWS | RANGE } { BETWEEN { UNBOUNDED PRECEDING | CURRENT ROW | value_expr { PRECEDING | FOLLOWING } } AND{ UNBOUNDED FOLLOWING | CURRENT ROW | value_expr { PRECEDING | FOLLOWING } } | { UNBOUNDED PRECEDING | CURRENT ROW| value_expr PRECEDING}}

ROWS or RANGE keyword

These keywords define a window for each row to calculate the result of the function and then apply the function to all rows in the window. The window moves from top to bottom through the query result set or partition. This window is also called FRAME. OceanBase Database supports the following window statements:

  • ROWS: specifies the window in physical units (rows).

  • RANGE: specifies the window as a logical offset. Default value: RANGE UNBOUNDED PRECEDING.

You can use windowing_clause in analytic functions. To use this clause, you must add order_by_clause. In windowing_clause, if the window boundary is specified by the RANGE clause, you can specify only one expression in order_by_clause. For more information, see Limitations on the ORDER BY clause.

The value returned by an analytic function with a logical offset is always certain. However, the value returned by an analytical function with a physical offset may produce uncertain results. A sorting expression must return a unique order so that the analytical function with a physical offset returns a unique value. Therefore, you must specify a unique order by specifying multiple columns in order_by_clause.

BETWEEN ... AND keyword

Use the BETWEEN ... AND clause to specify the starting and end points of the window. The first expression (before AND) defines the start point, and the second expression (after AND after) defines the end point. If BETWEEN is omitted and only an end point is specified, OceanBase Database regards it as the start point, and takes the current row as the end point by default.

UNBOUNDED PRECEDING keyword

UNBOUNDED PRECEDING instructs the window to start from the first row of the partition. This keyword is used for the start point and cannot be used for the end point.

UNBOUNDED FOLLOWING keyword

UNBOUNDED FOLLOWING specifies that the window ends in the last row of the partition. This keyword is used for the end point and cannot be used for the start point.

CURRENT ROW keyword

As a start point, CURRENT ROW specifies that the window starts from the current row or current value, depending on whether ROW or RANGE is specified. In this case, the start point cannot be value_expr PRECEDING. As an end point, CURRENT ROW specifies that the window ends at the current row or current value, depending on whether ROW or RANGE is specified. In this case, the start point cannot be value_expr FOLLOWING.

value_expr PRECEDING or value_expr FOLLOWING keyword

  • If value_expr FOLLOWING is the start point, the end point must be value_expr FOLLOWING.

  • If value_expr PRECEDING is the end point, the start point must be value_expr PRECEDING.

If you want to define a logical window with a time interval of a numeric format, you may need to use a conversion function.

If you specify ROWS:

  • value_expr is the physical offset. It must be a constant or expression and must be evaluated as a positive number.

  • If value_expr is a part of the start point, it takes the part between the start point and the end point as a row.

If you specify RANGE:

  • value_expr is the logical offset. It must be a constant or expression evaluated as a positive numeric value or an interval literal.

  • You can specify only on expression in order_by_clause.

  • If value_expr is a numeric value, ORDER BY expr must be of the numeric or DATE data type.

  • If value_expr is an interval value, ORDER BY expr must be of the DATE data type. If you omit windowing_clause, the default value is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

Contact Us