Purpose
This function is the inverse of the cumulative distribution function for a continuous distribution model. It returns an interpolated value at the specified percentile for the given ordering specification. Null values are ignored in the calculation.
Note
- When used as an analytic function, the
OVERclause is required to define the window for computation. It computes the result for a set of rows and returns multiple values. - When used as an aggregate function, the function computes the result for a set of rows and returns a single value. In this case, the
OVERclause is not required.
Syntax
PERCENTILE_CONT(percentile) WITHIN GROUP (ORDER BY expr [ DESC | ASC ])
[ OVER (query_partition_clause) ]
Parameters
| Parameter | Description |
|---|---|
| percentile | A constant of numeric data type specifying the percentile value. The value ranges from [0,1]. Note The MEDIAN function is equivalent to setting percentile to 0.5. |
| expr | An expression specifying the ordering specification. The data type is numeric or datetime. Note expr must be a single expression involving column references. Multiple expressions are not allowed. |
| DESC | ASC | Specifies the sorting order. This is an optional parameter.
|
| OVER | Defines the window for computation using the OVER clause. For more information, see Analytic Function Description. |
Note
After sorting the rows based on the ordering specification, the function calculates the row number RN = (1+ (P*(N-1)) using the specified percentile value (P) and the number of non-null rows (N). The final result is computed through linear interpolation between the values in the rows with row numbers CRN = CEILING(RN) and FRN = FLOOR(RN). The final result will be as follows:
If (CRN = FRN = RN), the result is (value of expression from row at RN). Otherwise, the result is:
(CRN - RN) * (value of expression for row at FRN) + (RN - FRN) * (value of expression for row at CRN).
Return type
The same data type as the expr parameter.
Examples
Assume that the tbl1 table has been created.
obclient> SELECT * FROM tbl1;
+------+------+------+------+
| COL1 | COL2 | COL3 | COL4 |
+------+------+------+------+
| 1 | A1 | 8 | 12 |
| 1 | A2 | 10 | 15 |
| 1 | A3 | 11 | 16 |
| 2 | B1 | 9 | 14 |
| 2 | B2 | 10 | 15 |
| 2 | B3 | 8 | 13 |
| 2 | B4 | 11 | 16 |
| 3 | C1 | 8 | 18 |
| 3 | C2 | 9 | 16 |
| 3 | C3 | 10 | 15 |
| 3 | C4 | 11 | 12 |
| 3 | C5 | 12 | 10 |
+------+------+------+------+
12 rows in set
Aggregate function example
Calculate the 10th percentile value of the col4 column.
obclient> SELECT PERCENTILE_CONT(0.1) WITHIN GROUP (ORDER BY col4) FROM tbl1;
+----------------------------------------------+
| PERCENTILE_CONT(0.1)WITHINGROUP(ORDERBYCOL4) |
+----------------------------------------------+
| 12 |
+----------------------------------------------+
1 row in set
Analytic function example
Group by the col1 column and calculate the 50th percentile value of the col4 column (i.e., the median of the col4 column).
obclient> SELECT col1,col4,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY col4) OVER (PARTITION BY col1)
FROM tbl1;
+------+------+-------------------------------------------------------------------+
| COL1 | COL4 | PERCENTILE_CONT(0.5)WITHINGROUP(ORDERBYCOL4)OVER(PARTITIONBYCOL1) |
+------+------+-------------------------------------------------------------------+
| 1 | 12 | 15 |
| 1 | 15 | 15 |
| 1 | 16 | 15 |
| 2 | 14 | 14.5 |
| 2 | 15 | 14.5 |
| 2 | 13 | 14.5 |
| 2 | 16 | 14.5 |
| 3 | 18 | 15 |
| 3 | 16 | 15 |
| 3 | 15 | 15 |
| 3 | 12 | 15 |
| 3 | 10 | 15 |
+------+------+-------------------------------------------------------------------+
12 rows in set