Purpose
This function is the inverse of the cumulative distribution function for a continuous distribution model. It returns an interpolated value at the specified percentile for the given ordering. Null values are ignored in the computation.
Note
- When used as an analytic function, you must use the
OVERclause to define the window for computation. It computes over a set of rows and returns multiple values. - When used as an aggregate function, it computes over a set of rows and returns a single value. In this case, the
OVERclause is not required.
Syntax
PERCENTILE_CONT(percentile) WITHIN GROUP (ORDER BY expr [ DESC | ASC ])
[ OVER (query_partition_clause) ]
Parameters
| Parameter | Description |
|---|---|
| percentile | A constant of numeric data type specifying the percentile value. The value must be in the range [0,1]. Note The MEDIAN function is equivalent to setting percentile to 0.5. |
| expr | An expression specifying the ordering criteria. The data type must be numeric or datetime. Note expr must be a single expression involving column references. Multiple expressions are not allowed. |
| DESC | ASC | Optional. Specifies the sort order.
|
| OVER | Use the OVER clause to define the window for computation. For more information, see Analytic Function Description. |
Note
After sorting the rows based on the ordering criteria, the function calculates the row number using the specified percentile (P) and the count of non-null rows (N) with the formula RN = (1+ (P*(N-1)). The final result is computed by linear interpolation between the values at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN). The final result will be as follows:
If (CRN = FRN = RN), the result is (value of expression from row at RN); otherwise, the result is:
(CRN - RN) * (value of expression for row at FRN) + (RN - FRN) * (value of expression for row at CRN).
Return type
The same data type as the expr parameter.
Examples
Assume that the tbl1 table has been created.
obclient> SELECT * FROM tbl1;
+------+------+------+------+
| COL1 | COL2 | COL3 | COL4 |
+------+------+------+------+
| 1 | A1 | 8 | 12 |
| 1 | A2 | 10 | 15 |
| 1 | A3 | 11 | 16 |
| 2 | B1 | 9 | 14 |
| 2 | B2 | 10 | 15 |
| 2 | B3 | 8 | 13 |
| 2 | B4 | 11 | 16 |
| 3 | C1 | 8 | 18 |
| 3 | C2 | 9 | 16 |
| 3 | C3 | 10 | 15 |
| 3 | C4 | 11 | 12 |
| 3 | C5 | 12 | 10 |
+------+------+------+------+
12 rows in set
Aggregate function example
Calculate the 10th percentile value of the col4 column.
obclient> SELECT PERCENTILE_CONT(0.1) WITHIN GROUP (ORDER BY col4) FROM tbl1;
+----------------------------------------------+
| PERCENTILE_CONT(0.1)WITHINGROUP(ORDERBYCOL4) |
+----------------------------------------------+
| 12 |
+----------------------------------------------+
1 row in set
Analytic function example
Group by the col1 column and calculate the 50th percentile value of the col4 column (which is the median of the col4 column).
obclient> SELECT col1,col4,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY col4) OVER (PARTITION BY col1)
FROM tbl1;
+------+------+-------------------------------------------------------------------+
| COL1 | COL4 | PERCENTILE_CONT(0.5)WITHINGROUP(ORDERBYCOL4)OVER(PARTITIONBYCOL1) |
+------+------+-------------------------------------------------------------------+
| 1 | 12 | 15 |
| 1 | 15 | 15 |
| 1 | 16 | 15 |
| 2 | 14 | 14.5 |
| 2 | 15 | 14.5 |
| 2 | 13 | 14.5 |
| 2 | 16 | 14.5 |
| 3 | 18 | 15 |
| 3 | 16 | 15 |
| 3 | 15 | 15 |
| 3 | 12 | 15 |
| 3 | 10 | 15 |
+------+------+-------------------------------------------------------------------+
12 rows in set