Purpose
This function is the inverse of the cumulative distribution function for a continuous distribution model. It returns an interpolated value at a specified percentile for a given sort order. Null values are ignored in the calculation.
Note
- When used as an analytic function, you must use the
OVERclause to define the window for calculation. It calculates over a set of rows and returns multiple values. - When used as an aggregate function, it aggregates over a set of rows and returns a single value. In this case, the
OVERclause is not required.
Syntax
PERCENTILE_CONT(percentile) WITHIN GROUP (ORDER BY expr [ DESC | ASC ])
[ OVER (query_partition_clause) ]
Parameters
| Parameter | Description |
|---|---|
| percentile | The percentile value, a constant of numeric data type, in the range [0,1]. Note The MEDIAN function is a special case where percentile is 0.5. |
| expr | The expression for the sort order, of numeric or datetime data type. Note expr must be a single expression involving column references. Multiple expressions are not allowed. |
| DESC | ASC | The sort order, optional.
|
| OVER | Use the OVER clause to define the window for calculation. For more information, see Analytic Function Description. |
Note
After sorting the rows based on the sort order, the function calculates the row number RN = (1+ (P*(N-1)) using the specified percentile (P) and the number of non-null rows (N). The final result is calculated by linear interpolation between the values in the rows at CRN = CEILING(RN) and FRN = FLOOR(RN). The final result will be as follows:
If (CRN = FRN = RN), the result is (value of expression from row at RN); otherwise, the result is:
(CRN - RN) * (value of expression for row at FRN) + (RN - FRN) * (value of expression for row at CRN).
Return type
The same data type as the expr parameter.
Examples
Assume that the tbl1 table has been created.
obclient> SELECT * FROM tbl1;
+------+------+------+------+
| COL1 | COL2 | COL3 | COL4 |
+------+------+------+------+
| 1 | A1 | 8 | 12 |
| 1 | A2 | 10 | 15 |
| 1 | A3 | 11 | 16 |
| 2 | B1 | 9 | 14 |
| 2 | B2 | 10 | 15 |
| 2 | B3 | 8 | 13 |
| 2 | B4 | 11 | 16 |
| 3 | C1 | 8 | 18 |
| 3 | C2 | 9 | 16 |
| 3 | C3 | 10 | 15 |
| 3 | C4 | 11 | 12 |
| 3 | C5 | 12 | 10 |
+------+------+------+------+
12 rows in set
Aggregate function example
Calculate the 10th percentile value of column col4.
obclient> SELECT PERCENTILE_CONT(0.1) WITHIN GROUP (ORDER BY col4) FROM tbl1;
+----------------------------------------------+
| PERCENTILE_CONT(0.1)WITHINGROUP(ORDERBYCOL4) |
+----------------------------------------------+
| 12 |
+----------------------------------------------+
1 row in set
Analytic function example
Group by column col1 and calculate the 50th percentile value of column col4 (i.e., the median of column col4).
obclient> SELECT col1,col4,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY col4) OVER (PARTITION BY col1)
FROM tbl1;
+------+------+-------------------------------------------------------------------+
| COL1 | COL4 | PERCENTILE_CONT(0.5)WITHINGROUP(ORDERBYCOL4)OVER(PARTITIONBYCOL1) |
+------+------+-------------------------------------------------------------------+
| 1 | 12 | 15 |
| 1 | 15 | 15 |
| 1 | 16 | 15 |
| 2 | 14 | 14.5 |
| 2 | 15 | 14.5 |
| 2 | 13 | 14.5 |
| 2 | 16 | 14.5 |
| 3 | 18 | 15 |
| 3 | 16 | 15 |
| 3 | 15 | 15 |
| 3 | 12 | 15 |
| 3 | 10 | 15 |
+------+------+-------------------------------------------------------------------+
12 rows in set