Purpose
PERCENTILE_CONT() is an inverse distribution function that assumes a continuous distribution model. It returns an interpolated value that falls into the specified percentile value with respect to the sort specification. Nulls are ignored in the calculation.
Note
- When you use this function as an analytic function, use the
OVERclause to define a window over the data on which the function operates. The function operates on a group of rows to return a list of values. - When you use this function as an aggregate function, the function operates on a set of rows and returns a single value. You do not need to add the
OVERclause.
Syntax
PERCENTILE_CONT(percentile) WITHIN GROUP (ORDER BY expr [ DESC | ASC ])
[ OVER (query_partition_clause) ]
Parameters
| Parameter | Description |
|---|---|
| percentile | The percentile value. It is a constant of the numeric data type in the range of [0, 1]. Note The MEDIAN function is a special case where percentile equals 0.5. |
| expr | The sorting specification. The data type is numeric or datetime. Notice expr must be a single expression involving column references. Multiple expressions are not allowed. |
| DESC | ASC | Optional. The sorting method of the list.
|
| OVER | You can use the OVER clause to define a window over the data on which the function operates. For more information, see Analytic functions. |
Note
This function sorts the data according to the sorting specification, and then calculates the row number based on the given percentile (P) and non-empty rows (N). The formula for calculating the row number is RN = (1+ (P*(N-1)). The final result of this function is calculated upon linear interpolation on row values between the row number CRN = CEILING(RN) and the row number FRN = FLOOR(RN). The final result is determined based on the following rules:
If (CRN = FRN = RN), the result is (value of expression from row at RN). Otherwise, the result is:
(CRN - RN) * (value of expression for row at FRN) + (RN - FRN) * (value of expression for row at CRN)
Return type
The return type is the same as the data type of the expr parameter.
Examples
Assume that you have created the table tbl1.
obclient> SELECT * FROM tbl1;
+------+------+------+------+
| COL1 | COL2 | COL3 | COL4 |
+------+------+------+------+
| 1 | A1 | 8 | 12 |
| 1 | A2 | 10 | 15 |
| 1 | A3 | 11 | 16 |
| 2 | B1 | 9 | 14 |
| 2 | B2 | 10 | 15 |
| 2 | B3 | 8 | 13 |
| 2 | B4 | 11 | 16 |
| 3 | C1 | 8 | 18 |
| 3 | C2 | 9 | 16 |
| 3 | C3 | 10 | 15 |
| 3 | C4 | 11 | 12 |
| 3 | C5 | 12 | 10 |
+------+------+------+------+
12 rows in set
Example of an aggregate function
Calculate the value at the 10th percentile in the col4 column.
obclient> SELECT PERCENTILE_CONT(0.1) WITHIN GROUP (ORDER BY col4) FROM tbl1;
+----------------------------------------------+
| PERCENTILE_CONT(0.1)WITHINGROUP(ORDERBYCOL4) |
+----------------------------------------------+
| 12 |
+----------------------------------------------+
1 row in set
Example of an analytic function
Group data by the col1 column. Calculate the value at the 50th percentile in the col4 column, namely the median of the col4 column.
obclient> SELECT col1,col4,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY col4) OVER (PARTITION BY col1)
FROM tbl1;
+------+------+-------------------------------------------------------------------+
| COL1 | COL4 | PERCENTILE_CONT(0.5)WITHINGROUP(ORDERBYCOL4)OVER(PARTITIONBYCOL1) |
+------+------+-------------------------------------------------------------------+
| 1 | 12 | 15 |
| 1 | 15 | 15 |
| 1 | 16 | 15 |
| 2 | 14 | 14.5 |
| 2 | 15 | 14.5 |
| 2 | 13 | 14.5 |
| 2 | 16 | 14.5 |
| 3 | 18 | 15 |
| 3 | 16 | 15 |
| 3 | 15 | 15 |
| 3 | 12 | 15 |
| 3 | 10 | 15 |
+------+------+-------------------------------------------------------------------+
12 rows in set