Purpose
This function is the inverse of the continuous distribution model. It returns an interpolated value at the specified percentile for the given ordering specification. Null values are ignored in the calculation.
Note
- When used as an analytic function, the
OVERclause must be used to define the window for the calculation. It computes the value for a group of rows and returns multiple values. - When used as an aggregate function, the function computes the value for a group of rows and returns a single value. In this case, the
OVERclause is not required.
Syntax
PERCENTILE_CONT(percentile) WITHIN GROUP (ORDER BY expr [ DESC | ASC ])
[ OVER (query_partition_clause) ]
Parameters
| Parameter | Description |
|---|---|
| percentile | The percentile value, which is a constant of the numeric data type. The value range is [0,1]. Note The MEDIAN function is a special case of this function with percentile set to 0.5. |
| expr | The expression for the ordering specification. The data type is numeric or datetime. Note expr must be a single expression involving column references. Multiple expressions are not allowed. |
| DESC | ASC | The sorting order, which is optional.
|
| OVER | Use the OVER clause to define the window for the calculation. For more information, see Analytic Function Description. |
Note
After sorting the rows based on the ordering specification, the function calculates the row number using the specified percentile (P) and the number of non-null rows (N) with the formula RN = (1+ (P*(N-1)). The final result is calculated by linear interpolation between the values in the rows with row numbers CRN = CEILING(RN) and FRN = FLOOR(RN). The final result will be as follows:
If (CRN = FRN = RN), the result is (value of expression from row at RN); otherwise, the result is:
(CRN - RN) * (value of expression for row at FRN) + (RN - FRN) * (value of expression for row at CRN).
Return type
The data type of the return value is the same as that of the expr parameter.
Examples
Assume that the tbl1 table has been created.
obclient> SELECT * FROM tbl1;
+------+------+------+------+
| COL1 | COL2 | COL3 | COL4 |
+------+------+------+------+
| 1 | A1 | 8 | 12 |
| 1 | A2 | 10 | 15 |
| 1 | A3 | 11 | 16 |
| 2 | B1 | 9 | 14 |
| 2 | B2 | 10 | 15 |
| 2 | B3 | 8 | 13 |
| 2 | B4 | 11 | 16 |
| 3 | C1 | 8 | 18 |
| 3 | C2 | 9 | 16 |
| 3 | C3 | 10 | 15 |
| 3 | C4 | 11 | 12 |
| 3 | C5 | 12 | 10 |
+------+------+------+------+
12 rows in set
Example of aggregate function
Calculate the 10th percentile value of the col4 column.
obclient> SELECT PERCENTILE_CONT(0.1) WITHIN GROUP (ORDER BY col4) FROM tbl1;
+----------------------------------------------+
| PERCENTILE_CONT(0.1)WITHINGROUP(ORDERBYCOL4) |
+----------------------------------------------+
| 12 |
+----------------------------------------------+
1 row in set
Example of analytic function
Group by the col1 column and calculate the 50th percentile value of the col4 column (i.e., the median of the col4 column).
obclient> SELECT col1,col4,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY col4) OVER (PARTITION BY col1)
FROM tbl1;
+------+------+-------------------------------------------------------------------+
| COL1 | COL4 | PERCENTILE_CONT(0.5)WITHINGROUP(ORDERBYCOL4)OVER(PARTITIONBYCOL1) |
+------+------+-------------------------------------------------------------------+
| 1 | 12 | 15 |
| 1 | 15 | 15 |
| 1 | 16 | 15 |
| 2 | 14 | 14.5 |
| 2 | 15 | 14.5 |
| 2 | 13 | 14.5 |
| 2 | 16 | 14.5 |
| 3 | 18 | 15 |
| 3 | 16 | 15 |
| 3 | 15 | 15 |
| 3 | 12 | 15 |
| 3 | 10 | 15 |
+------+------+-------------------------------------------------------------------+
12 rows in set