Purpose
These functions are linear regression functions. They fit an ordinary-least-squares regression line to a group of number pairs. You can use this function as an aggregate or analytic function.
Note
- When you use this function as an analytic function, use the
OVERclause to define a window over the data on which the function operates. The function operates on a group of rows to return a list of values. - When you use this function as an aggregate function, the function operates on a set of rows and returns a single value. You do not need to add the
OVERclause.
Syntax
{ REGR_SLOPE
| REGR_INTERCEPT
| REGR_COUNT
| REGR_R2
| REGR_AVGX
| REGR_AVGY
| REGR_SXX
| REGR_SYY
| REGR_SXY
}(expr1 , expr2)
[ OVER (analytic_clause) ]
Parameters
| Parameter | Description |
|---|---|
| REGR_SLOPE | Returns the slope of a line. The return value is a numeric data type and can be NULL. If (expr1, expr2) is a null pair, it is ignored, and the following calculation is performed: COVAR_POP(expr1, expr2) / VAR_POP(expr2) |
| REGR_INTERCEPT | Returns the intercept of the regression line y. The return value is a numeric data type and can be NULL. If (expr1, expr2) is a null pair, it is ignored, and the following calculation is performed: sql AVG(expr1) - REGR_SLOPE(expr1, expr2) * AVG(expr2) |
| REGR_COUNT | Returns the number of non-null number pairs that will be used to fill the regression line. The value is an integer. |
| REGR_R2 | Returns the deterministic coefficient of the regression, also known as the R-squared or goodness-of-fit value. The return value is a numeric data type and can be NULL. Evaluation is performed after nulls of VAR_POP(expr1) and VAR_POP(expr2) are ignored. The return value is: sql NULL if VAR_POP(expr2) = 0 1 if VAR_POP(expr1) = 0 and VAR_POP(expr2) != 0 POWER(CORR(expr1,expr),2) if VAR_POP(expr1) > 0 and VAR_POP(expr2 != 0 |
| REGR_AVGX | Returns the average of the independent variable (expr2) of the regression line. If (expr1, expr2) is a null pair, it is ignored, and the following calculation is performed: sql AVG(expr2) |
| REGR_AVGY | Returns the average of the dependent variable (expr1) of the regression line. If (expr1, expr2) is a null pair, it is ignored, and the following calculation is performed: sql AVG(expr1) |
| REGR_SXX | Returns the product of the number of value pairs, excluding the null pair (expr1, expr2), and the population variance of the independent variable. In other words, the return value is the result obtained by using the following formula: sql REGR_COUNT(expr1, expr2) * VAR_POP(expr2) |
| REGR_SYY | Returns the product of the number of value pairs, excluding the null pair (expr1, expr2), and the population variance of the dependent variable. In other words, the return value is the result obtained by using the following formula: sql REGR_COUNT(expr1, expr2) * VAR_POP(expr1) |
| REGR_SXY | Returns the product of the number of value pairs, excluding the null pair (expr1, expr2), and the population covariance. In other words, the return value is the result obtained by using the following formula: sql REGR_COUNT(expr1, expr2) * COVAR_POP(expr1, expr2) |
| expr1 | The value of the dependent variable (y). |
| expr2 | The value of the independent variable (x). |
| OVER | You can use the OVER clause to define a window over the data on which the function operates. For more information, see Analytic functions. |
Note
REGR_SXX, REGR_SYY, and REGR_SXY are auxiliary functions for computing various diagnostic statistics.
Return type
The return value is of a numeric data type and can be NULL.
Examples
The tbl1 table has been created.
obclient> SELECT * FROM tbl1;
+------+------+------+------+
| COL1 | COL2 | COL3 | COL4 |
+------+------+------+------+
| 1 | A1 | 8 | 12 |
| 1 | A2 | 10 | 15 |
| 1 | A3 | 11 | 16 |
| 2 | B1 | 9 | 14 |
| 2 | B2 | 10 | 15 |
| 2 | B3 | 8 | 13 |
| 2 | B4 | 11 | 16 |
| 3 | C1 | 8 | 18 |
| 3 | C2 | 9 | 16 |
| 3 | C3 | 10 | 15 |
| 3 | C4 | 11 | 12 |
| 3 | C5 | 12 | 10 |
| 4 | d1 | 8 | NULL |
| 4 | d1 | 9 | 10 |
| 4 | d1 | 10 | NULL |
+------+------+------+------+
15 rows in set
Example of an aggregate function
Compute the three types of diagnostic statistics based on the linear regression of the values on numeric pairs in the col3 and col4 columns.
obclient> SELECT col1,REGR_SXX(col3,col4) "REGR_SXX",
REGR_SYY(col3,col4) "REGR_SYY",
REGR_SXY(col3,col4) "REGR_SXY"
FROM tbl1
GROUP BY col1;
+------+----------------------------------------+------------------------------------------+----------------------------------------+
| COL1 | REGR_SXX | REGR_SYY | REGR_SXY |
+------+----------------------------------------+------------------------------------------+----------------------------------------+
| 1 | 8.666666666666666666666666666666666667 | 4.66666666666666666666666666666666666701 | 6.333333333333333333333333333333333333 |
| 2 | 5 | 5 | 5 |
| 3 | 40.8 | 10 | -20 |
| 4 | 0 | 0 | 0 |
+------+----------------------------------------+------------------------------------------+----------------------------------------+
4 rows in set
Example of an analytic function
Group data by the col1 column. Sort data by the col2 column in ascending order. Calculate the cumulative slope of the values for the col3 and col4 column.
obclient> SELECT col1,col2,col3,col4,
REGR_SLOPE(col3,col4) OVER(PARTITION BY col1 ORDER BY col2) "REGR_SLOPE"
FROM tbl1;
+------+------+------+------+--------------------------------------------+
| COL1 | COL2 | COL3 | COL4 | REGR_SLOPE |
+------+------+------+------+--------------------------------------------+
| 1 | A1 | 8 | 12 | NULL |
| 1 | A2 | 10 | 15 | .6666666666666666666666666666666666666667 |
| 1 | A3 | 11 | 16 | .7307692307692307692307692307692307691642 |
| 2 | B1 | 9 | 14 | NULL |
| 2 | B2 | 10 | 15 | 1 |
| 2 | B3 | 8 | 13 | 1 |
| 2 | B4 | 11 | 16 | 1 |
| 3 | C1 | 8 | 18 | NULL |
| 3 | C2 | 9 | 16 | -.5 |
| 3 | C3 | 10 | 15 | -.6428571428571428571428571428571428570956 |
| 3 | C4 | 11 | 12 | -.5066666666666666666666666666666666666667 |
| 3 | C5 | 12 | 10 | -.4901960784313725490196078431372549019608 |
| 4 | d1 | 8 | NULL | NULL |
| 4 | d1 | 9 | 10 | NULL |
| 4 | d1 | 10 | NULL | NULL |
+------+------+------+------+--------------------------------------------+
15 rows in set