Purpose
You can call these functions to fit an ordinary-least-squares regression line to a group of number pairs. These functions are linear regression functions. You can use them as aggregate or analytic functions.
Note
- When you use an REGR_ (Linear Regression) function as an analytic function, use the
OVERclause to define a window over the data on which the function operates. The function operates on a group of rows to return a list of values.- When you use an REGR_ (Linear Regression) function as an aggregate function, the function operates on a set of rows and returns a single value. You do not need to add the
OVERclause.
Syntax
{ REGR_SLOPE
| REGR_INTERCEPT
| REGR_COUNT
| REGR_R2
| REGR_AVGX
| REGR_AVGY
| REGR_SXX
| REGR_SYY
| REGR_SXY
}(expr1 , expr2)
[ OVER (analytic_clause) ]
Parameters
| Parameter | Description |
|---|---|
| REGR_SLOPE | Returns the slope of a line. The return value is a numeric data type and can be NULL. After ignoring the null pairs (expr1, expr2), the function performs the following calculations: sql COVAR_POP(expr1, expr2) / VAR_POP(expr2) |
| REGR_INTERCEPT | Returns the intercept of the regression line y. The return value is a numeric data type and can be NULL. After ignoring the null pairs (expr1, expr2), the function performs the following calculations: sql AVG(expr1) - REGR_SLOPE(expr1, expr2) * AVG(expr2) |
| REGR_COUNT | Returns the number of non-null number pairs that will be used to fill the regression line. The value is an integer. |
| REGR_R2 | Returns the deterministic coefficient of the regression, also known as the R-squared or goodness-of-fit value. The return value is a numeric data type and can be NULL. The function calculates the number after ignoring nulls of VAR_POP(expr1) and VAR_POP(expr2). The return value is: sql NULL if VAR_POP(expr2) = 0 1 if VAR_POP(expr1) = 0 and VAR_POP(expr2) != 0 POWER(CORR(expr1,expr),2) if VAR_POP(expr1) > 0 and VAR_POP(expr2 != 0 |
| REGR_AVGX | Returns the average of the independent variable (expr2) of the regression line. After ignoring the null pairs (expr1, expr2), the function performs the following calculations: sql AVG(expr2) |
| REGR_AVGY | Returns the average of the dependent variable (expr1) of the regression line. After ignoring the null pairs (expr1, expr2), the function performs the following calculations: sql AVG(expr1) |
| REGR_SXX | The return value is the product of the number of value pairs (with null pairs of expr1 and expr2 excluded) and the population variance of the independent variable, that is, the result of the following formula: sql REGR_COUNT(expr1, expr2) * VAR_POP(expr2) |
| REGR_SYY | The return value is the product of the number of value pairs (with null pairs of expr1 and expr2 excluded) and the population variance of the dependent variable, that is, the result of the following formula: sql REGR_COUNT(expr1, expr2) * VAR_POP(expr1) |
| REGR_SXY | The return value is the product of the number of value pairs (with null pairs of expr1 and expr2 excluded) and the population covariance, that is, the result of the following formula: sql REGR_COUNT(expr1, expr2) * COVAR_POP(expr1, expr2) |
| expr1 | Specifies the value of the dependent variable (y). |
| expr2 | Specifies the value of the independent variable (x). |
| OVER | You can use the OVER clause to define a window over the data on which the function operates. For more information about analytic functions, see Analytic functions. |
Note
REGR_SXX,REGR_SYY, andREGR_SXYare auxiliary functions for computing various diagnostic statistics.
Return type
The return value is of a numeric data type and can be NULL.
Examples
The tbl1 table has been created.
obclient> SELECT * FROM tbl1;
+------+------+------+------+
| COL1 | COL2 | COL3 | COL4 |
+------+------+------+------+
| 1 | A1 | 8 | 12 |
| 1 | A2 | 10 | 15 |
| 1 | A3 | 11 | 16 |
| 2 | B1 | 9 | 14 |
| 2 | B2 | 10 | 15 |
| 2 | B3 | 8 | 13 |
| 2 | B4 | 11 | 16 |
| 3 | C1 | 8 | 18 |
| 3 | C2 | 9 | 16 |
| 3 | C3 | 10 | 15 |
| 3 | C4 | 11 | 12 |
| 3 | C5 | 12 | 10 |
| 4 | d1 | 8 | NULL |
| 4 | d1 | 9 | 10 |
| 4 | d1 | 10 | NULL |
+------+------+------+------+
15 rows in set
Example of an aggregate function
Compute the three types of diagnostic statistics based on the linear regression of the values on numeric pairs in the col3 and col4 columns.
obclient> SELECT col1,REGR_SXX(col3,col4) "REGR_SXX",
REGR_SYY(col3,col4) "REGR_SYY",
REGR_SXY(col3,col4) "REGR_SXY"
FROM tbl1
GROUP BY col1;
+------+----------------------------------------+------------------------------------------+----------------------------------------+
| COL1 | REGR_SXX | REGR_SYY | REGR_SXY |
+------+----------------------------------------+------------------------------------------+----------------------------------------+
| 1 | 8.666666666666666666666666666666666667 | 4.66666666666666666666666666666666666701 | 6.333333333333333333333333333333333333 |
| 2 | 5 | 5 | 5 |
| 3 | 40.8 | 10 | -20 |
| 4 | 0 | 0 | 0 |
+------+----------------------------------------+------------------------------------------+----------------------------------------+
4 rows in set
Example of an analytic function
Group data by the col1 column. Sort data by the col2 column in ascending order. Calculate the cumulative slope of the values for the col3 and col4 column.
obclient> SELECT col1,col2,col3,col4,
REGR_SLOPE(col3,col4) OVER(PARTITION BY col1 ORDER BY col2) "REGR_SLOPE"
FROM tbl1;
+------+------+------+------+--------------------------------------------+
| COL1 | COL2 | COL3 | COL4 | REGR_SLOPE |
+------+------+------+------+--------------------------------------------+
| 1 | A1 | 8 | 12 | NULL |
| 1 | A2 | 10 | 15 | .6666666666666666666666666666666666666667 |
| 1 | A3 | 11 | 16 | .7307692307692307692307692307692307691642 |
| 2 | B1 | 9 | 14 | NULL |
| 2 | B2 | 10 | 15 | 1 |
| 2 | B3 | 8 | 13 | 1 |
| 2 | B4 | 11 | 16 | 1 |
| 3 | C1 | 8 | 18 | NULL |
| 3 | C2 | 9 | 16 | -.5 |
| 3 | C3 | 10 | 15 | -.6428571428571428571428571428571428570956 |
| 3 | C4 | 11 | 12 | -.5066666666666666666666666666666666666667 |
| 3 | C5 | 12 | 10 | -.4901960784313725490196078431372549019608 |
| 4 | d1 | 8 | NULL | NULL |
| 4 | d1 | 9 | 10 | NULL |
| 4 | d1 | 10 | NULL | NULL |
+------+------+------+------+--------------------------------------------+
15 rows in set