REGR_ (Linear Regression) Functions

2023-10-27 09:57:43  Updated

Purpose

You can call these functions to fit an ordinary-least-squares regression line to a group of number pairs. These functions are linear regression functions. You can use them as aggregate or analytic functions.

Note

  • When you use an REGR_ (Linear Regression) function as an analytic function, use the OVER clause to define a window over the data on which the function operates. The function operates on a group of rows to return a list of values.
  • When you use an REGR_ (Linear Regression) function as an aggregate function, the function operates on a set of rows and returns a single value. You do not need to add the OVER clause.

Syntax

{ REGR_SLOPE 
 | REGR_INTERCEPT 
 | REGR_COUNT 
 | REGR_R2 
 | REGR_AVGX
 | REGR_AVGY 
 | REGR_SXX 
 | REGR_SYY 
 | REGR_SXY
}(expr1 , expr2)
  [ OVER (analytic_clause) ]

Parameters

Parameter Description
REGR_SLOPE Returns the slope of a line. The return value is a numeric data type and can be NULL. After ignoring the null pairs (expr1, expr2), the function performs the following calculations: sql COVAR_POP(expr1, expr2) / VAR_POP(expr2)
REGR_INTERCEPT Returns the intercept of the regression line y. The return value is a numeric data type and can be NULL. After ignoring the null pairs (expr1, expr2), the function performs the following calculations: sql AVG(expr1) - REGR_SLOPE(expr1, expr2) * AVG(expr2)
REGR_COUNT Returns the number of non-null number pairs that will be used to fill the regression line. The value is an integer.
REGR_R2 Returns the deterministic coefficient of the regression, also known as the R-squared or goodness-of-fit value. The return value is a numeric data type and can be NULL. The function calculates the number after ignoring nulls of VAR_POP(expr1) and VAR_POP(expr2). The return value is: sql NULL if VAR_POP(expr2) = 0 1 if VAR_POP(expr1) = 0 and VAR_POP(expr2) != 0 POWER(CORR(expr1,expr),2) if VAR_POP(expr1) > 0 and VAR_POP(expr2 != 0
REGR_AVGX Returns the average of the independent variable (expr2) of the regression line. After ignoring the null pairs (expr1, expr2), the function performs the following calculations: sql AVG(expr2)
REGR_AVGY Returns the average of the dependent variable (expr1) of the regression line. After ignoring the null pairs (expr1, expr2), the function performs the following calculations: sql AVG(expr1)
REGR_SXX The return value is the product of the number of value pairs (with null pairs of expr1 and expr2 excluded) and the population variance of the independent variable, that is, the result of the following formula: sql REGR_COUNT(expr1, expr2) * VAR_POP(expr2)
REGR_SYY The return value is the product of the number of value pairs (with null pairs of expr1 and expr2 excluded) and the population variance of the dependent variable, that is, the result of the following formula: sql REGR_COUNT(expr1, expr2) * VAR_POP(expr1)
REGR_SXY The return value is the product of the number of value pairs (with null pairs of expr1 and expr2 excluded) and the population covariance, that is, the result of the following formula: sql REGR_COUNT(expr1, expr2) * COVAR_POP(expr1, expr2)
expr1 Specifies the value of the dependent variable (y).
expr2 Specifies the value of the independent variable (x).
OVER You can use the OVER clause to define a window over the data on which the function operates. For more information about analytic functions, see Analytic functions.

Note

REGR_SXX, REGR_SYY, and REGR_SXY are auxiliary functions for computing various diagnostic statistics.

Return type

The return value is of a numeric data type and can be NULL.

Examples

The tbl1 table has been created.

obclient> SELECT * FROM tbl1;
+------+------+------+------+
| COL1 | COL2 | COL3 | COL4 |
+------+------+------+------+
|    1 | A1   |    8 |   12 |
|    1 | A2   |   10 |   15 |
|    1 | A3   |   11 |   16 |
|    2 | B1   |    9 |   14 |
|    2 | B2   |   10 |   15 |
|    2 | B3   |    8 |   13 |
|    2 | B4   |   11 |   16 |
|    3 | C1   |    8 |   18 |
|    3 | C2   |    9 |   16 |
|    3 | C3   |   10 |   15 |
|    3 | C4   |   11 |   12 |
|    3 | C5   |   12 |   10 |
|    4 | d1   |    8 | NULL |
|    4 | d1   |    9 |   10 |
|    4 | d1   |   10 | NULL |
+------+------+------+------+
15 rows in set

Example of an aggregate function

Compute the three types of diagnostic statistics based on the linear regression of the values on numeric pairs in the col3 and col4 columns.

obclient> SELECT col1,REGR_SXX(col3,col4) "REGR_SXX",
            REGR_SYY(col3,col4) "REGR_SYY",
            REGR_SXY(col3,col4) "REGR_SXY"
          FROM tbl1
          GROUP BY col1;
+------+----------------------------------------+------------------------------------------+----------------------------------------+
| COL1 | REGR_SXX                               | REGR_SYY                                 | REGR_SXY                               |
+------+----------------------------------------+------------------------------------------+----------------------------------------+
|    1 | 8.666666666666666666666666666666666667 | 4.66666666666666666666666666666666666701 | 6.333333333333333333333333333333333333 |
|    2 |                                      5 |                                        5 |                                      5 |
|    3 |                                   40.8 |                                       10 |                                    -20 |
|    4 |                                      0 |                                        0 |                                      0 |
+------+----------------------------------------+------------------------------------------+----------------------------------------+
4 rows in set

Example of an analytic function

Group data by the col1 column. Sort data by the col2 column in ascending order. Calculate the cumulative slope of the values for the col3 and col4 column.

obclient> SELECT col1,col2,col3,col4,
            REGR_SLOPE(col3,col4) OVER(PARTITION BY col1 ORDER BY col2) "REGR_SLOPE" 
          FROM tbl1;
+------+------+------+------+--------------------------------------------+
| COL1 | COL2 | COL3 | COL4 | REGR_SLOPE                                 |
+------+------+------+------+--------------------------------------------+
|    1 | A1   |    8 |   12 |                                       NULL |
|    1 | A2   |   10 |   15 |  .6666666666666666666666666666666666666667 |
|    1 | A3   |   11 |   16 |  .7307692307692307692307692307692307691642 |
|    2 | B1   |    9 |   14 |                                       NULL |
|    2 | B2   |   10 |   15 |                                          1 |
|    2 | B3   |    8 |   13 |                                          1 |
|    2 | B4   |   11 |   16 |                                          1 |
|    3 | C1   |    8 |   18 |                                       NULL |
|    3 | C2   |    9 |   16 |                                        -.5 |
|    3 | C3   |   10 |   15 | -.6428571428571428571428571428571428570956 |
|    3 | C4   |   11 |   12 | -.5066666666666666666666666666666666666667 |
|    3 | C5   |   12 |   10 | -.4901960784313725490196078431372549019608 |
|    4 | d1   |    8 | NULL |                                       NULL |
|    4 | d1   |    9 |   10 |                                       NULL |
|    4 | d1   |   10 | NULL |                                       NULL |
+------+------+------+------+--------------------------------------------+
15 rows in set

Contact Us