Purpose
This function calculates the correlation coefficient for a set of numerical pairs. The correlation coefficient represents the strength of the association between numerical pairs, ranging from [-1,1]. A value of 0 indicates no correlation, a value less than 0 indicates a negative correlation, and a value greater than 0 indicates a positive correlation. This function can be used as an aggregate or analytic function.
Note
- When used as an analytic function, the
OVERclause is required to define the window for calculation. It computes the correlation coefficient for a set of rows and returns multiple values. - When used as an aggregate function, the function aggregates the correlation coefficient for a set of rows and returns a single value. In this case, the
OVERclause is not required.
Syntax
CORR(expr1, expr2) [ OVER (analytic_clause) ]
Parameters
| Parameter | Description |
|---|---|
| expr1 | Specifies the first parameter, which is a numeric data type or can be implicitly converted to a numeric data type. |
| expr2 | Specifies the second parameter, which is a numeric data type or can be implicitly converted to a numeric data type. |
| OVER | Specifies the OVER clause to define the window for calculation. For more information, see Analytic Function Description. |
Note
- The database determines the parameter with the highest numeric precedence, implicitly converts the other parameters to this data type, and returns the result in this data type.
- The positions of parameters
expr1andexpr2do not affect the result. - The calculation formula for the result is:
COVAR_POP(expr1, expr2) / (STDDEV_POP(expr1) * STDDEV_POP(expr2)).
Return type
If any parameter is NULL or only one row of data exists, NULL is returned. Otherwise, a NUMBER value is returned.
Examples
Create a table named tbl1 and insert data.
obclient> CREATE TABLE tbl1(col1 INT,col2 varchar(10),col3 INT,col4 INT);
Query OK, 0 rows affected
obclient> INSERT INTO tbl1 VALUES(1,'A1',8,12),(1,'A2',10,15),(1,'A3',11,16),
(2,'B1',9,14),(2,'B2',10,15),(2,'B3',8,13),(2,'B4',11,16),(3,'C1',8,18),
(3,'C2',9,16),(3,'C3',10,15),(3,'C4',11,12),(3,'C5',12,10);
Query OK, 12 rows affected
Records: 12 Duplicates: 0 Warnings: 0
obclient> SELECT * FROM tbl1;
+------+------+------+------+
| COL1 | COL2 | COL3 | COL4 |
+------+------+------+------+
| 1 | A1 | 8 | 12 |
| 1 | A2 | 10 | 15 |
| 1 | A3 | 11 | 16 |
| 2 | B1 | 9 | 14 |
| 2 | B2 | 10 | 15 |
| 2 | B3 | 8 | 13 |
| 2 | B4 | 11 | 16 |
| 3 | C1 | 8 | 18 |
| 3 | C2 | 9 | 16 |
| 3 | C3 | 10 | 15 |
| 3 | C4 | 11 | 12 |
| 3 | C5 | 12 | 10 |
+------+------+------+------+
12 rows in set
Aggregate function example
Calculate the correlation coefficient between the data in column col3 and the data in column col4.
obclient> SELECT CORR(col3,col4) FROM tbl1;
+--------------------------------------------+
| CORR(COL3,COL4) |
+--------------------------------------------+
| -.2705008904002296868793073195758520224002 |
+--------------------------------------------+
1 row in set
Analytic function example
Group by column col1 and calculate the correlation coefficient between the data in column col3 and the data in column col4.
obclient> SELECT col1,col3,col4,CORR(col3,col4) OVER(PARTITION BY col1) "corr" FROM tbl1;
+------+------+------+--------------------------------------------+
| COL1 | COL3 | COL4 | corr |
+------+------+------+--------------------------------------------+
| 1 | 8 | 12 | .9958705948858223809835060513429288056548 |
| 1 | 10 | 15 | .9958705948858223809835060513429288056548 |
| 1 | 11 | 16 | .9958705948858223809835060513429288056548 |
| 2 | 9 | 14 | 1 |
| 2 | 10 | 15 | 1 |
| 2 | 8 | 13 | 1 |
| 2 | 11 | 16 | 1 |
| 3 | 8 | 18 | -.9901475429766743091532731291244706579003 |
| 3 | 9 | 16 | -.9901475429766743091532731291244706579003 |
| 3 | 10 | 15 | -.9901475429766743091532731291244706579003 |
| 3 | 11 | 12 | -.9901475429766743091532731291244706579003 |
| 3 | 12 | 10 | -.9901475429766743091532731291244706579003 |
+------+------+------+--------------------------------------------+
12 rows in set
