Purpose
You can call this function to calculate the correlation coefficient of a set of numerical values. The correlation coefficient measures the strength of correlation between value pairs. The range is [-1,1]. The value 0 indicates the value pairs are unrelated. A value less than 0 indicates negative correlation. A value greater than 0 indicates positive correlation. You can use it as an aggregate or analytic function.
Note
- When you use it as an analytic function, use the
OVERclause to define a window over the data on which the function operates. The function operates on a group of rows to return a list of values.- When you use it as an aggregate function, the function operates on a set of rows and returns a single value. You do not need to add the
OVERclause.
Syntax
CORR(expr1, expr2) [ OVER (analytic_clause) ]
Parameters
| Parameter | Description |
|---|---|
| expr1 | Specifies the first parameter, which is of the numeric data type or can be implicitly converted to the numeric data type. |
| expr2 | Specifies the second parameter, which is of the numeric data type or can be implicitly converted to the numeric data type. |
| OVER | You can use the OVER clause to define a window over the data on which the function operates. For more information about analytic functions, see Analytic functions. |
Note
- The database determines the parameter with the highest numeric precedence, implicitly converts the remaining parameter values to this data type, and returns a result of this data type.
- The locations of the
expr1andexpr2parameters have no influence over the return result.- The return result is calculated by using the following formula:
COVAR_POP(expr1, expr2) / (STDDEV_POP(expr1) * STDDEV_POP(expr2)).
Return types
If any parameter is null or you have only one row of data, NULL is returned. Otherwise, a value of the NUMBER type is returned.
Examples
Execute the following statement to create table tbl1 and insert data into it:
obclient> CREATE TABLE tbl1(col1 INT,col2 varchar(10),col3 INT,col4 INT);
Query OK, 0 rows affected
obclient> INSERT INTO tbl1 VALUES(1,'A1',8,12),(1,'A2',10,15),(1,'A3',11,16),
(2,'B1',9,14),(2,'B2',10,15),(2,'B3',8,13),(2,'B4',11,16),(3,'C1',8,18),
(3,'C2',9,16),(3,'C3',10,15),(3,'C4',11,12),(3,'C5',12,10);
Query OK, 12 rows affected
Records: 12 Duplicates: 0 Warnings: 0
obclient> SELECT * FROM tbl1;
+------+------+------+------+
COL1 COL2 COL3 COL4
+------+------+------+------+
1 A1 8 12
1 A2 10 15
1 A3 11 16
2 B1 9 14
2 B2 10 15
2 B3 8 13
2 B4 11 16
3 C1 8 18
3 C2 9 16
3 C3 10 15
3 C4 11 12
3 C5 12 10
+------+------+------+------+
12 rows in set
Aggregate example
Calculate the correlation coefficient between the col3 column and the col4 column.
obclient> SELECT CORR(col3,col4) FROM tbl1;
+--------------------------------------------+
CORR(COL3,COL4)
+--------------------------------------------+
-.2705008904002296868793073195758520224002
+--------------------------------------------+
1 row in set
Analytic example
Group data by the col1 column. Calculate the correlation coefficient between the col3 column and the col4 column.
obclient> SELECT col1,col3,col4,CORR(col3,col4) OVER(PARTITION BY col1) "corr" FROM tbl1;
+------+------+------+--------------------------------------------+
COL1 COL3 COL4 corr
+------+------+------+--------------------------------------------+
1 8 12 .9958705948858223809835060513429288056548
1 10 15 .9958705948858223809835060513429288056548
1 11 16 .9958705948858223809835060513429288056548
2 9 14 1
2 10 15 1
2 8 13 1
2 11 16 1
3 8 18 -.9901475429766743091532731291244706579003
3 9 16 -.9901475429766743091532731291244706579003
3 10 15 -.9901475429766743091532731291244706579003
3 11 12 -.9901475429766743091532731291244706579003
3 12 10 -.9901475429766743091532731291244706579003
+------+------+------+--------------------------------------------+
12 rows in set