Java UDF
A Java UDF is a user-defined function implemented in the Java language. By supporting Java UDFs, you can integrate a large number of Java ecosystem products with OceanBase Database, thereby improving the efficiency of UDF development.
Create a Java UDF
You can compile and package the following code as my_add.jar, and then create a Java UDF in an OBServer node. Here is an example:
package org.example;
public class MyAdd {
public Integer evaluate(Integer a, Integer b) {
if (a == null || b == null) {
return null;
}
return a + b;
}
}
Create a Java UDF
To meet the flexibility of Java UDFs, you can create a Java UDF in two ways: by using a URL or an external resource.
Create a Java UDF by using a URL
Execute the following SQL statement to create a Java UDF named my_add.
CREATE FUNCTION my_add(x int, y int)
RETURNS int
PROPERTIES (
symbol = 'org.example.MyAdd',
type = 'odpsjar',
file = '<URL to Jar>'
);
where:
symbol: specifies the entry class name.type: specifies the type of the external UDF. Valid values:odpsjar,UDAFJar,UDTFJar, andPython.file: specifies the URL of the JAR package.
After the Java UDF is created, you can use it like a PL/SQL UDF. Here is an example:
obclient> CREATE FUNCTION my_add(x int, y int)
-> RETURNS int
-> PROPERTIES (
-> symbol = 'org.example.MyAdd',
-> type = 'odpsjar',
-> file = 'http://******/my_add.jar'
-> );
Query OK, 0 rows affected (0.113 sec)
obclient> SELECT my_add(1, 2);
+--------------+
| my_add(1, 2) |
+--------------+
| 3 |
+--------------+
1 row in set (0.021 sec)
Create a Java UDF by using an external resource
OBServer manages the dependency resources of external stored procedures, such as the JAR packages of Java UDFs, by using external resources. You must upload the JAR package to OBServer by using the DBMS_JAVA.LOADJAVA system package function, and then reference the external resource corresponding to the JAR package in the Java UDF.
The upload of external resources and the creation of UDFs do not have a strict order. As long as the resources are uploaded before the UDF is called, the UDF can be called. In addition, one external resource can be used by multiple UDFs.
The procedure is as follows:
Upload the JAR package
Call the
DBMS_JAVA.LOADJAVAfunction to upload the JAR package as an external resource to OBServer.String url = "<URL to Jar>"; InputStream is = new URL(url).openStream(); // conn is a connection to OceanBase PreparedStatement ps = conn.prepareStatement("call dbms_java.loadjava(? ,? ,? )"); ps.setString(1, "my_add_jar"); ps.setBlob(2, is); ps.setString(3, "my add UDF jar"); ps.execute();Create a Java UDF
Execute the following SQL statement to create a Java UDF named
my_add.CREATE FUNCTION my_add(x int, y int) RETURNS int PROPERTIES ( symbol = 'org.example.MyAdd', type = 'odpsjar', file = 'my_add_jar' );where:
symbol: specifies the entry class name.type: specifies the type of the external UDF. Valid values:odpsjar,UDAFJar,UDTFJar, andPython.file: specifies the name of the external resource corresponding to the JAR package.
After the Java UDF is created, you can use it like a PL/SQL UDF. Here is an example:
obclient> CREATE FUNCTION my_add(x int, y int)
-> RETURNS int
-> PROPERTIES (
-> symbol = 'org.example.MyAdd',
-> type = 'odpsjar',
-> file = 'my_add_jar'
-> );
Query OK, 0 rows affected (0.118 sec)
obclient> SELECT my_add(1, 2);
+--------------+
| my_add(1, 2) |
+--------------+
| 3 |
+--------------+
1 row in set (0.005 sec)
Java UDTF
A Java UDTF is a user-defined table function implemented in the Java language. The biggest difference between a UDTF and a general UDF is that a general UDF returns a scalar value, that is, one row of data, whereas a UDTF can return multiple rows of data.
Create a Java UDTF
To meet the flexibility of Java UDTFs, you can create a Java UDTF in two ways: by using a URL or an external resource.
Create a Java UDTF by using a URL
You can compile and package the following code as my_split.jar, and then create a Java UDTF in an OBServer node. Here is an example:
package my.test;
public class MySplit {
public String[] process(String str) {
if (str == null) return null;
return str.split(" ");
}
}
Create a Java UDTF
CREATE FUNCTION my_split(x longtext)
RETURNS varchar(1024)
PROPERTIES (
symbol = 'my.test.MySplit',
type = 'UDTFJar',
file = 'http://******/my_split.jar'
);
When you create an external function, you can specify type as UDTFJar to create a Java UDTF. After the Java UDTF is created, you can use it with the table function clause in an SQL statement. You can use it in a scalar query or join it with other tables by using the join clause.
Call the Java UDTF
Here is an example:
Create a table named
t1and insert some data into the table.obclient> CREATE TABLE t1(a int, b decimal(10, 2), c1 varchar(32));obclient> INSERT INTO t1 VALUES (1, 2.1, 'hello oceanbase'), (2, 2.2, 'hello UDTF') ;Execute the following command to call the Java UDTF.
obclient> SELECT t1.a,t1.b, COLUMN_VALUE FROM t1, table(my_split(t1.c1)); +------+------+--------------+ | a | b | COLUMN_VALUE | +------+------+--------------+ | 1 | 2.10 | hello | | 1 | 2.10 | oceanbase | | 2 | 2.20 | hello | | 2 | 2.20 | UDTF | +------+------+--------------+
Create a Java UDTF by using an external resource
OBServer manages the dependency resources of external stored procedures, such as the JAR packages of Java UDTFs, by using external resources. You must upload the JAR package to OBServer by using the DBMS_JAVA.LOADJAVA system package function, and then reference the external resource corresponding to the JAR package in the Java UDTF.
The procedure is as follows:
Upload the JAR package
Call the
DBMS_JAVA.LOADJAVAfunction to upload the JAR package as an external resource to OBServer.Create a Java UDTF
Execute the following SQL statement to create a Java UDTF named
my_split.CREATE FUNCTION my_split(x int, y int) RETURNS int PROPERTIES ( symbol = 'my.test.MySplit', type = 'UDTFJar', file = 'my_split_jar' );where:
symbol: specifies the entry class name.type: specifies the type of the external UDF. Valid values:odpsjar,UDAFJar,UDTFJar, andPython.file: specifies the name of the external resource corresponding to the JAR package.
After the Java UDTF is created, you can use it like a PL/SQL UDTF. Here is an example:
obclient> SELECT t1.a,t1.b, COLUMN_VALUE FROM t1, table(my_split(t1.c1));
+------+------+--------------+
| a | b | COLUMN_VALUE |
+------+------+--------------+
| 1 | 2.10 | hello |
| 1 | 2.10 | oceanbase |
| 2 | 2.20 | hello |
| 2 | 2.20 | UDTF |
+------+------+--------------+
Java UDAF
A Java user-defined aggregate function (UDAFA) is an aggregate function that you implement by using Java. User-defined aggregate functions (UDAFs) can aggregate data from a column or multiple columns in several rows by using a user-defined logic into a scalar value. Typically, UDAFs are used with the GROUP BY statement. For each GROUP, a UDAF returns an aggregated value.
Create a Java UDAF
To accommodate the flexibility required for Java UDAFs, we provide two ways to create them: one using URLs and another using external resources.
Create a Java UDAF by using a URL
Compile and package the following Java code into my_avg.jar, and then you can create a Java UDAF in the OBServer. Here is an example:
package my.test;
public class MyAvg {
private double sum = 0;
private double count = 0;
public void iterate(Double x) {
sum += x;
count += 1;
}
public void merge(MyAvg other) {
sum += other.sum;
count += other.count;
}
public Double terminate() {
return sum / count;
}
}
Create a Java UDAF
CREATE FUNCTION my_avg(x double)
RETURNS double
PROPERTIES (
symbol = 'my.test.MyAvg',
type = 'UDAFJar',
file = 'http://******/my_avg.jar'
);
Call the Java UDAF function
The following example shows how to call the procedure:
Create the
ttable and insert some data into it.obclient> CREATE TABLE t(a int, b int);obclient> INSERT INTO t VALUES(1, 10),(2, 100),(1, 20),(2, 200),(3, 0);Call the Java UDAF by running the following command:
obclient> SELECT my_avg(b) FROM t GROUP BY a; +--------------+ |my_avg(b) | +--------------+ |15.0 | +--------------+ |150.0 | +--------------+ |0.0 | +--------------+
Create a Java UDAF by using external resources
The OBServer system manages the dependencies of external stored procedures, including Java UDAF jars. You must use the DBMS_JAVA.LOADJAVA system package function to upload the jar package to the OBServer system. Then, you can refer to this external resource in the created Java UDAF.
Follow these steps:
To upload the package to the service, perform the following steps:
Call the `DBMS_JAVA.LOADJAVA` procedure to upload the jar file as an external resource to the OBServer.
Create a Java UDAF.
Run the following SQL statement to create a Java UDAF named
my_avg.CREATE FUNCTION my_avg(x int, y int) RETURNS int PROPERTIES ( symbol = 'my.test.MyAvg', type = 'UDAFJar', file = 'my_avg_jar' );- The following table lists the parameter description in the
DELETE_SCHEMA_STATSprocedure.
symbol: specifies the name of the class of the specified entry.type: specifies the type of the external UDF, which supportsodpsjar,UDAFJar,UDTFJar, andPython.file: the external resource name corresponding to the Jar package.
- The following table lists the parameter description in the
After you create it, you can use the Java UDAF just like the PL UDF. The following example shows how to do this.
obclient> SELECT my_avg(b) FROM t GROUP BY a;
+--------------+
|my_avg(b) |
+--------------+
|15.0 |
+--------------+
|150.0 |
+--------------+
|0.0 |
+--------------+
Python UDF
A Python user-defined function (UDF) is a UDF implemented in the Python language.
When you create a Python UDF in OceanBase Database, you only need to specify the Python script and the entry class. During execution, the OBServer instance will instantiate the entry class and call its evaluate method.
Here is an example of creating a Python UDF in the varcharUrl.py script:e
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
"""
# @File : varcharUrl.py
"""
class varcharUrl(object):
def evaluate(self, c1, c2):
if c1 == None:
return c2
if c2 == None:
return c1
return c1 + c2
The sample script varcharUrl.py follows the style of OceanBase Database, but you can also customize the script in the style of ODPS. Therefore, you can directly move an ODPS UDF Python script to OceanBase Database. Here is a sample script:
#!/usr/bin/env python3e
# -*- coding:utf-8 -*-
"""
# @File : varcharUrl_ODPS.py
"""
from odps.udf import annotate
@annotate("string, string -> string")
class varcharUrl(object):
def evaluate(self, c1, c2):
if c1 == None:
return c2
if c2 == None:
return c1
return c1 + c2
Creating a Python UDF
To support the flexibility of using Python UDFs, OceanBase provides two methods for creating Python UDFs: using a URL and using external resources. When a Python UDF is created using a URL, the Python script is not directly coupled with the database. Before each execution of an SQL statement containing this UDF, the latest version of the script is fetched from the specified URL. This method is ideal for frequent script updates, such as during debugging, but may introduce additional performance overhead. Alternatively, a Python UDF created from an external resource stores the script internally within the database. OceanBase then leverages its distributed capabilities to propagate the script to each node. This approach is better suited for stable environments where performance is a concern.
Create a Python UDF by using URL
Execute the following SQL statement to create a Python UDF named varchar_test.
obclient> CREATE FUNCTION varchar_test(c1 varchar(1000), c2 varchar(1000))
RETURNS varchar(1000)
PROPERTIES (
symbol = 'varcharUrl',
type = 'Python',
file = '<URL to Python>'
);
The following table lists the parameters of the DELETE_SCHEMA_STATS procedure:
symbol: specifies the entry class name.Note
symbolis case-sensitive and supports two formats,class_nameandmodule_name.class_name. If you defineclass_name, OceanBase Database generates a uniquemodule_nameto ensure the uniqueness of each Python class. If you usemodule_name.class_name, you must ensure the uniqueness of the Python class. Otherwise, when two different UDFs are called in the same statement, if the values ofmodule_name.class_namein the definitions of the two UDFs are the same, only one Python class will take effect.type: The type of the UDF. Valid values:odpsjar,UDAFJar,UDTFJar, andPython.file: The URL where the Python script is located.
You can use the Python UDF just like a PL/SQL UDF. Here is a usage example:
obclient> CREATE FUNCTION varchar_test(c1 varchar(1000), c2 varchar(1000))
-> RETURNS varchar(1000)
-> PROPERTIES (
-> symbol = 'varcharUrl',
-> type = 'Python',
-> file = 'http://*****/varcharUrl.py'
-> );
Query OK, 0 rows affected (0.289 sec)
obclient> SELECT varchar_test('OceanBase', 'OceanBase Database');
+----------------------------------------------+
| varchar_test('OceanBase', 'OceanBase') |
+----------------------------------------------+
| OceanBase Database |
+----------------------------------------------+
Create a Python UDF using an external resource
OBServer manages dependencies such as Python scripts for external stored procedures. To upload a Python script to OBServer, you must use the DBMS_PYTHON.LOADPYHON system package function. You can then reference this Python script as an external resource in the Python UDF that you create.
There are no strict rules for the order of external resource uploads and UDF creation, as long as the resource is uploaded before the UDF calls it. Additionally, one external resource can be used by multiple UDFs.
The following steps describe the procedure:
Upload the Python script.
You can call the
DBMS_PYTHON.LOADPYHONprocedure to upload a Python script as an external resource to an OBServer. Here is an example of a script file:String url = "<URL to Python>"; InputStream is = new URL(url).openStream(); // conn is a connection to OceanBase PreparedStatement ps = conn.prepareStatement("call DBMS_PYTHON.LOADPYHON(? ,? ,? )"); ps.setString(1, "varchar_test_script"); ps.setBlob(2, is); ps.setString(3, "varcharUrl Python script"); ps.execute();The DBA user can query the
DBA_OB_EXTERNAL_RESOURCESview in the current tenant to obtain all the external resources of the tenant or log in to the SYS tenant to query theCDB_OB_EXTERNAL_RESOURCESview for all the tenants.Create a Python UDF.
Execute the following SQL statement to create a Python UDF named
varchar_test:obclient> CREATE FUNCTION varchar_test(c1 varchar(1000), c2 varchar(1000)) RETURNS varchar(1000) PROPERTIES ( symbol = 'varcharUrl', type = 'Python', file = 'varchar_test_script' );Where:
symbolspecifies the name of the class that contains the specified entry.typespecifies the data type in Python, which currently supports only Python types.file: The name of the external resource corresponding to the Python script.
Once the UDF is created, you can use it just like a PL/SQL function, as follows:
obclient> CREATE FUNCTION varchar_test(c1 varchar(1000), c2 varchar(1000))
-> RETURNS varchar(1000)
-> PROPERTIES (
-> symbol = 'varcharUrl',
-> type = 'Python',
-> file = 'varchar_test_script'
-> );
Query OK, 0 rows affected (0.221 sec)
obclient> SELECT varchar_test('OceanBase', 'Hybase');
+----------------------------------------------+
| varchar_test('OceanBase', 'Haiyang Database') |
+----------------------------------------------+
| OceanBase OceanBaseDatabase |
+----------------------------------------------+
1 row in set (0.060 sec)