Java UDF
A Java UDF is a user-defined function implemented in Java. By supporting Java UDFs, you can integrate a large number of Java ecosystem products with OceanBase Database, thereby improving the development efficiency of UDFs.
Construct a Java UDF
You can compile and package the following code into a JAR file named my_add.jar. Then, you can create a Java UDF in an OBServer node, as shown in the following example:
package org.example;
public class MyAdd {
public Integer evaluate(Integer a, Integer b) {
if (a == null || b == null) {
return null;
}
return a + b;
}
}
Create a Java UDF
To adapt to the flexible scenarios of Java UDFs, we provide two methods for creating Java UDFs: creating them by using a URL and by using external resources.
Create a Java UDF by using a URL
Execute the following SQL statement to create a Java UDF named my_add.
CREATE FUNCTION my_add(x int, y int)
RETURNS int
PROPERTIES (
symbol = 'org.example.MyAdd',
type = 'odpsjar',
file = '<URL to Jar>'
);
where:
symbol: specifies the entry class name.type: specifies the type of the external UDF. Valid values:odpsjar,UDAFJar,UDTFJar, andPython.file: specifies the URL of the JAR file.
After the Java UDF is created, you can use it in the same way as a PL/SQL UDF. Here is an example:
obclient> CREATE FUNCTION my_add(x int, y int)
-> RETURNS int
-> PROPERTIES (
-> symbol = 'org.example.MyAdd',
-> type = 'odpsjar',
-> file = 'http://******/my_add.jar'
-> );
Query OK, 0 rows affected (0.113 sec)
obclient> SELECT my_add(1, 2);
+--------------+
| my_add(1, 2) |
+--------------+
| 3 |
+--------------+
1 row in set (0.021 sec)
Create a Java UDF by using external resources
OBServer manages the external resources of external stored procedures, such as Java UDF JAR files. You need to upload the JAR file to an OBServer node by using the DBMS_JAVA.LOADJAVA system package function, and then reference the external resource corresponding to the JAR file in the created Java UDF.
There is no strict order for uploading external resources and creating UDFs. As long as the resources are uploaded before the UDF is called, it is acceptable. Additionally, a single external resource can be used by multiple UDFs.
The steps are as follows:
Upload the JAR file
Call the
DBMS_JAVA.LOADJAVAfunction to upload the JAR file as an external resource to an OBServer node.String url = "<URL to Jar>"; InputStream is = new URL(url).openStream(); // conn is a connection to OceanBase PreparedStatement ps = conn.prepareStatement("call dbms_java.loadjava(? ,? ,? )"); ps.setString(1, "my_add_jar"); ps.setBlob(2, is); ps.setString(3, "my add UDF jar"); ps.execute();Create a Java UDF
Execute the following SQL statement to create a Java UDF named
my_add.CREATE FUNCTION my_add(x int, y int) RETURNS int PROPERTIES ( symbol = 'org.example.MyAdd', type = 'odpsjar', file = 'my_add_jar' );where:
symbol: specifies the entry class name.type: specifies the type of the external UDF. Valid values:odpsjar,UDAFJar,UDTFJar, andPython.file: specifies the name of the external resource corresponding to the JAR file.
After the Java UDF is created, you can use it in the same way as a PL/SQL UDF. Here is an example:
obclient> CREATE FUNCTION my_add(x int, y int)
-> RETURNS int
-> PROPERTIES (
-> symbol = 'org.example.MyAdd',
-> type = 'odpsjar',
-> file = 'my_add_jar'
-> );
Query OK, 0 rows affected (0.118 sec)
obclient> SELECT my_add(1, 2);
+--------------+
| my_add(1, 2) |
+--------------+
| 3 |
+--------------+
1 row in set (0.005 sec)
Java UDTF
A Java UDTF is a user-defined table function implemented in Java. The biggest difference between a UDTF and a general UDF is that a general UDF returns a single value, which corresponds to one row of data, while a UDTF can return multiple rows of data.
Create a Java UDTF
To adapt to the flexible scenarios of Java UDTFs, we provide two methods for creating Java UDTFs: creating them by using a URL and by using external resources.
Create a Java UDTF by using a URL
You can compile and package the following code into a JAR file named my_split.jar. Then, you can create a Java UDTF in an OBServer node, as shown in the following example:
package my.test;
public class MySplit {
public String[] process(String str) {
if (str == null) return null;
return str.split(" ");
}
}
Create a Java UDTF
CREATE FUNCTION my_split(x longtext)
RETURNS varchar(1024)
PROPERTIES (
symbol = 'my.test.MySplit',
type = 'UDTFJar',
file = 'http://******/my_split.jar'
);
When you create an external function, you can specify type as UDTFJar to create a Java UDTF. After the Java UDTF is created, you can use it in SQL statements with the table function clause. You can use it in single-row queries or join it with other tables using the join clause.
Call a Java UDTF
Here is an example:
Create a table named
t1and insert some data into the table.obclient> CREATE TABLE t1(a int, b decimal(10, 2), c1 varchar(32));obclient> INSERT INTO t1 VALUES (1, 2.1, 'hello oceanbase'), (2, 2.2, 'hello UDTF') ;Execute the following command to call the Java UDTF.
obclient> SELECT t1.a,t1.b, COLUMN_VALUE FROM t1, table(my_split(t1.c1)); +------+------+--------------+ | a | b | COLUMN_VALUE | +------+------+--------------+ | 1 | 2.10 | hello | | 1 | 2.10 | oceanbase | | 2 | 2.20 | hello | | 2 | 2.20 | UDTF | +------+------+--------------+
Create a Java UDTF by using external resources
OBServer manages the external resources of external stored procedures, such as Java UDTF JAR files. You need to upload the JAR file to an OBServer node by using the DBMS_JAVA.LOADJAVA system package function, and then reference the external resource corresponding to the JAR file in the created Java UDTF.
The steps are as follows:
Upload the JAR file
Call the
DBMS_JAVA.LOADJAVAfunction to upload the JAR file as an external resource to an OBServer node.Create a Java UDTF
Execute the following SQL statement to create a Java UDTF named
my_split.CREATE FUNCTION my_split(x int, y int) RETURNS int PROPERTIES ( symbol = 'my.test.MySplit', type = 'UDTFJar', file = 'my_split_jar' );where:
symbol: specifies the entry class name.type: specifies the type of the external UDF. Valid values:odpsjar,UDAFJar,UDTFJar, andPython.file: specifies the name of the external resource corresponding to the JAR file.
After the Java UDTF is created, you can use it in the same way as a PL/SQL UDTF. Here is an example:
obclient> SELECT t1.a,t1.b, COLUMN_VALUE FROM t1, table(my_split(t1.c1));
+------+------+--------------+
| a | b | COLUMN_VALUE |
+------+------+--------------+
| 1 | 2.10 | hello |
| 1 | 2.10 | oceanbase |
| 2 | 2.20 | hello |
| 2 | 2.20 | UDTF |
+------+------+--------------+
Java UDAF
Java UDAFs are user-defined aggregate functions implemented in the Java language. User-defined aggregate functions (UDAFs) aggregate columns in multiple rows of data into a scalar value according to user-defined logic. UDAFs are typically used with the GROUP BY statement. For each group, a UDAF returns an aggregated result.
Create a Java UDAF
To provide flexibility for Java UDAFs, two methods are supported: creating them from a URL and from external resources.
Create a Java UDAF using a URL
The following example shows how to compile and package the provided Java code into a my_avg.jar file. You can then create a Java UDAF in an OBServer node.
package my.test;
public class MyAvg {
private double sum = 0;
private double count = 0;
public void iterate(Double x) {
sum += x;
count += 1;
}
public void merge(MyAvg other) {
sum += other.sum;
count += other.count;
}
public Double terminate() {
return sum / count;
}
}
Create a Java UDAF
CREATE FUNCTION my_avg(x double)
RETURNS double
PROPERTIES (
symbol = 'my.test.MyAvg',
type = 'UDAFJar',
file = 'http://******/my_avg.jar'
);
Call a Java UDAF function
Call the procedure as follows:
Create a table named
tand insert some rows.obclient> CREATE TABLE t(a int, b int);obclient> INSERT INTO t VALUES(1, 10),(2, 100),(1, 20),(2, 200),(3, 0);Execute the following command to invoke the Java UDAF.
obclient> SELECT my_avg(b) FROM t GROUP BY a; +--------------+ |my_avg(b) | +--------------+ |15.0 | +--------------+ |150.0 | +--------------+ |0.0 | +--------------+
Create a Java UDAF by using external resources
OBServer manages the dependency resources of external stored procedures, such as the Java UDAF Jar packages, by using external resource management. You need to upload the Jar package by using the DBMS_JAVA.LOADJAVA system package function to the OBServer node, and then reference this Jar package in the UDAF.
Perform the following steps:
Upload the JAR package.
Call
DBMS_JAVA.LOADJAVAto upload a JAR file as an external resource to the ODP.Create a Java UDAF
Run the following SQL statement to create the Java UDAF
my_avg.CREATE FUNCTION my_avg(x int, y int) RETURNS int PROPERTIES ( symbol = 'my.test.MyAvg', type = 'UDAFJar', file = 'my_avg_jar' );Where:
symbol: The name of the specified entry class.typespecifies the type of the external UDF. Valid values:odpsjar,UDAFJar,UDTFJar, andPython.file: The name of the external resource corresponding to the Jar package.
After it is created, the Java UDAF can be used like a PL/SQL UDF, as shown in the following example:
obclient> SELECT my_avg(b) FROM t GROUP BY a;
+--------------+
|my_avg(b) |
+--------------+
|15.0 |
+--------------+
|150.0 |
+--------------+
|0.0 |
+--------------+
Python UDF
A user-defined function (UDF) implemented in Python is called a Python UDF.
OceanBase Database instantiates an entry class and calls its evaluate method at runtime when you create a Python UDF by specifying a Python script and the entry class.
The following example shows how to create a Python UDF named varcharUrl.py:
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
"""
# @File : varcharUrl.py
"""
class varcharUrl(object):
def evaluate(self, c1, c2):
if c1 == None:
return c2
if c2 == None:
return c1
return c1 + c2
The following sample script shows how to convert ODPS UDF Python scripts to OceanBase Database. The script is written in the OceanBase style and also supports the ODPS style:
#!/usr/bin/env python3e
# -*- coding:utf-8 -*-
"""
# @File : varcharUrl_ODPS.py
"""
from odps.udf import annotate
@annotate("string, string -> string")
class varcharUrl(object):
def evaluate(self, c1, c2):
if c1 == None:
return c2
if c2 == None:
return c1
return c1 + c2
Create a Python UDF
OceanBase Database provides two ways to create Python UDFs: using URLs and external resources. Python UDFs created using URLs do not have a coupling between the Python script and the database. When you execute an SQL statement containing a Python UDF, OceanBase Database pulls the latest version of the script from the URL specified for the Python UDF. This way is suitable for scenarios where you frequently need to update the Python script, but it has additional overhead. On the contrary, Python UDFs created using external resources save the script as an external resource in OceanBase Database. OceanBase Database distributes the script to all nodes by leveraging its distributed capabilities. This way is suitable for scenarios where stability and performance are important.
Create a Python UDF using a URL
Execute the following SQL statements to create a Python UDF named varchar_test.
obclient> CREATE FUNCTION varchar_test(c1 varchar(1000), c2 varchar(1000))
RETURNS varchar(1000)
PROPERTIES (
symbol = 'varcharUrl',
type = 'Python',
file = '<URL to Python>'
);
Where:
symbol: Indicates the specified entry class name.Note
The
symboloption is case-sensitive and accepts two formats:class_nameandmodule_name.class_name. If you specifyclass_name, OceanBase Database automatically generates a uniquemodule_nameto ensure the uniqueness of each Python class. However, if you use themodule_name.class_nameformat, you must ensure its uniqueness. Otherwise, when two different UDFs with the samemodule_name.class_nameare called in the same statement, only one of the Python classes will be effective due to the coverage issue.type: indicates the type of the external UDF. Valid values:odpsjar,UDAFJar,UDTFJar, andPython.file: The URL of the Python script.
You can use the Python UDF just like you use a PL/SQL UDF. Here is an example:
obclient> CREATE FUNCTION varchar_test(c1 varchar(1000), c2 varchar(1000))
-> RETURNS varchar(1000)
-> PROPERTIES (
-> symbol = 'varcharUrl',
-> type = 'Python',
-> file = 'http://*****/varcharUrl.py'
-> );
Query OK, 0 rows affected (0.289 sec)
obclient> SELECT varchar_test('OceanBase', 'Haiyang Database');
+----------------------------------------------+
| varchar_test('OceanBase', 'Haiyang Database') |
+----------------------------------------------+
| OceanBase OceanYang database |
+----------------------------------------------+
Create a Python UDF by using external resources
OBServer manages the dependency resources of external stored procedures such as Python scripts by using external resource management. You must upload Python scripts to OBServer by using the DBMS_PYTHON.LOADPYHON system package function and reference the external resource corresponding to the Python script in the Python UDF that is created.
The upload of external resources and the creation of UDFs do not have a strict order requirement. As long as the UDF is called when the resources are already uploaded, it works. Additionally, an external resource can be shared and used by multiple UDFs.
To proceed with the specified steps:
Upload the Python script.
You can call the
DBMS_PYTHON.LOADPYTHONprocedure to upload a Python script as an external resource to the ODS node. Here is an example of the script file:String url = "<URL to Python>"; InputStream is = new URL(url).openStream(); // conn is a connection to OceanBase PreparedStatement ps = conn.prepareStatement("call DBMS_PYTHON.LOADPYHON(? ,? ,? )"); ps.setString(1, "varchar_test_script"); ps.setBlob(2, is); ps.setString(3, "varcharUrl Python script"); ps.execute();After the view is created, a DBA user can query the
DBA_OB_EXTERNAL_RESOURCESview in the current tenant to view all external resources of the current tenant or log in to the SYS tenant and query theCDB_OB_EXTERNAL_RESOURCESview to view all external resources of all tenants.Create a Python UDF.
Execute the following SQL statement to create a Python UDF named
varchar_test:obclient> CREATE FUNCTION varchar_test(c1 varchar(1000), c2 varchar(1000)) RETURNS varchar(1000) PROPERTIES ( symbol = 'varcharUrl', type = 'Python', file = 'varchar_test_script' );The following table describes the syntax and parameters of the
DELETE_SCHEMA_STATSprocedure.symbol: the name of the entry class that you specify.type: the type in Python. It supports only the Python type.file: the name of the external resource corresponding to the Python script.
After you create a Python UDF, you can use it just like a PL/SQL user-defined function (UDF). Here is an example:
obclient> CREATE FUNCTION varchar_test(c1 varchar(1000), c2 varchar(1000))
-> RETURNS varchar(1000)
-> PROPERTIES (
-> symbol = 'varcharUrl',
-> type = 'Python',
-> file = 'varchar_test_script'
-> );
Query OK, 0 rows affected (0.221 sec)
obclient> SELECT varchar_test('OceanBase', 'Haiyang Database');
+----------------------------------------------+
| varchar_test('OceanBase', 'SeaBase') |
+----------------------------------------------+
| OceanBase Database |
+----------------------------------------------+
1 row in set (0.060 sec)
