OceanBase Database now supports connecting to a file system as an external data source using the Java SDK. This feature uses the JNI framework, so the OceanBase Database environment needs to have the Java SDK deployed.
Environment configuration
Notice
If you want to use the HDFS/ODPS external table feature and OceanBase Database is deployed in a distributed manner, you need to configure the Java environment and install the dependencies on all corresponding nodes. Do not configure only one node.
All environment configuration and dependency installation operations must be performed as the admin user. To switch to the admin user, run the following command:
su - admin
Deploy and configure the Java environment
Download the OpenJDK installation package from the download address.
Notice
Use the latest version of OpenJDK 11.
This example uses OpenJDK 11.0.29+7. The specific operation steps are as follows:
[admin@xxx /home/admin/rpm]# wget https://builds.openlogic.com/downloadJDK/openlogic-openjdk/11.0.29+7/openlogic-openjdk-11.0.29+7-linux-x64.tar.gz
Decompress the installation package.
Here is an example:
[admin@xxx /home/admin/rpm]# sudo tar -zxvf openlogic-openjdk-11.0.29+7-linux-x64.tar.gz
The installation path is as follows:
/home/admin/rpm/openlogic-openjdk-11.0.29+7-linux-x64
Notice
Use this path as the value of the ob_java_home parameter, which specifies the Java home directory on the node where OBServer is running.
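Before using this directory as ob_java_home, it can help to confirm that it really contains a JDK. The following is a minimal sketch, not part of the OceanBase tooling: the function name check_java_home is hypothetical, and it assumes the standard JDK layout with an executable bin/java.

```shell
# Hypothetical helper (not shipped with OceanBase): checks that a directory
# looks like a JDK home, assuming the standard layout with bin/java.
check_java_home() {
  dir="$1"
  if [ ! -x "$dir/bin/java" ]; then
    echo "no executable bin/java under: $dir" >&2
    return 1
  fi
  # Print the absolute path, which is the form to use for ob_java_home.
  (cd "$dir" && pwd)
}
```

For example, running check_java_home /home/admin/rpm/openlogic-openjdk-11.0.29+7-linux-x64 prints the absolute path if the directory is usable. In a distributed deployment, the same check would be run on every node.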
Install dependencies
To use the HDFS/ODPS external table feature of OceanBase Database, you need to install the following components:
devdeps-hdfs-sdk RPM package:
This package contains dynamic link library files required for the runtime of HDFS/ODPS external tables. It provides the core interfaces for JVM to interact with JNI (Java Native Interface). This component serves as a communication bridge between the Java Virtual Machine and the local external table Java SDK, ensuring the stable operation of the external table feature.
devdeps-java-extensions RPM package:
This package integrates the core dependency libraries (JAR files) required for HDFS external tables and other external data sources (such as ODPS). This extension package contains a complete Java runtime dependency chain, ensuring the compatibility and performance optimization of the external table feature in distributed scenarios.
Deploy and configure the HDFS.so dynamic library
Obtain the devdeps-hdfs-sdk RPM installation package.
- For Enterprise Edition users, contact Technical Support to obtain the devdeps-hdfs-sdk RPM installation package.
- For Community Edition users, click the development-kit/ directory on the OceanBase Database image page to enter the development tool resources directory and download the devdeps-hdfs-sdk RPM installation package.
After obtaining the installation package, install it by running the following command:
sudo rpm -Uvh devdeps-hdfs-sdk-3.3.6-xxxxx.xxx.xxx.rpm
Here is an example:
sudo rpm -Uvh devdeps-hdfs-sdk-3.3.6-112024123116.el7.x86_64.rpm
Check whether the installation meets your expectations.
The libhdfs.so and libhdfs.so.0.0.0 files must exist, and the corresponding soft links must be valid.
Here is an example:
$ ll /usr/local/oceanbase/deps/devel/lib
total 376
lrwxrwxrwx 1 root root     16 Dec 24 19:49 libhdfs.so -> libhdfs.so.0.0.0
-rwxr-xr-x 1 root root 384632 Dec 24 19:09 libhdfs.so.0.0.0
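The check above can also be scripted. The following is a rough sketch under the assumption that the library directory contains exactly the two entries shown in the listing; check_libhdfs is a hypothetical name, not a command provided by the RPM.

```shell
# Hypothetical helper: verifies that libhdfs.so is a symlink that resolves
# to an existing file and that libhdfs.so.0.0.0 is present.
check_libhdfs() {
  dir="$1"
  link="$dir/libhdfs.so"
  [ -L "$link" ] || { echo "missing symlink: $link" >&2; return 1; }
  # -e follows the link, so it fails on a dangling symlink.
  [ -e "$link" ] || { echo "dangling symlink: $link" >&2; return 1; }
  [ -f "$dir/libhdfs.so.0.0.0" ] || { echo "missing file: $dir/libhdfs.so.0.0.0" >&2; return 1; }
  echo "ok: $link -> $(readlink "$link")"
}
```

With the default installation path, the call would be check_libhdfs /usr/local/oceanbase/deps/devel/lib.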
Deploy and configure the jar package path
Obtain the devdeps-java-extensions RPM installation package.
- For Enterprise Edition users, contact Technical Support to obtain the devdeps-java-extensions RPM installation package.
- For Community Edition users, click the development-kit/ directory on the OceanBase Database image page to enter the development tool resources directory and download the devdeps-java-extensions RPM installation package.
Notice
- For OceanBase Database V4.3.5 BP1 and earlier: Download the devdeps-java-extensions RPM installation package of version 1.0.0.
- For OceanBase Database V4.3.5 BP2 and later: Download the devdeps-java-extensions RPM installation package of version 1.0.1.
- For OceanBase Database V4.4.0: Download the devdeps-java-extensions RPM installation package of version 1.0.1.
- For OceanBase Database V4.4.1: Download the devdeps-java-extensions RPM installation package of version 1.0.2.
- For OceanBase Database V4.4.2 and later: Download the devdeps-java-extensions RPM installation package of version 1.0.4.
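The version mapping above can be expressed as a small lookup. This is only an illustrative sketch: java_ext_version is a hypothetical function, it takes the version as a bare number such as 4.3.5 plus an optional BP number, and it covers only the releases named explicitly in the list (the open-ended "and earlier" and "and later" ranges are not handled).

```shell
# Hypothetical lookup: OceanBase version (and optional BP number) to the
# devdeps-java-extensions version named in the list above.
java_ext_version() {
  ver="$1"
  bp="${2:-0}"   # BP number; 0 means the base release without a BP
  case "$ver" in
    4.3.5)
      if [ "$bp" -le 1 ]; then echo "1.0.0"; else echo "1.0.1"; fi ;;
    4.4.0) echo "1.0.1" ;;
    4.4.1) echo "1.0.2" ;;
    4.4.2) echo "1.0.4" ;;
    *)
      echo "version not covered by this sketch: $ver" >&2
      return 1 ;;
  esac
}
```

For example, java_ext_version 4.3.5 2 selects the package version for V4.3.5 BP2.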
After obtaining the installation package, install it by running the following command:
sudo rpm -Uvh devdeps-java-extensions-x.x.x-xxxxxxxxxxxx.xxx.xxxxxx.rpm --prefix=/user_install_directory
Here is an example:
sudo rpm -Uvh devdeps-java-extensions-1.0.0-122025032514.el7.x86_64.rpm --prefix=/home/admin/oceanbase
Check whether the installation meets your expectations.
Check whether the oceanbase-odps-connector-jar-with-dependencies.jar file exists in the /home/admin/oceanbase/jni_packages/v1.0.0 directory.
Here is an example:
$ ll /home/admin/oceanbase/jni_packages/v1.0.0
total 52756
drwxr-sr-x 4 root root     4096 Dec 24 20:25 hadoop
drwxr-xr-x 3 root root     4096 Dec 24 20:25 lib
-rw-r--r-- 1 root root 54008720 Dec 24 19:52 oceanbase-odps-connector-jar-with-dependencies.jar
Notice
Use this path as the value of the ob_java_connector_path parameter, which specifies where the JVM loads the dependency jar packages from when OBServer starts.
(Optional) Restart the observer process
Note
- If you are using the Java SDK for the first time, you do not need to restart the observer process.
- Currently, the JNI-related configurations supported by OBServer cannot be flexibly set and take effect immediately. Therefore, if you need to change related Java environment variables, you must restart the observer process for the changes to take effect.
To use the HDFS/ODPS external table feature, you need to configure OBServer. The configuration steps are as follows:
Notice
All the following configurations are cluster-wide and need to be set only once, without the need to configure each node separately.
Step 1: Set the configuration items related to the Java environment
Note
The following settings are performed in the sys tenant.
Enable the Java environment used by the external table Java SDK.
Here is an example:
ALTER SYSTEM SET ob_enable_java_env = true;
For more information, see ob_enable_java_env.
Set the Java home directory on the node where the current OBServer is running.
Note
This path is obtained from the installation path of OpenJDK.
Here is an example:
ALTER SYSTEM SET ob_java_home = "/home/admin/rpm/openlogic-openjdk-11.0.29+7-linux-x64";
For more information, see ob_java_home.
Set the path of the executable dependency jar package that can be loaded by the JVM when OBServer is started.
Note
This path is obtained from the installation path of the jar package RPM.
Here is an example:
ALTER SYSTEM SET ob_java_connector_path = "/home/admin/oceanbase/jni_packages/v1.0.0";
For more information, see ob_java_connector_path.
Set the configuration items related to the Java environment startup.
Create the corresponding log folder path.
mkdir -p /home/user/jvmlogs
mkdir -p /home/user/jvmlogs/heapdumps
Set the JVM startup configuration for Java runtime.
Here is an example:
ALTER SYSTEM SET ob_java_opts="-Xmx2048m -Xms2048m -XX:-CriticalJNINatives -Djdk.lang.processReaperUseDefaultStackSize=true -Xrs -XX:+HeapDumpOnOutOfMemoryError -Xloggc:/home/admin/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/admin/heapdumps/ -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled -XX:+TieredCompilation -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses ";
For more information, see ob_java_opts.
Note
- Changing this configuration item requires restarting the observer process. Since the current HDFS integration uses memory copy, which directly copies HDFS data streams to the C++ memory heap, you can reduce the -Xmx2048m -Xms2048m setting.
- GC log files are generated only when the configured folder path exists. If the path does not exist, no GC log files will be generated.
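Because GC logs are written only when the configured folder exists, one way to avoid a mismatch between the folders you create and the paths in ob_java_opts is to derive the folders from the option string itself. Note that the mkdir example above creates folders under /home/user/jvmlogs, while the ob_java_opts example points to /home/admin. The following is a rough sketch: jvm_log_dirs is a hypothetical helper, and it only looks at the -Xloggc: and -XX:HeapDumpPath= options used in the example.

```shell
# Hypothetical helper: prints the directories referenced by the GC log and
# heap dump options in a JVM option string, one per line.
jvm_log_dirs() {
  # Intentionally unquoted so the option string is split on whitespace.
  for opt in $1; do
    case "$opt" in
      -Xloggc:*)
        # The GC log option names a file; its parent directory must exist.
        dirname "${opt#-Xloggc:}" ;;
      -XX:HeapDumpPath=*)
        # The heap dump path in the example is a directory; drop a trailing slash.
        path="${opt#-XX:HeapDumpPath=}"
        printf '%s\n' "${path%/}" ;;
    esac
  done
}
```

The output can then be passed to mkdir -p, for example with jvm_log_dirs "$opts" | while read -r d; do mkdir -p "$d"; done, on each node before the observer process is restarted.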
(Optional) Set the configuration item for using the Java SDK for ODPS.
Note
The HDFS Java SDK does not require this configuration item, while other configuration items are the same.
If you want to use the Java SDK, set _use_odps_jni_connector to true.
ALTER SYSTEM SET _use_odps_jni_connector = true;
Step 2 (Optional): Decompress the hdfs-sdk package without sudo privileges
If you cannot obtain sudo privileges to execute the installation command, you can manually decompress the devdeps-hdfs-sdk RPM installation package using the rpm2cpio command. You can place the required files in the desired path as needed.
Here is an example:
rpm2cpio devdeps-hdfs-sdk-3.3.6-xxxxx.xxx.xxx.rpm | cpio -idmv
Specify the target path.
After decompressing the installation package, you can use the mv command to move the decompressed files to a custom path accessible by the user (for example, ~/hdfs_lib).
Here is an example:
mv ./usr/local/oceanbase/deps/devel/lib/* ~/hdfs_lib/
Obtain and confirm the absolute path.
You can use the realpath command to view the absolute path of the custom path.
Here is an example:
realpath ~/hdfs_lib
The returned result is as follows:
/home/${user_name}/hdfs_lib
Configure the OceanBase Database path variable.
Here is an example:
Log in to the sys tenant and execute the following command to register the custom path to the system configuration.
Note
- When executing the following statement, replace /home/${user_name}/hdfs_lib in the example with the actual path.
- The path must be an absolute path, and multiple paths should be separated by colons (:) without spaces.
ALTER SYSTEM SET _ob_additional_lib_path = '/home/${user_name}/hdfs_lib';
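The format rules for this value (absolute paths, colon-separated, no spaces) can be checked before the statement is executed. The following is a minimal sketch with a hypothetical helper named join_lib_paths; it only validates and joins the strings and does not check that the directories exist.

```shell
# Hypothetical helper: joins directories into a colon-separated value and
# rejects entries that are not absolute or that contain whitespace or colons.
join_lib_paths() {
  result=""
  for p in "$@"; do
    case "$p" in
      /*) ;;
      *) echo "not an absolute path: $p" >&2; return 1 ;;
    esac
    case "$p" in
      *[[:space:]]*|*:*)
        echo "path contains whitespace or a colon: $p" >&2; return 1 ;;
    esac
    # Add a colon only between entries, never at the start or end.
    result="${result:+$result:}$p"
  done
  printf '%s\n' "$result"
}
```

For example, join_lib_paths "$(realpath ~/hdfs_lib)" prints a value that can be pasted into the ALTER SYSTEM statement.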
Step 3: Restart the observer process
Stop the observer process on all machines and then restart the observer process.
Switch to the admin user.
[admin@xxx /home/admin/oceanbase/etc]# su - admin
Stop the observer process.
-bash-4.2$ kill -9 `pidof observer`
Restart the observer process.
-bash-4.2$ cd /home/admin/oceanbase && /home/admin/oceanbase/bin/observer
Note
When restarting the observer process, you do not need to specify the startup parameters because the previous startup parameters have already been written to the parameter file.
References
For more information about how to create an ODPS external table, see Create an external table (MySQL mode) or Create an external table (Oracle mode).
