This topic describes the syntax and parameters of AI functions in OceanBase Database and provides examples of using AI functions.
AI functions integrate AI model capabilities into data processing within the database through SQL expressions. They greatly simplify operations such as data extraction, analysis, summarization, and storage using large AI models. This is a significant new feature in the database and data warehouse fields. In MySQL mode, OceanBase Database provides the DBMS_AI_SERVICE package for managing AI models and endpoints, adds several built-in AI function expressions, and supports monitoring AI model calls through views.
Prerequisites
- You have the required permissions for AI models. For more information, see AI function service permissions.
- If you have not registered an AI model with an endpoint, register an AI model with an endpoint by following the instructions in Register an AI model.
Considerations
CREATE AI MODELandDROP AI MODELoperations are synchronized between primary and standby tenants, butCREATE AI MODEL ENDPOINT,ALTER AI MODEL ENDPOINT, andDROP AI MODEL ENDPOINToperations are not. Therefore, standby tenants need to manually configure AI model endpoints to use AI function services.- Hybrid search relies on the model management and embedding features of AI function services. When deleting an AI model, check if it is referenced by hybrid search to avoid potential issues.
AI function expressions
OceanBase Database supports the following AI function expressions, enabling direct invocation of AI models within OceanBase Database through SQL statements, thus simplifying the calling process:
| Name | Description |
|---|---|
AI_COMPLETE |
Calls a specified text generation large model to process the prompt and data, and parses the processing result. |
AI_PROMPT |
Dynamically constructs a formatted prompt based on a prompt template and supports dynamic data insertion. |
AI_EMBED |
Calls an embedding model to convert text data into vector data. |
AI_RERANK |
Calls a reranking model to sort text based on the prompt. |
Notice
When using AI function expressions, ensure that AI models and endpoint information are registered in the database. For registration steps, see Register AI models. The model keys used in the examples below are ob_complete, ob_embed, and ob_rerank.
AI_COMPLETE and AI_PROMPT
The AI_COMPLETE function specifies a registered large language model (LLM) using the model_key parameter, processes the provided prompt and data, and returns the generated text from the LLM. Users can customize the prompt and data format in the prompt parameter. This approach supports diverse text processing and enables batch processing within the database, effectively avoiding the overhead of data transfer between the database and the LLM.
In many AI applications, prompts often have highly structured characteristics and require dynamic injection of specific data. Manually using functions like CONCAT to concatenate prompts and input content is not only costly but also prone to errors that can lead to formatting issues. To address the need for prompt reuse and dynamic combination with data, OceanBase Database provides the AI_PROMPT function. The AI_PROMPT function upgrades prompts from static text to reusable, parameterized function templates. These templates can be directly used in place of the prompt parameter in AI_COMPLETE, significantly simplifying the prompt construction process and enhancing development efficiency and accuracy.
AI_PROMPT function
The AI_PROMPT function dynamically constructs formatted prompts based on a prompt template, supporting dynamic data insertion.
Syntax
The syntax for the AI_PROMPT function is as follows:
AI_PROMPT('template', expr0 [ , expr1, ... ]);
Parameters:
| Parameter | Description | Type | Nullable |
|---|---|---|---|
| template | The user-provided prompt template. | VARCHAR(max_length) | No |
| expr | The user-provided data. | VARCHAR(max_length) | No |
Both parameters template and expr are required and cannot be null. The expr parameter only supports the VARCHAR type and does not support the JSON type.
Return value:
- The formatted prompt, returned as JSON.
Examples
The AI_PROMPT function organizes the template string and dynamic data into a JSON format:
- The first parameter (template string) is placed in the
templatefield of the returned JSON. - Subsequent parameters (data values expr0, expr1, etc.) are placed in the
argsarray of the returned JSON. - Placeholders in the template, such as
{0}and{1}, correspond to the data in theargsarray by index and are automatically replaced when used in theAI_COMPLETEfunction.
For example:
SELECT AI_PROMPT('Recommend {0} of the most popular {1} to me.', 'ten', 'mobile phones');
The result is as follows:
{
"template": "Recommend {0} of the most popular {1} to me.",
"args": ["ten", "mobile phones"]
}
Based on the previous example, using the AI_PROMPT function in the AI_COMPLETE function:
SELECT AI_COMPLETE("ob_complete",AI_PROMPT('Recommend {0} of the most popular {1} to me.just output name in json array format', 'two', 'mobile phones')) AS ans;
The result is as follows:
+--------------------------------------------------+
| ans |
+--------------------------------------------------+
| ["iPhone 15 Pro Max","Samsung Galaxy S24 Ultra"] |
+--------------------------------------------------+
AI_COMPLETE function
Syntax
The syntax for the AI_COMPLETE function is as follows:
AI_COMPLETE(model_key, prompt[, parameters])
-- If using the AI_PROMPT function, replace the prompt parameter with the AI_PROMPT function, as shown in the AI_PROMPT function example.
AI_COMPLETE(model_key, AI_PROMPT(prompt_template, data))
Parameters:
| Parameter | Description | Type | Nullable |
|---|---|---|---|
| model_key | The registered model in the database. | VARCHAR(128) | No |
| prompt | The user-provided prompt information. | VARCHAR/TEXT(LONGTEXT) | No |
| parameters | Optional parameters supported by the API. These parameters are directly included in the generated message body and may vary by vendor. Common optional parameters include temperature, top_p, and max_tokens. Usually, the default settings are sufficient. |
JSON | Yes |
The model_key and prompt parameters must be specified. If one of them is NULL, the function will return an error.
Return value:
- text, the text generated by the LLM based on the prompt.
Examples
Sentiment analysis example
SELECT AI_COMPLETE("ob_complete","Your task is to perform sentiment analysis on the provided text and determine whether the sentiment is positive or negative. The text to be analyzed is as follows: <text> The weather is really nice. </text> The judgment criteria are as follows: If the text expresses a positive sentiment, output 1; if the text expresses a negative sentiment, output -1. Do not output anything else.\n") AS ans;The result is as follows:
+-----+ | ans | +-----+ | 1 | +-----+Translation example
CREATE TABLE comments ( id INT AUTO_INCREMENT PRIMARY KEY, content TEXT ); INSERT INTO comments (content) VALUES ('hello world!'); -- Replace the processed data with the column names from the database table using a CONCAT expression, enabling batch processing of data within the database without the need to copy data to and from the LLM. SELECT AI_COMPLETE("ob_complete", concat("You are a translation master and need to translate the following English text into Chinese. The text to be translated is as follows:<text>", content, "</text>")) AS ans FROM comments;The result is as follows:
+-------------+ | ans | +-------------+ | Hello, world! | +-------------+Classification example
SELECT AI_COMPLETE("ob_complete","You are a classification master and will receive a set of question texts. You need to categorize these questions into one of the following categories: [\"Hardware Department\",\"Software Department\",\"Other\"]. The text to be analyzed is as follows: <text> The screen quality is really poor. </text>") AS res;The result is as follows:
+--------+ | res | +--------+ | Hardware Department | +--------+
AI_EMBED
The AI_EMBED function specifies an embedding model (Embedding Model) registered in the database using the model_key parameter, and converts the text data provided by the user into vector data. When the model supports multiple dimensions, the dim parameter can be used to specify the output dimensions.
Usage
The syntax is as follows:
AI_EMBED(model_key, input, [dim])
Parameter description:
| Parameter | Description | Type | Nullable |
|---|---|---|---|
| model_key | The model registered in the database. | VARCHAR(128) | No |
| input | The text data to be converted. | VARCHAR | No |
| dim | Specifies the output dimensions of the vector. Some API providers of large models support configuring multiple dimensions. | INT64 | Yes |
If model_key and input are not specified, or if one of them is NULL, the function will return an error.
Return value:
- A string in vector format, representing the vector converted by the embedding model based on the text.
Examples
Embed single-row data
SELECT AI_EMBED("ob_embed","Hello world") AS embedding;The result is as follows:
+----------------+ | embedding | +----------------+ | [0.1, 0.2, 0.3]| +----------------+Embed column data from a table
CREATE TABLE comments ( id INT AUTO_INCREMENT PRIMARY KEY, content TEXT ); INSERT INTO comments (content) VALUES ('hello world!'); SELECT AI_EMBED("ob_embed",content) AS embedding FROM comments;The result is as follows:
+----------------+ | embedding | +----------------+ | [0.1, 0.2, 0.3]| +----------------+
AI_RERANK
The AI_RERANK function specifies a reranking model (Rerank Model) registered in the database using the model_key parameter, and organizes the query and document list provided by the user according to the vendor's rules and sends the message to the specified model. It parses and returns the sorted results returned by the model, suitable for reranking scenarios in RAG.
Usage
The syntax is as follows:
AI_RERANK(model_key, query, documents[, document_key])
Parameter description:
| Parameter | Description | Type | Nullable |
|---|---|---|---|
| model_key | The model registered in the database. | VARCHAR(128) | No |
| query | The text input by the user. | VARCHAR(1024) | No |
| documents | The document list input by the user. | JSON array, for example, '["apple", "banana"]' |
No |
If model_key, query, and documents are not specified, or if one of them is NULL, the function will return an error.
Return value:
- A JSON array containing the documents and their relevance scores returned by the reranking model, sorted in descending order of relevance scores.
Examples
SELECT AI_RERANK("ob_rerank","Apple",'["apple","banana","fruit","vegetable"]');
The result is as follows:
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ai_rerank("ob_rerank","Apple",'["apple","banana","fruit","vegetable"]') |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [{"index": 0, "document": {"text": "apple"}, "relevance_score": 0.9912109375}, {"index": 1, "document": {"text": "banana"}, "relevance_score": 0.0033512115478515625}, {"index": 2, "document": {"text": "fruit"}, "relevance_score": 0.0003669261932373047}, {"index": 3, "document": {"text": "vegetable"}, "relevance_score": 0.00001996755599975586}] |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
View information about AI models
OceanBase Database allows you to query the information about registered AI models and AI model endpoints from views. For more information, see:
- CDB/DBA_OB_AI_MODELS: Query the information about AI models.
- CDB/DBA_OB_AI_MODEL_ENDPOINTS: Query the information about AI model endpoints.
References
- Register an AI model: The complete command for registering a model with an endpoint.
- Get started with AI Function Service: A guide for first-time users, covering the minimum steps from registration to running your first example.
- Vector embedding technology
- User and permission management in MySQL mode
