On the Models page of the AI services, you can browse and manage models in the model market, configure default models, manage data source model registration, and view API and SQL call examples for models. With these capabilities, you can enable applications to directly call models or leverage model capabilities through the AI functions in the database.
Features
The AI models page provides the following capabilities:
Browse the model market: View the list of currently available models, filter models by provider or type, and enable or disable specific models.
Compare model performance: Compare the 7-day performance trends and 24-hour availability of different models to assist in model selection.
Configure default models: On the Basic Settings tab, select default models for scenarios such as text generation, text embedding, text reordering, multimodal embedding, and multimodal reordering.
Manage data source model registration: After you enable this feature, the system automatically handles AI model registration in the database, so that models can be called directly from the database side.
View model call examples: Generate API and SQL call examples to help you quickly complete integration and verification.
Rate limiting: On the Rate Limits & Quotas tab, set the upper limits for TPM and RPM for different model types.
Prerequisites
You have activated the AI service and opened its page.
You have created a usable API Key. For specific operations, see Manage AI API keys.
If you need to call models via SQL in the database, log in to the database with a database account that has the ACCESS AI MODEL privilege. For more information, see Enable the AI function built-in AI models.
Model market
Go to the Models page of the AI service. The Model Market tab is displayed by default.
At the top of the tab, you can use the following capabilities to filter and view models:
| Action | Description |
|---|---|
| Search models or providers | Search by model name or provider name. |
| Provider | Filter by model provider, for example, Volcano Engine or Alibaba Cloud Bailian. |
| Type | Filter by model capability type, for example, text generation, video generation, image generation (async only). |
| Compare model performance | Open the Compare Model Performance panel to compare the performance of selected models in two dimensions: Average Throughput (tokens/s) and Overall Score (0–5) using scatter plots. You can filter by Provider or Model. Different colors distinguish providers, making it easier to balance output speed and comprehensive capabilities when selecting a model. |

Models are displayed as cards, each typically containing the following information:
| Item | Description |
|---|---|
| Model Name | The unique identifier of the model, which must be specified when invoking it. |
| Capability Type | The task types supported by the model, for example, text generation, video generation, image generation (async only). |
| Provider | The cloud service provider to which the model belongs. |
| 7-Day Performance | The performance trend chart for the last 7 days. |
| 24h Availability | The availability metric for the last 24 hours. |
| Overall Score | The overall performance score of the model (0–5), assessed based on multiple dimensions such as accuracy, response speed, and stability. |
| Enable/Disable Switch | Controls whether the model is enabled for the current project. A disabled model cannot be called. |

The model types currently supported in the model market include but are not limited to:
| Type | Description |
|---|---|
| Text generation | Used for dialogue generation, content creation, question answering, code generation, and other scenarios. |
| Video generation | Used for text-to-video, image-to-video, and video editing scenarios. |
| Image generation (asynchronous only) | Used for image generation and image editing scenarios, supporting only asynchronous calls. |

For detailed descriptions of each model, see Built-in models.
Basic settings
On the Models page, switch to the Basic Settings tab.
In the Base Model section, select the default model for each task as needed. For more information, see Built-in models.
| Default model | Description |
|---|---|
| Text generation | Used for scenarios such as dialog generation, content creation, and question-answering. |
| Text embedding | Converts text into vectors, suitable for semantic search and similarity calculations. |
| Text reranking | Ranks the retrieved results by relevance to improve search quality. |
| Multi-modal embedding | Converts multi-modal content such as images and text into vectors, suitable for cross-modal search. |
| Multi-modal reranking | Ranks the retrieved multi-modal results to enhance the effectiveness of multi-modal search. |

After saving the configuration, subsequent calls will prioritize the default models you set.
Enhanced model calling capabilities:
Auto-Routing Policy: Dynamically routes and load balances requests based on request context, model capabilities, and predefined business rules, ensuring service quality while balancing cost, latency, and stability.
Fall Back to Default Model: When it is enabled, if the specified model is unavailable, an exception occurs, or configurations change, the system automatically falls back to the default model, improving call continuity.
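As a way to picture the fallback behavior, the following Python sketch shows the decision in isolation. It is an illustration only and does not reflect how the service implements fallback internally: `call_with_fallback`, `ModelUnavailable`, `fake_call`, and the model names are all hypothetical.

```python
from typing import Callable


class ModelUnavailable(Exception):
    """Hypothetical error raised when a model cannot serve a request."""


def call_with_fallback(
    call_model: Callable[[str, str], str],
    requested_model: str,
    default_model: str,
    prompt: str,
    fallback_enabled: bool = True,
) -> tuple[str, str]:
    """Return (model_used, answer), retrying once on the default model."""
    try:
        return requested_model, call_model(requested_model, prompt)
    except ModelUnavailable:
        # Re-raise when fallback is off or the default model itself failed.
        if not fallback_enabled or requested_model == default_model:
            raise
        return default_model, call_model(default_model, prompt)


def fake_call(model: str, prompt: str) -> str:
    # Stand-in backend for the sketch: only "default-model" is healthy.
    if model != "default-model":
        raise ModelUnavailable(model)
    return f"{model}: reply to {prompt!r}"


if __name__ == "__main__":
    used, answer = call_with_fallback(fake_call, "other-model", "default-model", "hi")
    print(used, answer)
```

With fallback enabled, the request for the unavailable model is answered by the default model; with `fallback_enabled=False`, the error is surfaced to the caller instead.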
You can view the currently available base model providers in the Model Providers section. The page typically displays provider names, model sources, and model counts in card format.
Rate limits & quotas
Switch to the Rate Limits & Quotas tab on the Models page to view and modify the rate limits for each model type.
Project admins and project owners can set model rate limits for the current project. The upper limit of the model rate depends on the organizational rate limit. To request a higher quota, contact the organization admin to increase the organizational rate limit.
The default rate limits for each model type are as follows:
| Model type | Tokens per minute (TPM) | Requests per minute (RPM) |
|---|---|---|
| Text generation | 100,000 | 10 |
| Text embedding | 100,000 | 10 |
| Text reordering | 100,000 | 10 |
| Multimodal embedding | 100,000 | 10 |
| Multimodal reordering | 100,000 | 10 |
Modify the rate limit
You can modify the rate limits for each model type. The procedure is as follows:
In the rate limit list on the Rate Limits & Quotas tab, find the model type you want to modify.
Click ··· in the Actions column of the corresponding row. In the dialog box that appears, adjust the values of Tokens per Minute (TPM) and Requests per Minute (RPM). The modified values must not exceed the upper limit of the organization-level rate limit.
Click the checkmark icon to complete the modification.
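The rule that project values must not exceed the organization-level limit can be expressed as a simple check. The Python sketch below is illustrative only: the `Limits` type, its field names, and the organization-level values used in the example are hypothetical, and the console performs its own validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    tpm: int  # tokens per minute
    rpm: int  # requests per minute


def validate_project_limits(project: Limits, organization: Limits) -> list[str]:
    """Return a list of violations; an empty list means the values fit."""
    errors = []
    for name in ("tpm", "rpm"):
        value = getattr(project, name)
        ceiling = getattr(organization, name)
        if value > ceiling:
            errors.append(f"{name}={value} exceeds organization limit {ceiling}")
    return errors


if __name__ == "__main__":
    org = Limits(tpm=500_000, rpm=50)  # hypothetical organization-level limits
    print(validate_project_limits(Limits(tpm=100_000, rpm=10), org))
    print(validate_project_limits(Limits(tpm=600_000, rpm=10), org))
```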
Data source model registration
Data source model registration controls database-side model registration capabilities. When enabled, the system automatically handles the capabilities required for database model registration. When disabled, you can still manually generate SQL and register it on the database side.
If you mainly call models through SQL and AI functions, we recommend enabling this option to simplify database integration.
Procedure
In the upper-right corner of the Models page, click Data Source Model Registration.
In the pop-up window, complete the following configuration:
Select Instance: Select the target instance for registering the AI model. Only instances of V4.4.2 and later are supported.
Select Tenant: Select a tenant under the target instance.
In the Models Callable by AI Functions section, select the model source:
Default Model: Directly select the already configured default model.
Custom Models: Specify a model from the list as needed.
Select API Key: Select the API Key used for authentication. To create a new API Key, click Create API Key.
Click Generate SQL to obtain the SQL statement for registering the AI model.
To call the AI model, you must log in to the database with a database account that has the ACCESS AI MODEL privilege.
The generated SQL can be used to complete the model registration in the target database.
View model call examples
In the upper-right corner of the Models page, click Model Call Examples to view API and SQL model call examples.
Call a model via API
On the Call Model via API tab, the page displays an example of calling a model through an HTTP interface. The process is as follows:
Create an API Key for authentication.
Use the API Key to call the model through an HTTP request.
For more information, see AI APIs.
curl -X POST https://ai-api-g.oceanbase.com/api/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "qwen3-max", "messages": [{"role": "user", "content": "Hello"}]}'
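The same request can be issued from application code. The following is a minimal Python sketch that uses only the standard library and mirrors the curl example above. The helper name `build_chat_request` is illustrative and not part of any SDK. Because the response format is not described on this page, the sketch builds the request and leaves sending it as a commented-out step.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://ai-api-g.oceanbase.com/api/v1/chat/completions"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("YOUR_API_KEY", "qwen3-max", "Hello")
    # Sending requires a valid API key and network access:
    # with urllib.request.urlopen(req, timeout=30) as resp:
    #     print(resp.read().decode("utf-8"))
    print(req.get_method(), req.full_url)
```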
Call a model via SQL
On the Call Model via SQL tab, the page displays an example of calling a model in the database using the AI function. You can call the model in SQL as follows:
SELECT AI_COMPLETE("ob_complete","How are you") AS ans;
SELECT AI_EMBEDDING("ob_embed","I am OceanBase Cloud AI") AS embedding;
SELECT AI_RERANK("ob_rerank","Apple",["apple","banana","fruit","vegetable"]) AS rerank_result;
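From application code, these functions can be called through a database driver. The sketch below is written under several assumptions: a Python DB-API cursor that returns rows as tuples, a driver that uses the `%s` parameter style (as PyMySQL does), and a model already registered under the name `ob_complete` from the example above. The helper name `ai_complete` is illustrative, and whether your driver and database version accept bound parameters for these arguments should be verified in your environment.

```python
def ai_complete(cursor, model_name: str, prompt: str) -> str:
    """Run AI_COMPLETE through a DB-API cursor and return the answer text."""
    # Values are passed as parameters instead of being concatenated into the SQL.
    cursor.execute("SELECT AI_COMPLETE(%s, %s) AS ans", (model_name, prompt))
    row = cursor.fetchone()
    # Assumes tuple-style rows; an empty result yields an empty string.
    return row[0] if row else ""
```

The caller is expected to open the connection with an account that has the ACCESS AI MODEL privilege and pass in a cursor, for example `ai_complete(conn.cursor(), "ob_complete", "How are you")`.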
Rate limits & quotas
Project admins and project owners can set model-specific rate limits for the current project. The upper limit of a project's rate limit depends on the organization-level rate limit. To request a higher quota, contact the organization admin to increase the organizational rate limit.
About rate limits & quotas
- The organization-level model rate limit is the maximum that applies to every project within the organization.
- API usage is limited by Tokens per Minute (TPM) and Requests per Minute (RPM). If the number of tokens or requests exceeds the limit in any given minute, subsequent requests in that same minute are throttled to ensure fair resource usage and system stability.
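To make the per-minute limits concrete, here is a small Python sketch of a fixed-window check that admits a request only while both the RPM and TPM budgets for the current minute have room. It is one way to model such limits and is not a description of the service's actual throttling algorithm. The class name and the windowing scheme are assumptions.

```python
class MinuteWindowLimiter:
    """Fixed one-minute window enforcing both RPM and TPM budgets."""

    def __init__(self, tpm: int, rpm: int) -> None:
        self.tpm = tpm
        self.rpm = rpm
        self.window = None  # index of the minute currently being counted
        self.tokens = 0
        self.requests = 0

    def allow(self, now_seconds: float, tokens: int) -> bool:
        """Return True and record usage if the request fits in this minute."""
        window = int(now_seconds // 60)
        if window != self.window:
            # A new minute started: reset both counters.
            self.window, self.tokens, self.requests = window, 0, 0
        if self.requests + 1 > self.rpm or self.tokens + tokens > self.tpm:
            return False  # throttled until the next minute
        self.requests += 1
        self.tokens += tokens
        return True


# Default text generation limits listed in this document: 100,000 TPM, 10 RPM.
limiter = MinuteWindowLimiter(tpm=100_000, rpm=10)
results = [limiter.allow(now_seconds=5.0, tokens=1_000) for _ in range(11)]
```

In this run the first ten requests are admitted and the eleventh is rejected because the RPM budget is exhausted, even though only 10,000 of the 100,000 tokens were used. A request arriving in the next minute is admitted again.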
The default rate limits & quota for each model type are as follows:
| Model type | Monthly quota (M tokens) | Tokens per minute (TPM) | Requests per minute (RPM) |
|---|---|---|---|
| Text generation | 1,000,000,000 | 100,000 | 10 |
| Text embedding | 1,000,000,000 | 100,000 | 10 |
| Text reordering | 1,000,000,000 | 100,000 | 10 |
| Multimodal embedding | 1,000,000,000 | 100,000 | 10 |
| Multimodal reordering | 1,000,000,000 | 100,000 | 10 |
| Omni-modal | 1,000,000,000 | 100,000 | 10 |
| Image generation | - | - | 10 |
| Video generation | - | - | 10 |
Modify rate limits & quotas
You can modify the rate limits & quota for each model type. The procedure is as follows:
On the Models page, switch to the Rate Limits & Quotas tab and locate the model type you want to modify in the list.
Click ··· in the Actions column of the corresponding row. In the pop-up window, adjust the values for Monthly Quota (M Tokens), Tokens per Minute (TPM), and Requests per Minute (RPM). The modified values must not exceed the upper limit of the organization-level rate limit.
Click the checkmark icon to complete the modification.
