In OceanBase Database, semi-structured data types such as JSON, GIS, and XML are classified as multi-model data types.
Characteristics of multi-model data
- Standard composition: For example, RFC 8259 is a data interchange specification for JavaScript Object Notation (JSON), while Open Geospatial Consortium (OGC) provides a set of text markup language (WKT-CRS) to describe coordinate systems and inter-system conversions, and World Wide Web Consortium (W3C) defines Extensible Markup Language (XML).
- Direct access based on SQL: As semi-structured data, multi-model data uses basic internal data types that can be mapped to SQL types. Multi-model data adopts a cascading logical tree schema that organizes a series of basic SQL types into complex composite data types.
- Fixed computing functions: Multi-model data types adopt standardized formats, which not only define the data exchange formats but also contribute to the computing APIs.
Query and computing of multi-model data
Multi-model data features a flexible structure. OceanBase Database provides a wide range of functions to support the construction, query, and computing of multi-model data based on the standards of multi-model data.
- Data construction: OceanBase Database provides aggregation functions that extract basic data from relational tables based on certain aggregation rules to construct JSON and XML objects.
- Data query: For JSON and XML data, OceanBase Database allows you to access a subset of multi-model data objects or a specific basic SQL element based on paths.
- Data computing: OceanBase Database allows you to use geographic information system (GIS) functions to compute the relationships between spatial objects.
Storage of multi-model data
As a type of composite data, multi-model data is generally at a variable length, and a multi-model data object can sometimes be quite large. For storage convenience, multi-model data generally inherits the storage characteristics of LOB data. Technically, the storage upper limit of LOB data also applies to multi-model data. However, as the computing and storage of a large multi-model data object is very inefficient, we recommend that you do not construct a large multi-model data object.
- Text or VARCHAR2 type: You can store multi-model data directly as the text or VARCHAR2 type in the database based on the standard text format. The disadvantage of this method is that each query of multi-model data requires parsing. Therefore, we recommend that you do not use this method to store multi-model data.
- Native multi-model type: The format of the stored data is optimized for database parsing. Once multi-model data is stored in the database, it will not be parsed again in the next query, and the storage space is also optimized. We recommend that you use the native multi-model type for storage.
Multi-model data interchange
Multi-model data is interchanged between different systems based on the standard format specifications.
Notice
- Currently, the MySQL mode of OceanBase Database supports only the multi-model data types GIS and JSON.
- In MySQL mode, a JSON-based virtual generation function can create a secondary index on an element within any JSON object.
JSON/XML/GIS data types
JSON data types
The JSON data type is a special data type used to store and process JSON data in a database. JSON values can consist of JSON objects, arrays, strings, numbers, Boolean values (false/true), or null. The values false, true, and null are only allowed in lowercase form.
JSON textual structure
The JSON textual structure includes characters, strings, numbers, and three literal names. Various separators, including spaces, horizontal tabs, line breaks, and carriage returns, are allowed before or after any structural character.
Begin array= left bracket ([)
Begin object = left brace ({)
End array = right bracket (])
End object = right brace (})
Name separator = colon (:)
Value separator = comma (,)
Objects
The structure of an object is represented as a pair of braces that enclose zero or more name/value pairs or members. The names within an object must be unique. A name is a string. Each name is followed by a colon to separate it from its value. A single comma separates a value from a name that follows. A single comma is added to separate two name/value pairs.
Here is an example:
{ "NAME": "SAM", "Height": 175, "Weight": 100, "Registered" : false}
Arrays
The structure of an array is represented as a pair of brackets that enclose zero or more values. A value is also known as an element. Elements in an array are separated by commas (,), and values in the array can be different.
Here is an example:
["abc", 10, null, true, false]
Numbers
Numbers are represented in decimal format and consist of an integer component, which may be optionally preceded by a minus sign (-), followed by an optional fractional part and/or an exponent part. Leading zeros are not allowed. The fractional part consists of a decimal point followed by one or more digits. The exponent part starts with an uppercase or lowercase letter E, followed by an optional plus sign (+) or minus sign (-). After E and the optional sign, one or more digits may follow.
Here is an example:
[100, 0, -100, 100.11, -12.11, 10.22e2, -10.22e2]
Strings
A string starts and ends with a double quotation mark ("). All Unicode characters can be enclosed in double quotation marks, except for the following characters that must be escaped: quotation marks, backslashes, and control characters.
The JSON text must be encoded in UTF-8, UTF-16, or UTF-32. By default, the JSON text is encoded in UTF-8.
Here is an example:
{"Url": "http://www.example.com/image/481989943"}
Create a JSON value
OceanBase Database supports the following DDL operations on the JSON data type:
Create a table with JSON columns.
Add or remove JSON columns.
Create an index on a generated column for indexing the JSON column.
Limitations
You can create multiple JSON columns in each table, but the following limitations apply:
A JSON column cannot be used as the
PRIMARY KEY,FOREIGN KEY, orUNIQUE KEY, but you can add theNOT NULLorCHECKconstraint.A JSON column cannot contain default values.
JSON columns cannot be used as partitioning keys.
The length of the JSON data cannot exceed the value of
LONGTEXT. The maximum depth of each JSON object or array is 99.
Examples
Example 1: Perform basic DDL operations
obclient> CREATE TABLE tbl1 (id INT PRIMARY KEY, docs JSON NOT NULL, docs1 JSON);
Query OK, 0 rows affected
obclient> ALTER TABLE tbl1 MODIFY docs JSON CHECK(docs <'{"a" : 100}');
Query OK, 0 rows affected
obclient> CREATE TABLE json_tab(
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT COMMENT 'primary key',
json_info JSON COMMENT 'JSON data',
json_id INT GENERATED ALWAYS AS (json_info -> '$.id') COMMENT 'virtual JSON data field',
json_name VARCHAR(5) GENERATED ALWAYS AS (json_info -> '$.NAME'),
index json_info_id_idx (json_id)
)COMMENT 'json sample table';
Query OK, 0 rows affected
obclient> ALTER TABLE json_tab ADD COLUMN json_info1 JSON;
Query OK, 0 rows affected
obclient> ALTER TABLE json_tab ADD INDEX (json_name);
Query OK, 0 rows affected
obclient> ALTER TABLE json_tab drop COLUMN json_info1;
Query OK, 0 rows affected
Example 2: Create an index for the specified key by using a generated column
obclient> CREATE TABLE jn ( c JSON, g INT GENERATED ALWAYS AS (c->"$.id"));
Query OK, 0 rows affected
obclient> CREATE INDEX idx1 ON jn(g);
Query OK, 0 rows affected
Records: 0 Duplicates: 0 Warnings: 0
obclient> INSERT INTO jn (c) VALUES
('{"id": "1", "name": "Fred"}'), ('{"id": "2", "name": "Wilma"}'),
('{"id": "3", "name": "Barney"}'), ('{"id": "4", "name": "Betty"}');
Query OK, 4 rows affected
Records: 4 Duplicates: 0 Warnings: 0
obclient> SELECT c->>"$.name" AS name FROM jn WHERE g <= 2;
+-------+
| name |
+-------+
| Fred |
| Wilma |
+-------+
2 rows in set
obclient> EXPLAIN SELECT c->>"$.name" AS name FROM jn WHERE g <= 2\G
*************************** 1. row ***************************
Query Plan: =========================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
-----------------------------------------
|0 |TABLE SCAN|jemp(idx1)|2 |92 |
=========================================
Outputs & filters:
-------------------------------------
0 - output([JSON_UNQUOTE(JSON_EXTRACT(jemp.c, '$.name'))]), filter(nil),
access([jemp.c]), partitions(p0)
1 row in set
Spatial data types
OceanBase Database supports the construction, storage, and analysis of spatial data types of Geographic Information System (GIS).
The supported spatial data types in OceanBase Database are as follows:
GEOMETRYPOINTLINESTRINGPOLYGONMULTIPOINTMULTILINESTRINGMULTIPOLYGONGEOMETRYCOLLECTION
Single-value types
A single-value type holds a single spatial value, such as:
GEOMETRY, which can hold any type of geometry value.POINT,LINESTRING, andPOLYGON, which are the most basic types and can only hold specific types of geometry values.POINT: represents a point in a coordinate system. For example,POINT(50 60)specifies a longitude value and a latitude value, which are separated by a space.LINESTRING: represents a line that consists of a series of points connected together. For example,LINESTRING(30 20,10 30,50 50)specifies a line consisting of three points separated by commas (,). Each point consists of a longitude value and a latitude value separated by a space, which is consistent with thePOINTformat.POLYGON: represents a simple or solid polygon, which can be open or boundary only, such as a boundary-only polygonPOLYGON((0 0,10 0,10 10,0 10)).
GEOMETRY is the base class of all spatial types, and types such as POINT, LINESTRING, and POLYGON are subclasses of GEOMETRY.
Collection types
These types hold collections of values, such as:
MULTIPOINT,MULTILINESTRING, andMULTIPOLYGON, which hold collections of spatial data of a certain base type.MULTIPOINT: represents a collection of points, such asMULTIPOINT(10 0),(5 10),(25 5),(20 5)andMULTIPOINT(10 0,5 10,25 5,20 5).MULTILINESTRING: represents a collection of lines, such asMULTILINESTRING((0 0,15 5,10 10),(30 20,10 30,40 40)).MULTIPOLYGON: represents a collection of graphs, such asMULTIPOLYGON(((30 20, 45 40, 10 40,30 20)),((15 5,40 10,10 20,5 10,15 5)))andMULTIPOLYGON(((40 40,20 45,45 30,40 40)),((20 35,10 30,10 10,30 5,45 20,20 35),(30 20,20 15,20 25,30 20))).
GEOMETRYCOLLECTIONcan hold a collection of objects of any spatial data type, such asGEOMETRYCOLLECTION(POINT(40 10),LINESTRING(10 10,20 20,10 40),POLYGON((40 40,20 45,45 30,40 40))).
MULTIPOINT, MULTILINESTRING, MULTIPOLYGON, and GEOMETRYCOLLECTION are value collection data types holding vales of the POINT, LINESTRING, or POLYGON type.
SRID attribute
The SRID attribute defines a value in the spatial reference system (SRS) on a column of the spatial type. A column that defines the SRID attribute has the following two characteristics:
- The
SRIDvalue of the spatial object written to the column must be consistent with theSRIDattribute defined for the column. Otherwise, the write operation fails. - The spatial index defined on this column can take effect during spatial queries.
Example
A field of the spatial type supports the nullable attribute and the SRID attribute of a space coordinate system. Here is an example:
obclient [test]> CREATE TABLE tbl1_g(g GEOMETRY NOT NULL SRID 0);
Query OK, 0 rows affected