Character data types store characters, such as digits and letters, in strings. This topic describes the most commonly used character data types.
The byte sequence in a string is encoded by using a specific coding scheme, which is also called a character set. You can specify a character set, such as UTF-8 or GBK, for a database when you create a tenant.
Character data types are usually defined by length semantics, byte semantics, and character semantics. The length semantics specify how to measure the length of a string, whether by bytes or characters. The byte semantics treat a string as a sequence of bytes. This is the default option for character data types. The character semantics treat a string as a sequence of characters. A character in a character sequence corresponds to a code point in a database character set.
VARCHAR2 and CHAR types
The most common character data type in a database is VARCHAR2, which is the most efficient type for storing character data. The VARCHAR2 type stores variable-length string constants, such as 'LILA', 'St. George Island', and '101'. In contrast to numeric constants such as 5001, string constants are enclosed in single quotation marks (' ') so that the database can distinguish string constants from numeric constants.
When you create a table with a VARCHAR2 column, you can specify the maximum string length that the VARCHAR2 column supports. For example, VARCHAR2(25) indicates that a value stored in the column can be up to 25 bytes in length.
For each row, the database stores a variable-length value in the column. If the value length exceeds the maximum string length defined for the column, the database returns an error. For example, if you insert 10 characters into a UTF-8-encoded column that supports a maximum string length of 25 characters, the column stores 10 characters instead of 25 characters. This way, the VARCHAR2 type reduces the consumption of storage space. In contrast, the CHAR type stores fixed-length strings. The default length is 1 byte. If the actual length of a value is less than the specified length, the database uses spaces to pad the value to the specified length.
OceanBase Database in Oracle mode compares VARCHAR2 values without padding spaces and compares CHAR values with the spaces padded.
NCHAR and NVARCHAR2 types
The NCHAR and NVARCHAR2 types store Unicode characters.
Unicode is a universal character set that can store characters of any language. The NCHAR type stores fixed-length strings, whereas the NVARCHAR2 type stores variable-length strings. These types use a national character set, which can be set when you create a tenant. A national character set stores local special characters in your region. For uniformity of standards, the national character set must be a Unicode character set, for example, AL16UTF16 or UTF-8.
When you create a table with an NCHAR or NVARCHAR2 column, the column conforms to character semantics.