Background information
Data desensitization changes the forms of sensitive information, such as names, ID card numbers, mobile phone numbers, landline numbers, bank accounts, and email addresses. You can set data desensitization rules to protect sensitive data.
Concepts
Data desensitization: the process of deidentifying, masking, or replacing sensitive data by using special algorithms and techniques during data processing and storage to protect data security and prevent data breach.
Dynamic desensitization: the process of desensitizing sensitive data in real time. The data is desensitized only for a database query. The source data in the database is not modified. Dynamic desensitization is commonly used in production environments to maintain the integrity and accuracy of raw data while avoiding the risk of data breach. The drawback of dynamic desensitization is its limited processing speed, which may affect the database query efficiency.
Static desensitization: the process of preprocessing sensitive data and storing the processed data in storage media, such as a database. Static desensitization is usually used in testing, development, and demonstration environments. It protects sensitive data from being viewed by unauthorized personnel and avoids the legal liability of data breach. The benefit of static desensitization is the fast processing speed and high query efficiency. However, the raw data is overwritten, so the data accuracy is undermined.
Desensitization algorithm: an algorithm used to desensitize data. A desensitization algorithm can effectively protect the security of sensitive data, avoid information breaches, and retain data formats and structures for data query and usage.
Identification rule: a rule used to automatically identify sensitive data for dynamic desensitization. ODC automatically scans data and identifies sensitive columns based on the added identification rules.
Sensitive column: a column that contains sensitive data in a database table.
Execution process

A project administrator views the built-in desensitization algorithm and tests the desensitization effect in Desensitization Algorithm under Security Specifications.
The project administrator manually adds sensitive columns on the Sensitive Data tab for a project. To add sensitive columns by using the automatic scanning feature, you must first create an identification rule.
A user views the table data, queries data in the SQL window, exports a result set, exports a ticket, and submits a database change ticket. The sensitive columns in the data are desensitized.
Prerequisites
The project administrator and DBA can manage sensitive columns and identification rules.
All users can view and test the desensitization effect. Users cannot create, edit, or delete a desensitization algorithm.
Manage sensitive columns
Add sensitive columns
Assume that you want to desensitize the email and mobile_phone columns in the student table in the odc_test database.
| Parameter | Example value |
|---|---|
| Data Source | mysql_4.2.0 |
| Source Database Name | odc_test |
| Table Name | student |
In the project collaboration window, choose Project > Sensitive Data > Sensitive Columns > Add Sensitive Column > Add Manually.
Add sensitive columns by taking one of the following means, and then click Submit.
Method 1: Manually add sensitive columns.
Method 2: Scan for sensitive columns automatically.
Note
Before automatically scanning for sensitive columns, you must create an identification rule. For more information, see Manage identification rules.
In the Sensitive Data list, you can view and enable the added sensitive columns.
Edit sensitive columns
As shown in the preceding figure, in the Sensitive Data list, click Edit in the Actions column to modify the desensitization algorithm.
Delete sensitive columns
On the Sensitive Data tab, click Delete in the Actions column.
Manage identification rules
Identification rules facilitate the management of sensitive data. After you manually add sensitive columns, you can add custom identification rules to automatically scan for sensitive columns. An identification rule is a matching condition that ODC uses to identify sensitive columns. ODC allows you to define an identification rule in three aspects: path, regular expression, and script.
Path: Use the database name, table name, or column name as the identification object. You can enter a path identification expression as an identification rule. Use periods (.) to separate database and table column names. Use asterisks (*) as wildcards. Use commas (,) to specify multiple rules.
Parameter Required? Description Rule Name Yes The name of the rule. The name can contain up to 64 characters. Rule Status Yes The status of the rule. Valid values: Enabled and Disabled. Matching Rule Yes The rule that qualifies a data column as a sensitive column.
For example, the rule *.*.mobile_phone qualifies any column namedmobile_phonein all databases and tables.Exclusion Rule No The rule that disqualifies a data column as a sensitive column. Notice
ODC checks a data column against the exclusion rule first, and then the matching rule, to determine whether it is a sensitive column.
Desensitization Algorithm Yes The default desensitization algorithm for masking sensitive columns. Description No The description of the identification rule. Regular expression: Use the database name, table name, column name, or column comments as the identification object. You can enter a path identification expression as an identification rule. You can enter a regular expression as an identification rule.
Parameter Required? Description Rule Name Yes The name of the rule. The name can contain up to 64 characters. Rule Status Yes The status of the rule. Valid values: Enabled and Disabled. Identification object - Database name No Enter a regular expression for matching a database name.
For example:*represents a database of any name.Identification object - Table name No Enter a regular expression to match a table name.
For example,e[a-z]?.*represents any table with a lowercase English name that starts with the letter e.Identification object - Column name No Enter a regular expression to match a column name. Identification object - Column comments No Enter a regular expression for matching column comments. Desensitization Algorithm Yes The default desensitization algorithm for masking sensitive columns. Description No The description of the identification rule. Script: Use the database name, table name, column name, column comments, or data type as the identification object. You can enter a Groovy script as the identification rule.
Notice
The output of the script must be a Boolean value, either
TrueorFalse.Parameter Required? Description Rule Name Yes The name of the rule. The name can contain up to 64 characters. Rule Status Yes The status of the rule. Valid values: Enabled and Disabled. Groovy Script Yes A script that determines whether a column is a sensitive column. The script must comply with the Groovy syntax standard. Desensitization Algorithm No The default desensitization algorithm for masking sensitive columns. Description No The description of the identification rule. ODC has a built-in object named
columnfor users to reference in Groovy scripts. The following table describes the attributes of the object.Attribute Type Description schema String The name of the database to which the column belongs. table String The name of the table to which the column belongs. name String The name of the column. comment String The comment of the column. type String The data type of the column.
Here are some sample scripts of general identification rules:
Addresses
if (("varchar".equals(column.type) || "char".equals(column.type))) { if (column.name.indexOf("address") >= 0) { return true; } if (column.comment != null && (column.comment.toLowerCase().indexOf("address") >= 0 || column.comment.indexOf("address") >= 0 || column.comment.indexOf("home address") >= 0 || column.comment.indexOf("location") >= 0)) { return true; } } return false;Mobile phone numbers
if (column.name.length() == 11 && ("varchar".equals(column.type) || "char".equals(column.type))) { if (column.name.indexOf("phone") >= 0 || column.name.indexOf("mobile") >= 0) { return true; } if (column.comment != null && (column.comment.toLowerCase().indexOf("phone") >= 0 | | column.comment.indexOf("phone") >= 0 || column.comment.indexOf("mobile") >= 0 || column.comment.indexOf("mobile") >= 0)) { return true; } } return false;ID card numbers
if (column.name.length() >= 15 && ("varchar".equals(column.type) || "char".equals(column.type))) { if (column.name.indexOf("id_number") >= 0 || column.name.indexOf("identity_card") >= 0) { return true; } if (column.comment != null && (column.comment.toLowerCase().indexOf("identity card") >= 0 || column.comment.indexOf("ID card") >= 0)) { return true; } } return false;
Add an identification rule
Assume that you want to add an identification rule for the mobile_phone column in the student table in the odc_test database as the administrator.
In the project collaboration window, choose Project > Sensitive Data > Identification Rules > Create Identification Rule.
In the Create Identification Rule dialog box, specify the rule name, rule status, and identification method.
For example,
odc_test*.student.*a,*.*.mobile_phonematches themobile_phonecolumn in thestudenttable in theodc_testdatabase.In the Identification Rules list, you can view and enable the created identification rule.
Manage identification rules
As shown in the preceding figure, you can click Edit or Delete to modify or delete an identification rule.
View desensitization algorithms
On the project collaboration page, choose Security Specifications > Desensitization Algorithms to view the desensitization algorithms supported by ODC.
The following table describes the desensitization algorithms supported by ODC.
| Algorithm name | Test data | Result preview |
|---|---|---|
| Mask all (system default) | test value | ***** |
| Personal name (Chinese) | 个人姓名 | **名 |
| Personal name (letters) | Personal name | P** |
| Nickname | Nickname | N***e |
| odc@oceanbase.com | o***@oceanbase.com | |
| Address | Hangzhou, Zhejiang Province, China | Hangzhou, Z*** |
| Mobile phone number | 13500000000 | 135******00 |
| Landline number | 010-12345678 | **********78 |
| ID card number | 123456789 | 1*******9 |
| Bank account | 1234 5678 5678 1234 | ***************1234 |
| License plate number | Zhejiang AB1234 | Zhejiang A**234 |
| Unique device ID | AB123456789CD | ****89CD |
| IP address | 10.123.456.789 | 10...* |
| MAC address | ab:cd:ef:gh:hi:jk | ab:*:*:*:*:* |
| MD5 | default | c21f969b5f03d33d43e04f8f136e7682 |
| SHA256 | default | 37a8eec1ce19687d132fe29051dca629d164e2c4958ba141d5f4133a33f0688f |
| SHA512 | default | 1625cdb75d25d9f699fd2779f44095b6e320767f606f095eb7edab5581e9e3441adbb0d628832f7dc4574a77a382973ce22911b7e4df2a9d2c693826bbd125bc |
| SM3 | default | 40c357923156504f734717d8b4f5623e75209e9572701f4b51ef2a03d9ced863 |
| Rounding | 123.456 | 123 |
| Empty | default | - |
| Default rule | abcd1234 | abc**234 |
Scenarios
The added sensitive columns take effect in data export, database change, and SQL window execution scenarios.
Scenario 1: Data export
Assume that you want to export the student table from the dc_test database and check the desensitization result.
In the left-side navigation pane of the SQL development window, submit a ticket to export the
studenttable. In the export task list, click View.In the lower-right corner of the export task details page, click Download.
View the
studenttable on your local disk.
Scenario 2: Database changes
Assume that you want to insert data into the student table and check the desensitization result. Data desensitization is automatically enabled in this case.
In the left-side navigation pane of the SQL development window, submit a ticket to create a database change task that inserts data into the
studenttable.In the left-side navigation pane of the SQL development window, check the
studenttable in the odc_test database under
. The data is desensitized.
Scenario 3: SQL statement execution in the SQL window
Assume that you want to insert data into the student table and check the desensitization result. Data desensitization is automatically enabled in this case.
In the SQL window, edit an SQL statement to insert data into the
studenttable.On the
Resultstab, view the data in thestudenttable. The data is desensitized.