Desensitize data

2025-11-19 07:09:12  Updated

Background information

Data desensitization deforms sensitive privacy information, such as names, ID card numbers, mobile numbers, landline numbers, bank accounts, and email addresses, by using desensitization algorithms to protect sensitive privacy data.

Concepts

  • Data desensitization: a feature that uses specific algorithms and techniques to process, obfuscate, or replace sensitive data during data processing and storage so that the data cannot be identified or restored, thereby protecting data security and preventing data breaches.

  • Dynamic desensitization: a process of desensitizing sensitive data in real time when a user queries the database, without modifying the source data in the database. Generally, dynamic desensitization is used in a production environment. It avoids data breaches while ensuring the integrity and accuracy of the original data. However, dynamic desensitization is slow, which can compromise the query efficiency of the database.

  • Static desensitization: a process of preprocessing sensitive data and storing the processed data in storage media, such as a database. Static desensitization is usually used in testing, development, and demonstration environments. It protects sensitive data from being viewed by unauthorized personnel and avoids the legal liability of data breaches. Static desensitization is quick, which can improve the query efficiency. However, after static desensitization, the original data cannot be restored, which undermines data accuracy.

  • Desensitization algorithm: an algorithm used to desensitize sensitive data. A desensitization algorithm can effectively protect the security of sensitive data and avoid data breaches while retaining the data formats and structures to facilitate queries and usage.

  • Identification rule: a rule used to automatically identify sensitive data for data desensitization. When you scan to add sensitive columns, sensitive columns are automatically identified based on specified rules.

  • Sensitive column: a column that contains sensitive data in a database table.

Execution process

2.2

  1. Log in to the ODC console as the project administrator and choose Security Specifications > Masking Algorithm to view the built-in desensitization algorithms and test the desensitization effect.

  2. Log in to the ODC console as the project administrator and choose Projects > Sensitive Columns. You can click Add Sensitive Column and select Add Manually or Scan to Add. To select Scan to Add, make sure that identification rules have been created.

  3. When you log in as a regular user and perform queries in the SQL window, export a result set, export a ticket, or submit a database change ticket, the output data in the sensitive columns is desensitized.

Prerequisites

  • The project administrator or DBA can manage sensitive columns and identification rules.

  • All users can view and test the desensitization effect, but regular users are not allowed to create, edit, or delete desensitization algorithms.

Considerations

  • Data desensitization is not supported in command-line windows.

  • Data desensitization is not supported during PL execution.

  • When you configure an identification rule script, Groovy supports only the Objects and String classes in Java.

  • When you configure an identification rule script, Groovy closures and inherent closure functions are not supported.

  • If sensitive data is involved when you export data from a MySQL data source, the mask all algorithm is used for desensitization.

Sensitive column management

Add sensitive columns

Note

In ODC V4.2.2 and later, you can add sensitive columns for a MySQL data source. For the limitations on features of MySQL data sources, see Limitations.

Example: Desensitize the email and mobile_phone columns of the student table in the odc_test database.

Parameter Example value
Data source mysql_4.2.0
Source database name odc_test
Table name student
  1. In the project collaboration window, choose Projects > Sensitive Columns. You can click Add Sensitive Column and select Add Manually or Scan to Add.

    3

  2. After you manually add or scan to add sensitive columns, click Submit.

    • Method 1: Manually add sensitive columns

      4

    • Method 2: Scan to add sensitive columns

      Note

      Before you scan to add sensitive columns, make sure that identification rules have been created. For more information, see the Identification rule management section in this topic.

      5

  3. In the sensitive column list, you can view and enable added sensitive columns.

    6

Edit a sensitive column

In the sensitive column list shown in the preceding figure, click Edit in the Actions column of a sensitive column to change the desensitization algorithm for the sensitive column.

Delete a sensitive column

In the sensitive column list, click Delete in the Actions column of a sensitive column to delete it.

Identification rule management

Identification rules are an extension of the sensitive data management capability. In addition to manually adding sensitive columns, you can also specify identification rules to implement automatic scan of sensitive columns. An identification rule defines the matching conditions. ODC will identify the columns that meet the specified conditions as sensitive columns. ODC supports three identification methods for identification rules: path, regular expression, and script.

  • Path: An identification rule based on a path expression identifies a sensitive column based on its database name, table name, or column name. The database name, table name, and column name are separated with periods (.). Matching conditions are separated with commas (,). Asterisks (*) are used as wildcard characters.

    Parameter Required? Description
    Rule Name Yes The name of the identification rule, which cannot exceed 64 characters in length.
    Rule Status Yes The status of the identification rule. Valid values: Enable and Disable.
    Matching Rule Yes Conditions for matching sensitive columns. Columns that meet the conditions specified here are identified as sensitive columns.
    For example, the identification rule \*\.\*\.mobile_phone specifies to match columns named mobile_phone in any tables and databases.
    Exclusion Rule No Conditions for excluding data columns.

    Notice

    The system determines whether a data column is a sensitive column based on the exclusion conditions first and then the matching conditions.

    Masking Algorithm Yes The default desensitization algorithm for identified sensitive columns.
    Rule Description No The description of the identification rule.
  • Regular expression: An identification rule based on a regular expression identifies a sensitive column based on its database name, table name, column name, or column remarks.

    Parameter Required? Description
    Rule Name Yes The name of the identification rule, which cannot exceed 64 characters in length.
    Rule Status Yes The status of the identification rule. Valid values: Enable and Disable.
    Identification Object-Database Name No The regular expression for matching database names.
    For example, * indicates databases with any names.
    Identification Object-Table Name No The regular expression for matching table names.
    For example, e[a-z]?.* indicates tables whose names are in lowercase and start with the letter e.
    Identification Object-Column Name No The regular expression for matching column names.
    Identification Object-Column Remarks No The regular expression for matching column remarks.
    Masking Algorithm Yes The default desensitization algorithm for identified sensitive columns.
    Rule Description No The description of the identification rule.
  • Script: An identification rule based on a Groovy script identifies a sensitive column based on its database name, table name, column name, column remarks, or data type.

    Notice

    The output of the script must be a Boolean value, namely True or False.

    Parameter Required? Description
    Rule Name Yes The name of the identification rule, which cannot exceed 64 characters in length.
    Rule Status Yes The status of the identification rule. Valid values: Enable and Disable.
    Groovy Script Yes The script that determines whether a data column is a sensitive column. The script is written based on Groovy syntax specifications.
    Masking Algorithm No The default desensitization algorithm for identified sensitive columns.
    Rule Description No The description of the identification rule.

    ODC provides built-in column objects for you to reference in the Groovy script. The following table describes the attributes in a column object.

    Attribute Type Description
    schema String The name of the database to which the column belongs.
    table String The name of the table to which the column belongs.
    name String The name of the column.
    comment String The comment on the column.
    type String The data type of the column.

Here are several sample identification rules that use a script as the identification method:

  • Address:
if (("varchar".equals(column.type) || "char".equals(column.type))) {
    if (column.name.indexOf("address") >= 0) {
        return true;
    }
    if (column.comment != null &&
            (column.comment.toLowerCase().indexOf("address") >= 0
                    || column.comment.indexOf("Address") >= 0
                    || column.comment.indexOf("Residence address") >= 0
                    || column.comment.indexOf("Location") >= 0)) {
        return true;
    }
}
return false;
  • Mobile number:
if (column.name.length() == 11 && ("varchar".equals(column.type) || "char".equals(column.type))) {
    if (column.name.indexOf("phone") >= 0 || column.name.indexOf("mobile") >= 0) {
        return true;
    }
    if (column.comment != null &&
            (column.comment.toLowerCase().indexOf("phone") >= 0
                    || column.comment.indexOf("Phone number") >= 0
                    || column.comment.indexOf("mobile") >= 0
                    || column.comment.indexOf("Mobile number") >= 0)) {
        return true;
    }
}
return false;
  • ID card number:
if (column.name.length() >= 15 && ("varchar".equals(column.type) || "char".equals(column.type))) {
    if (column.name.indexOf("id_number") >= 0 || column.name.indexOf("identity_card") >= 0) {
        return true;
    }
    if (column.comment != null &&
            (column.comment.toLowerCase().indexOf("identity card") >= 0
                    || column.comment.indexOf("ID card number") >= 0)) {
        return true;
    }
}
return false;

Add an identification rule

Example: Add an identification rule for the mobile_phone column of the student table in the odc_test database as the administrator.

  1. In the project collaboration window, choose Projects > Sensitive Columns. Then click Add Sensitive Column and select Scan to Add.

    11

  2. In the Scan to Add Sensitive Columns dialog box, click Identification Rule and select Manage Identification Rule from the drop-down list.

    12

  3. In the Manage Identification Rule dialog box, click Create Identification Rule.

    13

  4. In the Create Identification Rule pane, specify the name, status, identification method, and desensitization algorithm of the rule and click Create.

    14

    For example, the path-based identification rule odc_test*.student.*a,*.*.mobile_phone specifies to match the mobile_phone column of the student table in the odc_test database.

  5. In the identification rule list, you can view and enable the added identification rule.

    15

Manage identification rules

On the page shown in the preceding figure, click View, Edit, or Delete in the Actions column of an identification rule to view, modify, or delete the identification rule.

View desensitization algorithms

In the project collaboration window, choose Security Specifications > Masking Algorithm and view the desensitization algorithms supported by ODC.

The following table lists the desensitization algorithms supported by ODC.

Algorithm Test data Preview result
Mask all (system default) test value *****
Personal name (Chinese character) 个人姓名 **名
Personal name (alphabet) Personal Name P**
Nickname Nickname N***e
Email odc@test.com o**@test.com
Address Hangzhou, Zhejiang Province, China Hangzhou, Z***
Phone number 13500000000 135******00
Fixed line phone number 010-12345678 **********78
Certificate number 123456789 1*******9
Bank card number 1234 5678 5678 1234 ***************1234
License plate number GZ1234 G**234
Device unique identification number AB123456789CD ****89CD
IP address 10.123.456.789 10...*
MAC address ab:cd:ef:gh:hi:jk ab:*:*:*:*:*
MD5 default c21f969b5f03d33d43e04f8f136e7682
SHA256 default 37a8eec1ce19687d132fe29051dca629d164e2c4958ba141d5f4133a33f0688f
SHA512 default 1625cdb75d25d9f699fd2779f44095b6e320767f606f095eb7edab5581e9e3441adbb0d628832f7dc4574a77a382973ce22911b7e4df2a9d2c693826bbd125bc
SM3 default 40c357923156504f734717d8b4f5623e75209e9572701f4b51ef2a03d9ced863
Rounding 123.456 123
Blanking default -
Default abcd1234 abc**234

Scenarios

Added sensitive columns are desensitized in the SQL window and during data export and data changes.

Scenario 1: Data desensitization during data export

Example: When you export data from the student table in the odc_test database, data is automatically desensitized.

  1. After you export the student table by submitting a ticket in the ODC console, click View in the Actions column of the export task in the export task list.

  2. In the Task Details panel, click Download in the lower-right corner.

  3. View the downloaded student table in your local disk.

    19

Scenario 2: Data desensitization during data changes

Example: When you insert data into the student table, data is automatically desensitized.

  1. In the ODC console, submit a database change ticket to insert data into the student table.

  2. In the left-side navigation pane of the SQL window, locate the database odc_test database and view the desensitized data in the student table.

    20

Scenario 3: Data desensitization in an SQL window

Example: When you insert data into the student table, data is automatically desensitized.

  1. In the SQL window, write an SQL statement to insert data into the student table.

  2. On the result tab, you can view the desensitized data in the student table.

References

Contact Us