Filter Certain Row Changes Using SQL Expressions
Overview
In the process of data migration, DM provides the Binlog Event Filter feature to filter certain types of binlog events. For example, for archiving or auditing purposes, DELETE
event might be filtered when data is migrated to the downstream. However, Binlog Event Filter cannot judge with a greater granularity whether the DELETE
event of a certain row should be filtered.
To solve the above issue, DM supports filtering certain row changes using SQL expressions. The binlog in the ROW
format supported by DM has the values of all columns in binlog events. You can configure SQL expressions according to these values. If the SQL expressions evaluate a row change as TRUE
, DM will not migrate the row change downstream.
Configuration example
Similar to Binlog Event Filter, you also need to configure the expression-filter feature in the configuration file of the data migration task, as shown below. For complete configuration and its descriptions, refer to DM Advanced Task Configuration File:
name: test
task-mode: all
target-database:
host: "127.0.0.1"
port: 4000
user: "root"
password: ""
mysql-instances:
- source-id: "mysql-replica-01"
expression-filters: ["even_c"]
expression-filter:
even_c:
schema: "expr_filter"
table: "tbl"
insert-value-expr: "c % 2 = 0"
The above example configures even_c
rule, and allows the data source whose ID is mysql-replica-01
to refer this rule. The meaning of even_c
is:
For the tbl
table in the expr_filter
shema, when the value of the inserted c
is even (c % 2 = 0
), the inserted statement will not be migrated downstream.
The usage result of this rule is shown below.
Insert the following data in the upstream data source:
INSERT INTO tbl(id, c) VALUES (1, 1), (2, 2), (3, 3), (4, 4);
Then query the tbl
table downstream and you can find that only rows with an odd value of c
are migrated downstream:
MySQL [test]> select * from tbl;
+------+------+
| id | c |
+------+------+
| 1 | 1 |
| 3 | 3 |
+------+------+
2 rows in set (0.001 sec)
Configuration parameters and rule descriptions
schema
: The name of the upstream database to be matched. Wildcard match or regular match is not supported.table
: The name of the upstream table to be matched. Wildcard match or regular match is not supported.insert-value-expr
: Specifies an expression which takes effect on the value of binlog event (WRITE_ROWS_EVENT) of INSERT type. Do not use it withupdate-old-value-expr
,update-new-value-expr
, ordelete-value-expr
in the same configuration item.update-old-value-expr
:Specifies an expression which takes effect on the old value of binlog event (UPDATE_ROWS_EVENT) of UPDATE type. Do not use it withinsert-value-expr
ordelete-value-expr
in the same configuration item.update-new-value-expr
: Specifies an expression which takes effect on the new value of binlog event (UPDATE_ROWS_EVENT) of UPDATE type. Do not use it withinsert-value-expr
ordelete-value-expr
in the same configuration item.delete-value-expr
:Specifies an expression which takes effect on the value of binlog event (DELETE_ROWS_EVENT) of DELETE type. Do not use it withinsert-value-expr
,update-old-value-expr
, orupdate-new-value-expr
in the same configuration item.
SQL expressions can involve one or more columns. You can also use the SQL functions TiDB supports, such as c % 2 = 0
, a*a + b*b = c*c
, and ts > NOW()
.
The timezone of TIMESTAMP is UTC by default. You can use c_timestamp = '2021-01-01 12:34:56.5678+08:00'
to specify the timezone explicitly.
You can define multiple filter rules under the configuration item expression-filter
. By refering the rules you need in the configuration item of expression-filters
in the upstream data source, the rules can take effect. When multiple rules take effect, matching any of the rules causes a row change to be filtered.