String Functions
TiDB supports most of the string functions available in MySQL 5.7, some of the string functions available in MySQL 8.0, and some of the functions available in Oracle 21.
Supported functions
Name | Description |
---|---|
ASCII() | Return numeric value of left-most character |
BIN() | Return a string containing binary representation of a number |
BIT_LENGTH() | Return length of argument in bits |
CHAR() | Return the character for each integer passed |
CHAR_LENGTH() | Return number of characters in argument |
CHARACTER_LENGTH() | Synonym for CHAR_LENGTH() |
CONCAT() | Return concatenated string |
CONCAT_WS() | Return concatenate with separator |
ELT() | Return string at index number |
EXPORT_SET() | Return a string such that for every bit set in the value bits, you get an on string and for every unset bit, you get an off string |
FIELD() | Return the index (position) of the first argument in the subsequent arguments |
FIND_IN_SET() | Return the index position of the first argument within the second argument |
FORMAT() | Return a number formatted to specified number of decimal places |
FROM_BASE64() | Decode to a base-64 string and return result |
HEX() | Return a hexadecimal representation of a decimal or string value |
INSERT() | Insert a substring at the specified position up to the specified number of characters |
INSTR() | Return the index of the first occurrence of substring |
LCASE() | Synonym for LOWER() |
LEFT() | Return the leftmost number of characters as specified |
LENGTH() | Return the length of a string in bytes |
LIKE | Simple pattern matching |
LOCATE() | Return the position of the first occurrence of substring |
LOWER() | Return the argument in lowercase |
LPAD() | Return the string argument, left-padded with the specified string |
LTRIM() | Remove leading spaces |
MAKE_SET() | Return a set of comma-separated strings that have the corresponding bit in bits set |
MID() | Return a substring starting from the specified position |
NOT LIKE | Negation of simple pattern matching |
NOT REGEXP | Negation of REGEXP |
OCT() | Return a string containing octal representation of a number |
OCTET_LENGTH() | Synonym for LENGTH() |
ORD() | Return character code for leftmost character of the argument |
POSITION() | Synonym for LOCATE() |
QUOTE() | Escape the argument for use in an SQL statement |
REGEXP | Pattern matching using regular expressions |
REGEXP_INSTR() | Return the starting index of the substring that matches the regular expression (Partly compatible with MySQL. For more details, see Regular expression compatibility with MySQL) |
REGEXP_LIKE() | Whether the string matches the regular expression (Partly compatible with MySQL. For more details, see Regular expression compatibility with MySQL) |
REGEXP_REPLACE() | Replace substrings that match the regular expression (Partly compatible with MySQL. For more details, see Regular expression compatibility with MySQL) |
REGEXP_SUBSTR() | Return the substring that matches the regular expression (Partly compatible with MySQL. For more details, see Regular expression compatibility with MySQL) |
REPEAT() | Repeat a string the specified number of times |
REPLACE() | Replace occurrences of a specified string |
REVERSE() | Reverse the characters in a string |
RIGHT() | Return the specified rightmost number of characters |
RLIKE | Synonym for REGEXP |
RPAD() | Append string the specified number of times |
RTRIM() | Remove trailing spaces |
SPACE() | Return a string of the specified number of spaces |
STRCMP() | Compare two strings |
SUBSTR() | Return the substring as specified |
SUBSTRING() | Return the substring as specified |
SUBSTRING_INDEX() | Return a substring from a string before the specified number of occurrences of the delimiter |
TO_BASE64() | Return the argument converted to a base-64 string |
TRANSLATE() | Replace all occurrences of characters by other characters in a string. It does not treat empty strings as NULL as Oracle does. |
TRIM() | Remove leading and trailing spaces |
UCASE() | Synonym for UPPER() |
UNHEX() | Return a string containing hex representation of a number |
UPPER() | Convert to uppercase |
WEIGHT_STRING() | Return the weight string for the input string |
Unsupported functions
LOAD_FILE()
MATCH()
SOUNDEX()
Regular expression compatibility with MySQL
The following sections describe the regular expression compatibility with MySQL, including REGEXP_INSTR()
, REGEXP_LIKE()
, REGEXP_REPLACE()
, and REGEXP_SUBSTR()
.
Syntax compatibility
MySQL implements regular expression using International Components for Unicode (ICU), and TiDB uses RE2. To learn the syntax differences between the two libraries, you can refer to the ICU documentation and RE2 Syntax.
match_type
compatibility
The value options of match_type
between TiDB and MySQL are:
Value options in TiDB are
"c"
,"i"
,"m"
, and"s"
, and value options in MySQL are"c"
,"i"
,"m"
,"n"
, and"u"
.The
"s"
in TiDB corresponds to"n"
in MySQL. When"s"
is set in TiDB, the.
character also matches line terminators (\n
).For example, the
SELECT REGEXP_LIKE(a, b, "n") FROM t1
in MySQL is the same as theSELECT REGEXP_LIKE(a, b, "s") FROM t1
in TiDB.TiDB does not support
"u"
, which means Unix-only line endings in MySQL.
Data type compatibility
The difference between TiDB and MySQL support for the binary string type:
- MySQL does not support binary strings in regular expression functions since 8.0.22. For more details, refer to MySQL documentation. But in practice, regular functions can work in MySQL when all parameters or return types are binary strings. Otherwise, an error will be reported.
- Currently, TiDB prohibits using binary strings and an error will be reported under any circumstances.
Other compatibility
The behavior of replacing empty strings in TiDB is different from MySQL. Taking
REGEXP_REPLACE("", "^$", "123")
as an example:- MySQL does not replace the empty string and returns
""
as the result. - TiDB replaces the empty string and returns
"123"
as the result.
- MySQL does not replace the empty string and returns
The keyword used for capturing groups in TiDB is different from MySQL. MySQL uses
$
as the keyword, while TiDB uses\\
as the keyword. In addition, TiDB only supports capturing groups numbered from0
to9
.For example, the following SQL statement returns
ab
in TiDB:SELECT REGEXP_REPLACE('abcd','(.*)(.{2})$','\\1') AS s;