String Functions
This page provides a comprehensive overview of String functions in TiDB Cloud Lake, organized by functionality for easy reference.
String Concatenation and Manipulation
| Function | Description | Example |
|---|
| CONCAT | Concatenates strings | CONCAT('data', 'lake') → 'datalake' |
| CONCAT_WS | Concatenates strings with a separator | CONCAT_WS('-', 'data', 'lake') → 'data-lake' |
| INSERT | Inserts a string at a specified position | INSERT('datalake', 5, 0, 'cloud') → 'datacloudlake' |
| REPLACE | Replaces occurrences of a substring | REPLACE('datalake', 'lake', 'cloud') → 'datacloud' |
| TRANSLATE | Replaces characters with their replacements | TRANSLATE('datalake', 'de', 'DE') → 'DatalakE' |
| Function | Description | Example |
|---|
| LEFT | Returns leftmost characters | LEFT('datalake', 4) → 'data' |
| RIGHT | Returns rightmost characters | RIGHT('datalake', 4) → 'lake' |
| SUBSTR / SUBSTRING | Extracts a substring | SUBSTR('datalake', 5, 4) → 'lake' |
| MID | Extracts a substring (alias for SUBSTRING) | MID('datalake', 5, 4) → 'lake' |
| SPLIT | Splits a string into an array | SPLIT('data,lake', ',') → ['data', 'lake'] |
| SPLIT_PART | Returns a specific part after splitting | SPLIT_PART('data,lake', ',', 2) → 'lake' |
| REGEXP_SPLIT_TO_ARRAY | Split a string into an array of segments using the specified pattern | regexp_split_to_array('apple,banana,orange', ','); → '['apple','banana','orange']' |
| REGEXP_SPLIT_TO_TABLE | Split a string into a table of segments using the specified pattern | regexp_split_to_table('data,lake', ',', 2) |
| Function | Description | Example |
|---|
| LPAD | Left-pads a string to a length | LPAD('lake', 8, 'data') → 'datalake' |
| RPAD | Right-pads a string to a length | RPAD('data', 8, 'lake') → 'datalake' |
| REPEAT | Repeats a string n times | REPEAT('data', 2) → 'datadata' |
| SPACE | Returns a string of spaces | SPACE(4) → ' ' |
| REVERSE | Reverses a string | REVERSE('datalake') → 'ekalatad' |
String Trimming
| Function | Description | Example |
|---|
| TRIM | Removes leading and trailing spaces | TRIM(' datalake ') → 'datalake' |
| TRIM_BOTH | Removes specified chars from both ends | TRIM_BOTH('xxdatalakexx', 'x') → 'datalake' |
| TRIM_LEADING | Removes specified chars from start | TRIM_LEADING('xxdatalake', 'x') → 'datalake' |
| TRIM_TRAILING | Removes specified chars from end | TRIM_TRAILING('datalakexx', 'x') → 'datalake' |
| LTRIM | Removes leading spaces | LTRIM(' datalake') → 'datalake' |
| RTRIM | Removes trailing spaces | RTRIM('datalake ') → 'datalake' |
| Function | Description | Example |
|---|
| LENGTH | Returns string length in characters | LENGTH('datalake') → 8 |
| CHAR_LENGTH / CHARACTER_LENGTH | Returns string length in characters | CHAR_LENGTH('datalake') → 8 |
| BIT_LENGTH | Returns string length in bits | BIT_LENGTH('datalake') → 64 |
| OCTET_LENGTH | Returns string length in bytes | OCTET_LENGTH('datalake') → 8 |
| INSTR | Returns position of first occurrence | INSTR('datalake', 'lake') → 5 |
| LOCATE | Returns position of first occurrence | LOCATE('lake', 'datalake') → 5 |
| POSITION | Returns position of first occurrence | POSITION('lake' IN 'datalake') → 5 |
| STRCMP | Compares two strings | STRCMP('datalake', 'datalake') → 0 |
| JARO_WINKLER | Returns similarity between strings | JARO_WINKLER('datalake', 'datalake') → 1.0 |
Case Conversion
| Function | Description | Example |
|---|
| LOWER / LCASE | Converts to lowercase | LOWER('DataLake') → 'datalake' |
| UPPER / UCASE | Converts to uppercase | UPPER('datalake') → 'DATALAKE' |
Pattern Matching
| Function | Description | Example |
|---|
| LIKE | Pattern matching with wildcards | 'datalake' LIKE 'data%' → true |
| NOT_LIKE | Negated LIKE | 'datalake' NOT LIKE 'cloud%' → true |
| REGEXP / RLIKE | Pattern matching with regex | 'datalake' REGEXP '^data' → true |
| NOT_REGEXP / NOT_RLIKE | Negated regex matching | 'datalake' NOT REGEXP '^cloud' → true |
| REGEXP_LIKE | Returns boolean for regex match | REGEXP_LIKE('datalake', '^data') → true |
| REGEXP_INSTR | Returns position of regex match | REGEXP_INSTR('datalake', 'lake') → 5 |
| REGEXP_SUBSTR | Returns substring matching regex | REGEXP_SUBSTR('datalake', 'lake') → 'lake' |
| REGEXP_REPLACE | Replaces regex matches | REGEXP_REPLACE('datalake', 'lake', 'cloud') → 'datacloud' |
| GLOB | Unix-style pattern matching | 'datalake' GLOB 'data*' → true |
Encoding and Decoding
| Function | Description | Example |
|---|
| ASCII | Returns ASCII value of first character | ASCII('D') → 68 |
| ORD | Returns Unicode code point of first character | ORD('D') → 68 |
| CHAR / CHR | Returns string of characters for given Unicode code points | CHAR(68,97,116,97) → 'Data' |
| BIN | Returns binary representation | BIN(5) → '101' |
| OCT | Returns octal representation | OCT(8) → '10' |
| HEX | Returns hexadecimal representation | HEX('ABC') → '414243' |
| UNHEX | Converts hex to binary | UNHEX('414243') → 'ABC' |
| TO_BASE64 | Encodes to base64 | TO_BASE64('datalake') → 'ZGF0YWxha2U=' |
| FROM_BASE64 | Decodes from base64 | FROM_BASE64('ZGF0YWxha2U=') → 'datalake' |
Miscellaneous
| Function | Description | Example |
|---|
| QUOTE | Escapes string for SQL | QUOTE('datalake') → '"datalake"' |
| SOUNDEX | Returns soundex code | SOUNDEX('datalake') → 'D42' |
| SOUNDSLIKE | Compares soundex values | SOUNDSLIKE('datalake', 'datalake') → true |