SQL Identifiers
SQL identifiers are names used for different elements within TiDB Cloud Lake, such as tables, views, and databases.
Unquoted & Double-quoted Identifiers
Unquoted identifiers begin with a letter (A-Z, a-z) or underscore (“_”) and may consist of letters, underscores, numbers (0-9), or dollar signs (“$”).
mydatalake
MyDatalake1
My$datalake
_my_datalake
Double-quoted identifiers can include a wide range of characters, such as numbers (0-9), special characters (like period (.), single quote ('), exclamation mark (!), at symbol (@), number sign (#), dollar sign ($), percent sign (%), caret (^), and ampersand (&)), extended ASCII and non-ASCII characters, as well as blank spaces.
"MyDatalake"
"my.datalake"
"my datalake"
"My 'Datalake'"
"1_datalake"
"$Datalake"
Note that using double backticks (``) or double quotes (") is equivalent:
`MyDatalake`
`my.datalake`
`my datalake`
`My 'Datalake'`
`1_datalake`
`$Datalake`
Identifier Casing Rules
TiDB Cloud Lake stores unquoted identifiers by default in lowercase and double-quoted identifiers as they are entered. In other words, TiDB Cloud Lake handles object names, such as databases, tables, and columns, as case-insensitive. If you want TiDB Cloud Lake to handle them as case-sensitive, double-quote them.
This example demonstrates how TiDB Cloud Lake treats the casing of identifiers when creating and listing databases:
-- Create a database named "datalake"
CREATE DATABASE datalake;
-- Attempt to create a database named "Datalake"
CREATE DATABASE Datalake;
>> SQL Error [1105] [HY000]: DatabaseAlreadyExists. Code: 2301, Text = Database 'datalake' already exists.
-- Create a database named "Datalake"
CREATE DATABASE "Datalake";
-- List all databases
SHOW DATABASES;
databases_in_default|
--------------------+
Datalake |
datalake |
default |
information_schema |
system |
This example demonstrates how TiDB Cloud Lake handles identifier casing for table and column names, highlighting its case-sensitivity by default and the use of double quotes to differentiate between identifiers with varying casing:
-- Create a table named "datalake"
CREATE TABLE datalake (a INT);
DESC datalake;
Field|Type|Null|Default|Extra|
-----+----+----+-------+-----+
a |INT |YES |NULL | |
-- Attempt to create a table named "Datalake"
CREATE TABLE Datalake (a INT);
>> SQL Error [1105] [HY000]: TableAlreadyExists. Code: 2302, Text = Table 'datalake' already exists.
-- Attempt to create a table with one column named "a" and the other one named "A"
CREATE TABLE "Datalake" (a INT, A INT);
>> SQL Error [1105] [HY000]: BadArguments. Code: 1006, Text = Duplicated column name: a.
-- Double quote the column names
CREATE TABLE "Datalake" ("a" INT, "A" INT);
DESC "Datalake";
Field|Type|Null|Default|Extra|
-----+----+----+-------+-----+
a |INT |YES |NULL | |
A |INT |YES |NULL | |
String Identifiers
In TiDB Cloud Lake, when managing string items like text and dates, it is essential to enclose them within single quotes (') as a standard practice.
INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
SELECT 'Datalake';
'datalake'|
----------+
Datalake |
SELECT "Datalake";
>> SQL Error [1105] [HY000]: SemanticError. Code: 1065, Text = error:
--> SQL:1:73
|
1 | /* ApplicationName=DBeaver 23.2.0 - SQLEditor <Script-12.sql> */ SELECT "Datalake"
| ^^^^^^^^^^ column Datalake doesn't exist, do you mean 'Datalake'?
By default, TiDB Cloud Lake SQL dialect is PostgreSQL:
SHOW SETTINGS LIKE '%sql_dialect%';
name |value |default |level |description |type |
-----------+----------+----------+-------+---------------------------------------------------------------------------------+------+
sql_dialect|PostgreSQL|PostgreSQL|SESSION|Sets the SQL dialect. Available values include "PostgreSQL", "MySQL", and "Hive".|String|
You can change it to MySQL to enable double quotes ("):
SET sql_dialect='MySQL';
SELECT "demo";
+--------+
| 'demo' |
+--------+
| demo |
+--------+