
OpenAI Embeddings



This document describes how to use OpenAI embedding models with Auto Embedding in TiDB Cloud to perform semantic searches with text queries.

Available models

All OpenAI embedding models are available with the openai/ prefix if you bring your own OpenAI API key (BYOK). For example:

text-embedding-3-small

  • Name: openai/text-embedding-3-small
  • Dimensions: 512-1536 (default: 1536)
  • Distance metric: Cosine, L2
  • Price: Charged by OpenAI
  • Hosted by TiDB Cloud: ❌
  • Bring Your Own Key: ✅

text-embedding-3-large

  • Name: openai/text-embedding-3-large
  • Dimensions: 256-3072 (default: 3072)
  • Distance metric: Cosine, L2
  • Price: Charged by OpenAI
  • Hosted by TiDB Cloud: ❌
  • Bring Your Own Key: ✅

For a full list of available models, see OpenAI Documentation.
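Both models support cosine and L2 distance. As a reminder of what these two metrics compute (a plain-Python illustration only, not part of the TiDB or pytidb API):

```python
import math

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity; lower means more similar.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def l2_distance(a, b):
    # Euclidean (L2) distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 3-dimensional vectors; real embeddings have 1536 or 3072 dimensions.
u = [1.0, 0.0, 0.0]
v = [0.0, 1.0, 0.0]
print(cosine_distance(u, u))  # identical direction -> distance 0
print(cosine_distance(u, v))  # orthogonal vectors -> distance 1
```

Cosine distance ignores vector magnitude and compares direction only, which is why it is the usual choice for text embeddings.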

Usage example

This example shows how to create a vector table, insert documents, and run similarity search using OpenAI embedding models.

You can integrate the OpenAI Embeddings API with TiDB using the AI SDK or native SQL functions for automatic embedding generation.

Step 1: Connect to the database

Connect to your TiDB Cloud cluster using the TiDB Client:

    from pytidb import TiDBClient

    tidb_client = TiDBClient.connect(
        host="{gateway-region}.prod.aws.tidbcloud.com",
        port=4000,
        username="{prefix}.root",
        password="{password}",
        database="{database}",
        ensure_db=True,
    )

Alternatively, connect using the MySQL CLI:

    mysql -h {gateway-region}.prod.aws.tidbcloud.com \
        -P 4000 \
        -u {prefix}.root \
        -p{password} \
        -D {database}

Step 2: Configure the API key

Create an API key in the OpenAI API Platform and bring your own key (BYOK) to use the embedding service.

Configure the API key for the OpenAI embedding provider using the TiDB Client:

    tidb_client.configure_embedding_provider(
        provider="openai",
        api_key="{your-openai-api-key}",
    )

Alternatively, set the API key for the OpenAI embedding provider using SQL:

    SET @@GLOBAL.TIDB_EXP_EMBED_OPENAI_API_KEY = "{your-openai-api-key}";

Step 3: Create a vector table

Create a table with a vector field that uses the openai/text-embedding-3-small model to generate 1536-dimensional vectors.

Using the TiDB Client:

    from pytidb.schema import TableModel, Field
    from pytidb.embeddings import EmbeddingFunction
    from pytidb.datatype import TEXT

    class Document(TableModel):
        __tablename__ = "sample_documents"

        id: int = Field(primary_key=True)
        content: str = Field(sa_type=TEXT)
        embedding: list[float] = EmbeddingFunction(
            model_name="openai/text-embedding-3-small"
        ).VectorField(source_field="content")

    table = tidb_client.create_table(schema=Document, if_exists="overwrite")

Using SQL:

    CREATE TABLE sample_documents (
        `id` INT PRIMARY KEY,
        `content` TEXT,
        `embedding` VECTOR(1536) GENERATED ALWAYS AS (EMBED_TEXT(
            "openai/text-embedding-3-small",
            `content`
        )) STORED
    );

Step 4: Insert data into the table

Use the table.insert() or table.bulk_insert() API to add data:

    documents = [
        Document(id=1, content="Java: Object-oriented language for cross-platform development."),
        Document(id=2, content="Java coffee: Bold Indonesian beans with low acidity."),
        Document(id=3, content="Java island: Densely populated, home to Jakarta."),
        Document(id=4, content="Java's syntax is used in Android apps."),
        Document(id=5, content="Dark roast Java beans enhance espresso blends."),
    ]
    table.bulk_insert(documents)

Alternatively, insert data using the INSERT INTO statement:

    INSERT INTO sample_documents (id, content)
    VALUES
        (1, "Java: Object-oriented language for cross-platform development."),
        (2, "Java coffee: Bold Indonesian beans with low acidity."),
        (3, "Java island: Densely populated, home to Jakarta."),
        (4, "Java's syntax is used in Android apps."),
        (5, "Dark roast Java beans enhance espresso blends.");

Step 5: Search for similar documents

Use the table.search() API to perform vector search:

    results = table.search("How to start learning Java programming?") \
        .limit(2) \
        .to_list()
    print(results)

Alternatively, use the VEC_EMBED_COSINE_DISTANCE function to perform vector search with cosine distance:

    SELECT `id`, `content`,
        VEC_EMBED_COSINE_DISTANCE(embedding, "How to start learning Java programming?") AS _distance
    FROM sample_documents
    ORDER BY _distance ASC
    LIMIT 2;

Result:

    +------+----------------------------------------------------------------+
    | id   | content                                                        |
    +------+----------------------------------------------------------------+
    |    1 | Java: Object-oriented language for cross-platform development. |
    |    4 | Java's syntax is used in Android apps.                         |
    +------+----------------------------------------------------------------+

Use Azure OpenAI

To use OpenAI embedding models on Azure, set the global variable TIDB_EXP_EMBED_OPENAI_API_BASE to the URL of your Azure resource. For example:

    SET @@GLOBAL.TIDB_EXP_EMBED_OPENAI_API_KEY = 'your-openai-api-key-here';
    SET @@GLOBAL.TIDB_EXP_EMBED_OPENAI_API_BASE = 'https://<your-resource-name>.openai.azure.com/openai/v1';

    CREATE TABLE sample (
        `id` INT,
        `content` TEXT,
        `embedding` VECTOR(3072) GENERATED ALWAYS AS (EMBED_TEXT(
            "openai/text-embedding-3-large",
            `content`
        )) STORED
    );

    INSERT INTO sample (`id`, `content`)
    VALUES
        (1, "Java: Object-oriented language for cross-platform development."),
        (2, "Java coffee: Bold Indonesian beans with low acidity."),
        (3, "Java island: Densely populated, home to Jakarta."),
        (4, "Java's syntax is used in Android apps."),
        (5, "Dark roast Java beans enhance espresso blends.");

    SELECT `id`, `content`
    FROM sample
    ORDER BY VEC_EMBED_COSINE_DISTANCE(
        embedding,
        "How to start learning Java programming?"
    )
    LIMIT 2;

Even if your resource URL appears as https://<your-resource-name>.cognitiveservices.azure.com/, you still need to use https://<your-resource-name>.openai.azure.com/openai/v1 as the API base to keep OpenAI-compatible request and response formats.

To switch from Azure OpenAI back to OpenAI directly, set TIDB_EXP_EMBED_OPENAI_API_BASE to an empty string:

    SET @@GLOBAL.TIDB_EXP_EMBED_OPENAI_API_BASE = '';

Options

All OpenAI embedding options are supported via the additional_json_options parameter of the EMBED_TEXT() function.

For example, to use an alternative dimension (1024 instead of the default 3072) for text-embedding-3-large:

    CREATE TABLE sample (
        `id` INT,
        `content` TEXT,
        `embedding` VECTOR(1024) GENERATED ALWAYS AS (EMBED_TEXT(
            "openai/text-embedding-3-large",
            `content`,
            '{"dimensions": 1024}'
        )) STORED
    );

For all available options, see OpenAI Documentation.
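Per OpenAI's documentation, the dimensions option shortens a text-embedding-3-* vector by truncating it and renormalizing to unit length, so a reduced-dimension embedding remains usable with cosine distance. A plain-Python sketch of that transformation (illustrative only; TiDB simply passes the option through to the OpenAI API):

```python
import math

def shorten_embedding(vec, dim):
    # Keep the first `dim` components, then rescale to unit length,
    # mirroring the documented behavior of the "dimensions" option.
    cut = vec[:dim]
    norm = math.sqrt(sum(x * x for x in cut))
    return [x / norm for x in cut]

full = [0.5, 0.5, 0.5, 0.5]         # stand-in for a 3072-dim embedding
short = shorten_embedding(full, 2)  # 2-dim vector, renormalized
print(short)
```

Note that the VECTOR(n) column width must match the dimensions value you pass, as in the VECTOR(1024) example above.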
