Auto Embedding Example



This example shows how to use the Auto Embedding feature with the pytidb client.

  1. Connect to TiDB using the pytidb client.
  2. Define a table with a VectorField configured for automatic embedding.
  3. Insert plain text data: embeddings are populated automatically in the background.
  4. Run vector searches with natural-language queries: embeddings are generated transparently.
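The four steps above can be illustrated without a database. The sketch below is a minimal, self-contained model of the auto-embedding idea and is not pytidb's API. `AutoEmbedTable`, `toy_embed`, and `cosine_distance` are hypothetical names. `toy_embed` is a bag-of-words stand-in for a real embedding model such as the `tidbcloud_free/amazon/titan-embed-text-v2` model used by the demo. The constructor stands in for steps 1 and 2. The point is that callers pass only text, both when inserting and when querying, and the same embedding function derives every vector from that text.

```python
import math
import re
from collections import Counter


def toy_embed(text: str, vocab: list[str]) -> list[float]:
    """Hypothetical stand-in for a real embedding model.

    Counts how often each vocabulary word occurs in the text. A real
    provider returns a dense learned vector instead.
    """
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(counts[word]) for word in vocab]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat an all-zero vector as unrelated to everything.
    return 1.0 - dot / norm if norm else 1.0


class AutoEmbedTable:
    """Hypothetical in-memory table that embeds text automatically."""

    def __init__(self, vocab: list[str]):
        self.vocab = vocab
        self.rows: list[tuple[int, str, list[float]]] = []

    def insert(self, text: str) -> int:
        # Step 3: the caller passes only text; the vector is derived here.
        row_id = len(self.rows) + 1
        self.rows.append((row_id, text, toy_embed(text, self.vocab)))
        return row_id

    def search(self, query: str, limit: int = 3) -> list[tuple[int, str, float]]:
        # Step 4: the query is embedded with the same function, then rows
        # are ranked by ascending cosine distance.
        query_vec = toy_embed(query, self.vocab)
        scored = [
            (row_id, text, cosine_distance(query_vec, vec))
            for row_id, text, vec in self.rows
        ]
        return sorted(scored, key=lambda row: row[2])[:limit]


if __name__ == "__main__":
    table = AutoEmbedTable(vocab=["tidb", "database", "python", "library", "ai"])
    table.insert("TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads.")
    table.insert("PyTiDB is a Python library for developers to connect to TiDB.")
    table.insert("LlamaIndex is a Python library for building AI-powered applications.")
    for row_id, text, distance in table.search("Python library for AI"):
        print(f"id: {row_id}, text: {text}, distance: {distance:.4f}")
```

With this toy vocabulary, the query "Python library for AI" ranks the LlamaIndex row first because it shares all three of the query's vocabulary words. A real embedding model captures meaning rather than exact word overlap, so its rankings and distances will differ.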

Prerequisites

Before you begin, ensure you have the following:

  • Python (>=3.10): Install Python 3.10 or a later version.
  • A TiDB Cloud Starter cluster: You can create a free TiDB cluster on TiDB Cloud.

How to run

Step 1. Clone the pytidb repository

git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/auto_embedding/

Step 2. Install the required packages

python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt
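If you are unsure whether `source .venv/bin/activate` took effect before running `pip install`, the standard library can tell you: inside a virtual environment, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base installation. `in_virtualenv` is a hypothetical helper name used for illustration.

```python
import sys


def in_virtualenv() -> bool:
    # In a venv, sys.prefix is the environment directory; outside one,
    # it equals sys.base_prefix.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active" if in_virtualenv() else "no virtual environment active")
```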

Step 3. Set environment variables

  1. In the TiDB Cloud console, navigate to the Clusters page, and then click the name of your target cluster to go to its overview page.
  2. Click Connect in the upper-right corner. A connection dialog is displayed, with connection parameters listed.
  3. Set environment variables according to the connection parameters as follows:
cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=test
# Using TiDB Cloud Free embedding model by default, which does not require setting up any API key
EMBEDDING_PROVIDER=tidbcloud_free
EOF
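Before running the demo, it can help to confirm that the `.env` file written by the heredoc above is complete and that no `{…}` placeholders remain. The sketch below is a stdlib-only illustration and is not the loader `main.py` actually uses. `parse_env`, `missing_vars`, and `REQUIRED_VARS` are hypothetical names. The placeholder test is a heuristic that flags any value containing both `{` and `}`.

```python
import os

# Connection settings from the .env template above.
REQUIRED_VARS = ["TIDB_HOST", "TIDB_PORT", "TIDB_USERNAME", "TIDB_PASSWORD", "TIDB_DATABASE"]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def missing_vars(env: dict[str, str], required=REQUIRED_VARS) -> list[str]:
    """Return required names that are absent, empty, or still hold a {placeholder}."""
    return [
        name
        for name in required
        if not env.get(name) or ("{" in env[name] and "}" in env[name])
    ]


if __name__ == "__main__":
    if not os.path.exists(".env"):
        print("No .env file found in the current directory")
    else:
        with open(".env", encoding="utf-8") as f:
            problems = missing_vars(parse_env(f.read()))
        print("Missing or unfilled: " + ", ".join(problems) if problems else ".env looks complete")
```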

Step 4. Run the demo

python main.py

Expected output:

=== Define embedding function ===
Embedding function (model id: tidbcloud_free/amazon/titan-embed-text-v2) defined
=== Define table schema ===
Table created
=== Truncate table ===
Table truncated
=== Insert sample data ===
Inserted 3 chunks
=== Perform vector search ===
id: 1, text: TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads., distance: 0.30373281240458805
id: 2, text: PyTiDB is a Python library for developers to connect to TiDB., distance: 0.422506501973434
id: 3, text: LlamaIndex is a Python library for building AI-powered applications., distance: 0.5267239638442787
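In the search results above, rows come back nearest first, so the distances should be non-decreasing. The sketch below parses result lines in the format shown and checks that ordering. `parse_results` and `is_sorted_by_distance` are hypothetical helpers, not part of pytidb. Exact distance values depend on the embedding model, so the check compares ordering rather than specific numbers.

```python
import re

# Matches lines such as "id: 1, text: ..., distance: 0.3037...".
# The text may contain commas, so it is captured greedily up to the
# last ", distance:" on the line ('.' does not cross newlines).
RESULT_LINE = re.compile(r"id: (\d+), text: (.*), distance: ([0-9.]+)")


def parse_results(output: str) -> list[tuple[int, str, float]]:
    """Extract (id, text, distance) tuples from the demo's search output."""
    return [
        (int(m.group(1)), m.group(2), float(m.group(3)))
        for m in RESULT_LINE.finditer(output)
    ]


def is_sorted_by_distance(results: list[tuple[int, str, float]]) -> bool:
    """True if each result is at least as close as the one after it."""
    distances = [distance for _, _, distance in results]
    return all(a <= b for a, b in zip(distances, distances[1:]))
```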
