Auto Embedding Example
This example shows how to use the Auto Embedding feature with the pytidb client.
- Connect to TiDB using the pytidb client.
- Define a table with a VectorField configured for automatic embedding.
- Insert plain text data: embeddings are populated automatically in the background.
- Run vector searches with natural-language queries: embeddings are generated transparently.
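The idea behind the list above can be illustrated with a toy, in-memory sketch. This is not the pytidb API: `toy_embed` and `ToyAutoEmbeddingTable` are hypothetical names, and the hashed bag-of-words function only stands in for a real embedding model. What it shows is the pattern: the caller passes plain text on insert and on search, and the table derives the vectors itself.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words."""
    vec = [0.0] * dim
    for word in text.lower().split():
        token = word.strip(".,:;!?")
        if not token:
            continue
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    # Normalize to unit length so a dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class ToyAutoEmbeddingTable:
    """In-memory table that embeds text on insert and on search."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.rows = []  # each row: (id, text, vector)

    def insert(self, row_id: int, text: str) -> None:
        # The caller supplies only text; the vector is derived here.
        self.rows.append((row_id, text, self.embed(text)))

    def search(self, query: str, limit: int = 3):
        # The query string is embedded the same way as the stored text.
        q = self.embed(query)
        scored = []
        for row_id, text, vec in self.rows:
            similarity = sum(a * b for a, b in zip(q, vec))
            scored.append((1.0 - similarity, row_id, text))
        scored.sort()  # smallest distance first
        return [(row_id, text, dist) for dist, row_id, text in scored[:limit]]


table = ToyAutoEmbeddingTable()
table.insert(1, "TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads.")
table.insert(2, "PyTiDB is a Python library for developers to connect to TiDB.")
table.insert(3, "LlamaIndex is a Python library for building AI-powered applications.")

for row_id, text, dist in table.search("PyTiDB is a Python library for developers to connect to TiDB."):
    print(f"id: {row_id}, distance: {dist:.4f}")
```

In the real example, the embedding step runs inside TiDB Cloud with a hosted model rather than in client code, so distances will differ from this toy.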
Prerequisites
Before you begin, ensure you have the following:
- Python (>=3.10): Install Python 3.10 or a later version.
- A TiDB Cloud Starter cluster: You can create a free TiDB cluster on TiDB Cloud.
How to run
Step 1. Clone the pytidb repository
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/auto_embedding/
Step 2. Install the required packages
python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt
Step 3. Set environment variables
- In the TiDB Cloud console, navigate to the Clusters page, and then click the name of your target cluster to go to its overview page.
- Click Connect in the upper-right corner. A connection dialog is displayed, with connection parameters listed.
- Set the environment variables from those connection parameters, replacing each {placeholder} with the value shown in the dialog:
cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=test
# Uses the TiDB Cloud free embedding model by default; no API key is required
EMBEDDING_PROVIDER=tidbcloud_free
EOF
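Before running the demo, it can help to confirm that no `{placeholder}` value was left in `.env`. The following is a minimal standard-library sketch, not part of the example repository: `parse_env`, `missing_keys`, and `REQUIRED_KEYS` are hypothetical helpers, and how `main.py` actually loads the file is not shown here.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        values[key.strip()] = value.strip()
    return values


# Assumed to be the settings the demo needs, based on the .env template above.
REQUIRED_KEYS = (
    "TIDB_HOST",
    "TIDB_PORT",
    "TIDB_USERNAME",
    "TIDB_PASSWORD",
    "TIDB_DATABASE",
)


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still a {placeholder}."""
    missing = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value or ("{" in value and "}" in value):
            missing.append(key)
    return missing


sample = """\
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=test
# comment lines are skipped
EMBEDDING_PROVIDER=tidbcloud_free
"""
config = parse_env(sample)
print(missing_keys(config))  # ['TIDB_HOST', 'TIDB_USERNAME', 'TIDB_PASSWORD']
```

To check a real file, pass the contents of `.env` to `parse_env` instead of `sample`.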
Step 4. Run the demo
python main.py
Expected output:
=== Define embedding function ===
Embedding function (model id: tidbcloud_free/amazon/titan-embed-text-v2) defined
=== Define table schema ===
Table created
=== Truncate table ===
Table truncated
=== Insert sample data ===
Inserted 3 chunks
=== Perform vector search ===
id: 1, text: TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads., distance: 0.30373281240458805
id: 2, text: PyTiDB is a Python library for developers to connect to TiDB., distance: 0.422506501973434
id: 3, text: LlamaIndex is a Python library for building AI-powered applications., distance: 0.5267239638442787
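The results are listed from smallest to largest distance, so the first row is the closest match to the query. Assuming the metric is cosine distance (an assumption; the output does not name the metric), the value is 1 minus the cosine similarity of the query vector and the stored vector. A small sketch with made-up 2-D vectors shows the calculation and the ordering; `cosine_distance` and `rank_by_distance` are illustrative helpers, not pytidb functions.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_distance(query: list[float], candidates: dict[str, list[float]]):
    """Sort candidate names by ascending distance to the query vector."""
    scored = [(name, cosine_distance(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


# Made-up vectors purely for illustration.
query = [1.0, 0.0]
candidates = {"a": [0.0, 1.0], "b": [1.0, 0.1], "c": [1.0, 1.0]}
for name, dist in rank_by_distance(query, candidates):
    print(name, round(dist, 3))
```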
Related resources
- Source Code: View on GitHub