
Hybrid Search Example



This demo shows how to combine vector search and full-text search to improve retrieval quality over a set of documents.
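The following toy sketch makes the two retrieval paths concrete. It is plain Python and does not use TiDB or OpenAI. It ranks the three sample sentences from the demo's expected output in two ways. The first is by cosine distance between hand-made, hypothetical 3-dimensional vectors that stand in for real embeddings. The second is by a naive keyword-overlap count that stands in for full-text relevance. This is not how TiDB implements either search. It only shows why the two ranked lists can disagree, for example when a document with no matching keyword drops out of the full-text list entirely.

```python
import math
import re

# Toy corpus: the three sample sentences shown in the demo's expected output.
DOCS = {
    60013: "TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads.",
    60014: "PyTiDB is a Python library for developers to connect to TiDB.",
    60015: "LlamaIndex is a Python library for building AI-powered applications.",
}

# HYPOTHETICAL 3-d vectors standing in for embeddings; a real system
# would obtain these from an embedding model.
DOC_VECS = {
    60013: [0.9, 0.1, 0.3],
    60014: [0.5, 0.8, 0.1],
    60015: [0.2, 0.4, 0.9],
}


def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def vector_leg(query_vec, k=3):
    """Rank every document by cosine distance to the query vector."""
    return sorted(DOC_VECS, key=lambda doc_id: cosine_distance(query_vec, DOC_VECS[doc_id]))[:k]


def tokens(text):
    """Lowercase alphanumeric tokens, e.g. 'AI-powered' -> ['ai', 'powered']."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fulltext_leg(query, k=3):
    """Return only documents sharing at least one term with the query, most shared terms first."""
    q = set(tokens(query))
    hits = [(len(q & set(tokens(text))), doc_id) for doc_id, text in DOCS.items()]
    hits = [(score, doc_id) for score, doc_id in hits if score > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in hits][:k]


if __name__ == "__main__":
    # Hypothetical query and query vector, chosen only for illustration.
    print(vector_leg([1.0, 0.2, 0.5]))  # [60013, 60014, 60015]
    print(fulltext_leg("AI database"))  # [60013, 60015]; 60014 shares no term
```

The vector leg always returns a full ranking. The keyword leg returns only the documents that contain a query term. A hybrid search merges the two lists so that each one can compensate for the other's misses.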

TiDB Hybrid Search Demo


Prerequisites

Before you begin, ensure you have the following:

  • Python (>=3.10): Install Python 3.10 or a later version.
  • A TiDB Cloud Starter cluster: You can create a free TiDB cluster on TiDB Cloud.
  • OpenAI API key: Get an OpenAI API key from OpenAI.

How to run

Step 1. Clone the pytidb repository

pytidb is the official Python SDK for TiDB, designed to help developers build AI applications efficiently.

git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/hybrid_search

Step 2. Install the required packages and set up the environment

python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt
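If the install step fails or the demo later cannot find its packages, first check two things: that the active interpreter belongs to the virtual environment, and that it is Python 3.10 or later. The following helper is hypothetical and not part of the pytidb example. It relies on standard Python behavior: when a venv created by python -m venv is active, sys.prefix points at the venv while sys.base_prefix still points at the base installation.

```python
import sys

MIN_PYTHON = (3, 10)  # from the Prerequisites section


def environment_problems(version_info=sys.version_info,
                         prefix=sys.prefix,
                         base_prefix=sys.base_prefix):
    """Return a list of problem codes; an empty list means the environment looks ready."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("python-too-old")
    # Inside an active venv, sys.prefix != sys.base_prefix.
    if prefix == base_prefix:
        problems.append("no-venv")
    return problems


if __name__ == "__main__":
    found = environment_problems()
    print("Environment OK" if not found else "Problems: " + ", ".join(found))
```

Run it with the same python command you use for the demo. The parameters default to the current interpreter's values and exist only so that the check can be exercised with other inputs.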

Step 3. Set environment variables

  1. In the TiDB Cloud console, navigate to the Clusters page, and then click the name of your target cluster to go to its overview page.
  2. Click Connect in the upper-right corner. A connection dialog is displayed, with connection parameters listed.
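  3. Create a .env file in the hybrid_search directory with the following content, replacing the placeholders ({gateway-region}, {prefix}, {password}, and <your-openai-api-key>) with your connection parameters and OpenAI API key:
cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=pytidb_hybrid_demo
OPENAI_API_KEY=<your-openai-api-key>
EOF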
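Before you run the demo, you can confirm that .env contains all six keys from the template in Step 3 and that no placeholder was left unreplaced. The following helper is hypothetical and not part of the pytidb example. It uses only the standard library and a deliberately simple KEY=VALUE parser that ignores quoting and escape sequences.

```python
from pathlib import Path

# The keys written by the Step 3 template.
REQUIRED_KEYS = (
    "TIDB_HOST", "TIDB_PORT", "TIDB_USERNAME",
    "TIDB_PASSWORD", "TIDB_DATABASE", "OPENAI_API_KEY",
)
# Placeholder tokens copied from the template; any of them left in a value means it was not filled in.
PLACEHOLDERS = ("{gateway-region}", "{prefix}", "{password}", "<your-openai-api-key>")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments (no quoting or escapes)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip()
    return values


def missing_or_placeholder(values):
    """Return required keys that are absent, empty, or still contain a template placeholder."""
    return [
        key for key in REQUIRED_KEYS
        if not values.get(key) or any(p in values[key] for p in PLACEHOLDERS)
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        problems = missing_or_placeholder(parse_env(env_path.read_text()))
        print("Missing or unfilled:", ", ".join(problems) if problems else "none")
    else:
        print(".env not found in the current directory")
```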

Step 4. Run the demo

Option 1. Run the Streamlit app

To try the demo in a web UI, run the following command:

streamlit run app.py

Open your browser and visit http://localhost:8501.

Option 2. Run the demo script

To run the demo from the command line instead, run the example script:

python example.py

Expected output:

=== CONNECT TO TIDB ===
Connected to TiDB.
=== CREATE TABLE ===
Table created.
=== INSERT SAMPLE DATA ===
Inserted 3 rows.
=== PERFORM HYBRID SEARCH ===
Search results:
[
  {
    "_distance": 0.4740166257687124,
    "_match_score": 1.6804268,
    "_score": 0.03278688524590164,
    "id": 60013,
    "text": "TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads."
  },
  {
    "_distance": 0.6428459116216618,
    "_match_score": 0.78427225,
    "_score": 0.03200204813108039,
    "id": 60015,
    "text": "LlamaIndex is a Python library for building AI-powered applications."
  },
  {
    "_distance": 0.641581407158715,
    "_match_score": null,
    "_score": 0.016129032258064516,
    "id": 60014,
    "text": "PyTiDB is a Python library for developers to connect to TiDB."
  }
]
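The _score values above match Reciprocal Rank Fusion (RRF) with k = 60. Under RRF, each document scores the sum of 1 / (k + rank) over every ranked list it appears in, with ranks starting at 1. This page does not say which fusion method the demo uses, so treat RRF as an inference from the numbers. The sketch below re-derives the three scores from two rankings read off the output. It assumes that a smaller _distance means a closer vector match and that a larger _match_score means a better full-text match. Document 60014 has no _match_score, so it is absent from the full-text list.

```python
from collections import defaultdict


def rrf(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score(d) = sum of 1 / (k + rank) over the lists containing d.

    A document missing from a list gets no contribution from that list.
    Returns (doc_id, score) pairs, best first.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Rankings read off the expected output (assumed orderings, see above).
vector_ranking = [60013, 60014, 60015]   # _distance: 0.4740 < 0.6416 < 0.6428
fulltext_ranking = [60013, 60015]        # _match_score: 1.680 > 0.784; 60014 has none

if __name__ == "__main__":
    for doc_id, score in rrf([vector_ranking, fulltext_ranking]):
        # 60013: 1/61 + 1/61, 60015: 1/63 + 1/62, 60014: 1/62
        print(doc_id, score)
```

RRF uses only rank positions, so it needs no calibration between cosine distances and full-text scores, which live on different scales. The trade-off is that it ignores how large the gap between two neighboring results is.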
