Hybrid Search Example
This demo shows how to combine vector search and full-text search to improve retrieval quality over a set of documents.
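As a rough illustration of the idea, the standard-library sketch below ranks a tiny in-memory corpus twice. The first ranking uses cosine distance between embedding vectors (vector search). The second counts shared query terms, a crude stand-in for a real full-text relevance score. The two rankings are then merged with Reciprocal Rank Fusion (RRF). The corpus text is borrowed from the demo's sample data, but the 3-dimensional vectors are made up. RRF with k = 60 is an assumption chosen for illustration, not something this README states. The demo itself delegates both searches to TiDB through pytidb.

```python
import math

# Toy corpus: text from the demo's sample rows; the 3-d vectors are invented
# for illustration only (the demo computes real embeddings with a model).
DOCS = {
    1: ("TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads.", [0.9, 0.1, 0.2]),
    2: ("PyTiDB is a Python library for developers to connect to TiDB.", [0.6, 0.7, 0.1]),
    3: ("LlamaIndex is a Python library for building AI-powered applications.", [0.2, 0.8, 0.5]),
}


def cosine_distance(a, b):
    """1 - cosine similarity: 0 means same direction, larger means less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def vector_rank(query_vec):
    """Vector search: every document, ordered by ascending cosine distance."""
    return sorted(DOCS, key=lambda d: cosine_distance(query_vec, DOCS[d][1]))


def keyword_rank(query_text):
    """Keyword search stand-in: documents sharing at least one query term,
    ordered by how many distinct terms they share."""
    terms = set(query_text.lower().split())
    scores = {}
    for doc_id, (text, _) in DOCS.items():
        words = set(text.lower().replace(",", " ").replace(".", " ").split())
        hits = len(terms & words)
        if hits:  # documents with no matching term are not returned at all
            scores[doc_id] = hits
    return sorted(scores, key=lambda d: -scores[d])


def hybrid_search(query_text, query_vec, k=60):
    """Fuse both rankings with RRF: each list adds 1 / (k + rank) per document."""
    fused = {}
    for ranking in (vector_rank(query_vec), keyword_rank(query_text)):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: -fused[d])


print(hybrid_search("Python library", [0.2, 0.8, 0.5]))  # → [3, 2, 1]
```

Here both Python libraries tie on keywords, and the vector ranking breaks the tie in favor of document 3. Document 1 has no keyword match but is still returned because vector search found it.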

[Figure: TiDB Hybrid Search Demo]
Prerequisites
Before you begin, ensure you have the following:
- Python (>=3.10): Install Python 3.10 or a later version.
- A TiDB Cloud Starter cluster: You can create a free TiDB cluster on TiDB Cloud.
- OpenAI API key: Get an OpenAI API key from OpenAI.
How to run
Step 1. Clone the pytidb repository
pytidb is the official Python SDK for TiDB, designed to help developers build AI applications efficiently.
git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/hybrid_search
Step 2. Set up a virtual environment and install the required packages
python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt
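Before moving on, it can help to confirm two things: the interpreter meets the Python 3.10 prerequisite, and the virtual environment created above is active. A small standard-library check like the one below covers both. The function name `check_environment` is hypothetical and is not part of pytidb.

```python
import sys


def check_environment(min_version=(3, 10)):
    """Return a list of problems with the current interpreter (empty if none)."""
    problems = []
    if sys.version_info < min_version:
        problems.append(
            f"Python {'.'.join(map(str, min_version))}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base interpreter.
    if sys.prefix == sys.base_prefix:
        problems.append("no virtual environment is active (run: source .venv/bin/activate)")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("OK" if not issues else "\n".join(issues))
```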
Step 3. Set environment variables
- In the TiDB Cloud console, navigate to the Clusters page, and then click the name of your target cluster to go to its overview page.
- Click Connect in the upper-right corner. A connection dialog is displayed, with connection parameters listed.
- Create a .env file with the following environment variables, replacing the placeholders with the values from the connection parameters:
cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=pytidb_hybrid_demo
OPENAI_API_KEY=<your-openai-api-key>
EOF
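The demo presumably reads these variables at startup, commonly through a package such as python-dotenv (an assumption; this README does not say how). A frequent mistake is leaving a template placeholder unfilled. The standard-library sketch below parses a simple KEY=VALUE .env file and flags missing keys, a non-numeric port, and values that still contain the `{...}` or `<...>` placeholders from the template above. The key list is taken from the template. The helper names are hypothetical.

```python
from pathlib import Path

# Keys taken from the .env template in this README.
REQUIRED_KEYS = (
    "TIDB_HOST", "TIDB_PORT", "TIDB_USERNAME",
    "TIDB_PASSWORD", "TIDB_DATABASE", "OPENAI_API_KEY",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def find_problems(env):
    """Report missing keys, a non-numeric port, and unfilled placeholders."""
    problems = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "")
        if not value:
            problems.append(f"{key} is missing or empty")
        elif "{" in value or "<" in value:
            problems.append(f"{key} still contains a placeholder: {value}")
    port = env.get("TIDB_PORT", "")
    if port and not port.isdigit():
        problems.append(f"TIDB_PORT must be a number, got {port!r}")
    return problems


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        for problem in find_problems(parse_env(env_file.read_text())) or ["OK"]:
            print(problem)
    else:
        print(".env not found")
```

The placeholder test is deliberately simple. A real password that happens to contain `{` or `<` would be reported as a false positive.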
Step 4. Run the demo
Option 1. Run the Streamlit app
To try the demo in a web UI, run the following command:
streamlit run app.py
Open your browser and visit http://localhost:8501.
Option 2. Run the demo script
To run the demo as a script instead, run the following command:
python example.py
Expected output (row IDs and scores may differ in your run):
=== CONNECT TO TIDB ===
Connected to TiDB.
=== CREATE TABLE ===
Table created.
=== INSERT SAMPLE DATA ===
Inserted 3 rows.
=== PERFORM HYBRID SEARCH ===
Search results:
[
{
"_distance": 0.4740166257687124,
"_match_score": 1.6804268,
"_score": 0.03278688524590164,
"id": 60013,
"text": "TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads."
},
{
"_distance": 0.6428459116216618,
"_match_score": 0.78427225,
"_score": 0.03200204813108039,
"id": 60015,
"text": "LlamaIndex is a Python library for building AI-powered applications."
},
{
"_distance": 0.641581407158715,
"_match_score": null,
"_score": 0.016129032258064516,
"id": 60014,
"text": "PyTiDB is a Python library for developers to connect to TiDB."
}
]
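The `_score` values above are consistent with Reciprocal Rank Fusion (RRF) using k = 60. Under that reading, each row earns 1 / (60 + rank) from every search that returned it. Vector-search rank follows ascending `_distance`, and full-text rank follows descending `_match_score`. This is an inference from the numbers, not something this README documents. The sketch below recomputes `_score` from the other two fields to show the arithmetic.

```python
# Values copied from the expected output above.
results = {
    60013: {"_distance": 0.4740166257687124, "_match_score": 1.6804268},
    60015: {"_distance": 0.6428459116216618, "_match_score": 0.78427225},
    60014: {"_distance": 0.641581407158715, "_match_score": None},
}

K = 60  # assumed RRF constant, inferred from the printed scores


def ranks(key, reverse):
    """1-based ranks for the rows a search actually returned (score not None)."""
    hits = [doc_id for doc_id, row in results.items() if row[key] is not None]
    hits.sort(key=lambda doc_id: results[doc_id][key], reverse=reverse)
    return {doc_id: pos for pos, doc_id in enumerate(hits, start=1)}


vec_ranks = ranks("_distance", reverse=False)    # smaller distance ranks higher
fts_ranks = ranks("_match_score", reverse=True)  # larger match score ranks higher


def rrf_score(doc_id):
    """Sum 1 / (K + rank) over every search that returned this row."""
    score = 0.0
    for table in (vec_ranks, fts_ranks):
        if doc_id in table:
            score += 1.0 / (K + table[doc_id])
    return score


for doc_id in results:
    print(doc_id, rrf_score(doc_id))
```

For example, row 60015 ranks 3rd by distance and 2nd by match score, so its score is 1/63 + 1/62 ≈ 0.032002. Row 60014 has no full-text match (`_match_score` is null) and gets only 1/62 from its 2nd-place vector rank. That is why it comes last even though its distance is slightly smaller than that of row 60015.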
Related resources
- Source Code: View on GitHub