Auto Embedding Example

This example shows how to use the Auto Embedding feature with the pytidb client. It covers the following steps:

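Connect to TiDB with the pytidb client.
Define a table with a VectorField configured for Auto Embedding.
Insert plain text data: the embeddings are populated automatically in the background.
Run vector search with natural-language queries: the query embeddings are generated transparently.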
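To make these steps concrete, here is a minimal, self-contained sketch of the Auto Embedding idea. It does not use pytidb or TiDB. `toy_embed` and `AutoEmbeddingTable` are hypothetical stand-ins, and the letter-frequency "embedding" replaces a real embedding model. The only point is that the caller inserts plain text and searches with plain text, while the vectors are computed behind the scenes.

```python
import math
from collections import Counter


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model.

    Counts the letters a-z and L2-normalizes the 26-dimensional vector.
    A real embedding model captures meaning; this toy only captures spelling overlap.
    """
    counts = Counter(ch for ch in text.lower() if "a" <= ch <= "z")
    vec = [float(counts[chr(ord("a") + i)]) for i in range(26)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero for letter-free text
    return [v / norm for v in vec]


class AutoEmbeddingTable:
    """Hypothetical in-memory table whose `text_vec` column is filled automatically."""

    def __init__(self, embed):
        self.embed = embed
        self.rows = []

    def insert(self, text: str) -> int:
        # The caller supplies only text; the vector is computed here, behind the scenes.
        row_id = len(self.rows) + 1
        self.rows.append({"id": row_id, "text": text, "text_vec": self.embed(text)})
        return row_id

    def search(self, query: str, limit: int = 3) -> list[dict]:
        # The query is embedded with the same function, then rows are ranked by
        # cosine distance (vectors are unit length, so this is 1 minus the dot product).
        q = self.embed(query)
        hits = []
        for row in self.rows:
            distance = 1.0 - sum(a * b for a, b in zip(q, row["text_vec"]))
            hits.append({"id": row["id"], "text": row["text"], "distance": distance})
        hits.sort(key=lambda hit: hit["distance"])  # closest first
        return hits[:limit]


table = AutoEmbeddingTable(toy_embed)
for text in [
    "TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads.",
    "PyTiDB is a Python library for developers to connect to TiDB.",
    "LlamaIndex is a Python library for building AI-powered applications.",
]:
    table.insert(text)

for hit in table.search("HTAP database"):
    print(f"id: {hit['id']}, text: {hit['text']}, distance: {hit['distance']:.4f}")
```

In the real example, the embedding model and the storage are provided by pytidb and TiDB. The sketch only mirrors the division of labor: no caller-side code ever computes a vector.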

Prerequisites

Before you begin, make sure you have the following:

Python (>=3.10): Install Python 3.10 or later.
TiDB Cloud Starter cluster: You can create a free TiDB cluster on TiDB Cloud.
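Before you create the virtual environment, you can confirm that your interpreter meets the version requirement. The following is a minimal standard-library sketch. The helper name `python_ok` is hypothetical and is not part of pytidb.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version from the prerequisites


def python_ok(version=None, minimum=MIN_PYTHON) -> bool:
    """Return True if `version` (default: the running interpreter) is at least `minimum`."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= minimum


running = ".".join(map(str, sys.version_info[:3]))
if python_ok():
    print(f"Python {running}: OK")
else:
    print(f"Python {running}: please install Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or later")
```

Run this with the same `python` executable that you will use to create the virtual environment in step 2.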

How to run

Step 1. Clone the `pytidb` repository

git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/auto_embedding/

Step 2. Install the required packages

python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt

Step 3. Set environment variables

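In the TiDB Cloud console, go to the Clusters page and click the name of your target cluster to open its overview page.
Click Connect in the upper-right corner. A connection dialog opens and displays the connection parameters.
Set the environment variables based on those connection parameters, as follows: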

cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=test

# By default, the free TiDB Cloud embedding model is used, so no API key is required
EMBEDDING_PROVIDER=tidbcloud_free
EOF
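The `.env` file above still contains the `{...}` placeholders from the template until you replace them with your own connection parameters. The following standard-library sketch uses the hypothetical helper names `parse_env` and `find_env_problems`. It parses simple `KEY=VALUE` lines, skips comments, and reports variables that are missing or still contain a placeholder. It does not reproduce how `main.py` itself loads the file, and it does not handle quoted values or `export` prefixes.

```python
import re
from pathlib import Path

# Variable names taken from the .env template above.
REQUIRED = [
    "TIDB_HOST", "TIDB_PORT", "TIDB_USERNAME",
    "TIDB_PASSWORD", "TIDB_DATABASE", "EMBEDDING_PROVIDER",
]
# Anything like {password}; note a real password containing braces would also match.
PLACEHOLDER = re.compile(r"\{[^}]*\}")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def find_env_problems(env: dict[str, str]) -> list[str]:
    """Return one message per required variable that is unset or still a placeholder."""
    issues = []
    for key in REQUIRED:
        value = env.get(key, "")
        if not value:
            issues.append(f"{key} is not set")
        elif PLACEHOLDER.search(value):
            issues.append(f"{key} still contains a placeholder: {value}")
    return issues


if __name__ == "__main__":
    path = Path(".env")
    if not path.exists():
        print(".env not found")
    else:
        for issue in find_env_problems(parse_env(path.read_text())) or ["all required variables look set"]:
            print(issue)
```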

Step 4. Run the example

python main.py

Expected output:

=== Define embedding function ===
Embedding function (model id: tidbcloud_free/amazon/titan-embed-text-v2) defined

=== Define table schema ===
Table created

=== Truncate table ===
Table truncated

=== Insert sample data ===
Inserted 3 chunks

=== Perform vector search ===
id: 1, text: TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads., distance: 0.30373281240458805
id: 2, text: PyTiDB is a Python library for developers to connect to TiDB., distance: 0.422506501973434
id: 3, text: LlamaIndex is a Python library for building AI-powered applications., distance: 0.5267239638442787
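In this output, the hits appear in ascending order of `distance`, so a smaller value means the stored text is closer to the query. The sketch below assumes that the distance is cosine distance (1 minus cosine similarity). The source does not state which metric the example uses. The sketch computes cosine distance for small vectors and checks that the reported distances are ordered closest-first. Both helper names are hypothetical.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, 1 for orthogonal, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def is_closest_first(distances: list[float]) -> bool:
    """True if each distance is no larger than the next one."""
    return all(d1 <= d2 for d1, d2 in zip(distances, distances[1:]))


# Distances copied from the expected output above.
reported = [0.30373281240458805, 0.422506501973434, 0.5267239638442787]
print(is_closest_first(reported))               # True
print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```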
