📣
TiDB Cloud Essential 开放公测中。此页面由 AI 自动翻译,英文原文请见此处。

集成 TiDB 向量检索与 Amazon Bedrock

本教程演示如何将 TiDB 的 向量检索 功能与 Amazon Bedrock 集成,构建基于检索增强生成(RAG)的问答机器人。

前置条件

完成本教程,你需要:

开始使用

本节将为你提供将 TiDB 向量检索与 Amazon Bedrock 集成,构建基于 RAG 的问答机器人的分步指南。

步骤 1. 设置环境变量

TiDB Cloud 控制台 获取 TiDB 连接信息,并在你的开发环境中设置环境变量,操作如下:

  1. 进入 Clusters 页面,点击目标集群名称,进入集群概览页。

  2. 点击右上角的 Connect,弹出连接对话框。

  3. 确保连接对话框中的配置与你的操作环境一致。

    • Connection Type 设置为 Public
    • Branch 设置为 main
    • Connect With 设置为 General
    • Operating System 与你的环境一致
  4. 点击 Generate Password 生成随机密码。

  5. 在终端中运行以下命令设置环境变量。你需要将命令中的占位符替换为连接对话框中获取的对应参数。

    export TIDB_HOST=<your-tidb-host> export TIDB_PORT=4000 export TIDB_USER=<your-tidb-user> export TIDB_PASSWORD=<your-tidb-password> export TIDB_DB_NAME=test

步骤 2. 配置 Python 虚拟环境

  1. 创建一个名为 demo.py 的 Python 文件:

    touch demo.py
  2. 创建并激活虚拟环境以管理依赖:

    python3 -m venv env source env/bin/activate # 在 Windows 上使用 env\Scripts\activate
  3. 安装所需依赖:

    pip install SQLAlchemy==2.0.30 PyMySQL==1.1.0 tidb-vector==0.0.9 pydantic==2.7.1 boto3

步骤 3. 导入所需库

demo.py 文件开头添加以下代码以导入所需库:

import os import json import boto3 from sqlalchemy import Column, Integer, Text, create_engine from sqlalchemy.orm import declarative_base, Session from tidb_vector.sqlalchemy import VectorType

步骤 4. 配置数据库连接

demo.py 中添加以下代码以配置数据库连接:

# ---- Configuration Setup ---- # Set environment variables: TIDB_HOST, TIDB_PORT, TIDB_USER, TIDB_PASSWORD, TIDB_DB_NAME TIDB_HOST = os.environ.get("TIDB_HOST") TIDB_PORT = os.environ.get("TIDB_PORT") TIDB_USER = os.environ.get("TIDB_USER") TIDB_PASSWORD = os.environ.get("TIDB_PASSWORD") TIDB_DB_NAME = os.environ.get("TIDB_DB_NAME") # ---- Database Setup ---- def get_db_url(): """Build the database connection URL.""" return f"mysql+pymysql://{TIDB_USER}:{TIDB_PASSWORD}@{TIDB_HOST}:{TIDB_PORT}/{TIDB_DB_NAME}?ssl_verify_cert=True&ssl_verify_identity=True" # Create engine engine = create_engine(get_db_url(), pool_recycle=300) Base = declarative_base()

步骤 5. 通过 Bedrock runtime client 调用 Amazon Titan Text Embeddings V2 模型

Amazon Bedrock runtime client 提供了 invoke_model API,接受以下参数:

  • modelId:Amazon Bedrock 可用基础模型的模型 ID
  • accept:输入请求的类型
  • contentType:输入内容的类型
  • body:包含 prompt 和配置的 JSON 字符串负载

demo.py 中添加以下代码,通过 invoke_model API 使用 Amazon Titan Text Embeddings 生成文本向量,并通过 Meta Llama 3 获取响应:

# Bedrock Runtime Client Setup bedrock_runtime = boto3.client('bedrock-runtime') # ---- Model Invocation ---- embedding_model_name = "amazon.titan-embed-text-v2:0" dim_of_embedding_model = 512 llm_name = "us.meta.llama3-2-3b-instruct-v1:0" def embedding(content): """Invoke Amazon Bedrock to get text embeddings.""" payload = { "modelId": embedding_model_name, "contentType": "application/json", "accept": "*/*", "body": { "inputText": content, "dimensions": dim_of_embedding_model, "normalize": True, } } body_bytes = json.dumps(payload['body']).encode('utf-8') response = bedrock_runtime.invoke_model( body=body_bytes, contentType=payload['contentType'], accept=payload['accept'], modelId=payload['modelId'] ) result_body = json.loads(response.get("body").read()) return result_body.get("embedding") def generate_result(query: str, info_str: str): """Generate answer using Meta Llama 3 model.""" prompt = f""" ONLY use the content below to generate an answer: {info_str} ---- Please carefully think about the question: {query} """ payload = { "modelId": llm_name, "contentType": "application/json", "accept": "application/json", "body": { "prompt": prompt, "temperature": 0 } } body_bytes = json.dumps(payload['body']).encode('utf-8') response = bedrock_runtime.invoke_model( body=body_bytes, contentType=payload['contentType'], accept=payload['accept'], modelId=payload['modelId'] ) result_body = json.loads(response.get("body").read()) completion = result_body["generation"] return completion

步骤 6. 创建向量表

demo.py 中添加以下代码,创建用于存储文本及其向量的向量表:

# ---- TiDB Setup and Vector Index Creation ---- class Entity(Base): """Define the Entity table with a vector index.""" __tablename__ = "entity" id = Column(Integer, primary_key=True) content = Column(Text) content_vec = Column(VectorType(dim=dim_of_embedding_model), comment="hnsw(distance=l2)") # Create the table in TiDB Base.metadata.create_all(engine)

步骤 7. 将向量数据保存到 TiDB Cloud Starter

demo.py 中添加以下代码,将向量数据保存到你的 TiDB Cloud Starter 集群:

# ---- Saving Vectors to TiDB ---- def save_entities_with_embedding(session, contents): """Save multiple entities with their embeddings to the TiDB database.""" for content in contents: entity = Entity(content=content, content_vec=embedding(content)) session.add(entity) session.commit()

步骤 8. 运行应用

  1. demo.py 中添加以下代码,建立数据库会话,将向量保存到 TiDB,提出示例问题(如 "What is TiDB?"),并从模型生成结果:

    if __name__ == "__main__": # Establish a database session with Session(engine) as session: # Example data contents = [ "TiDB is a distributed SQL database compatible with MySQL.", "TiDB supports Hybrid Transactional and Analytical Processing (HTAP).", "TiDB can scale horizontally and provides high availability.", "Amazon Bedrock allows seamless integration with foundation models.", "Meta Llama 3 is a powerful model for text generation." ] # Save embeddings to TiDB save_entities_with_embedding(session, contents) # Example query query = "What is TiDB?" info_str = " ".join(contents) # Generate result from Meta Llama 3 result = generate_result(query, info_str) print(f"Generated answer: {result}")
  2. 保存所有对 demo.py 的更改并运行脚本:

    python3 demo.py

    预期输出类似如下:

    Generated answer: What is the main purpose of TiDB? What are the key features of TiDB? What are the key benefits of TiDB? ---- Based on the provided text, here is the answer to the question: What is TiDB? TiDB is a distributed SQL database compatible with MySQL. ## Step 1: Understand the question The question asks for the definition of TiDB. ## Step 2: Identify the key information The key information provided in the text is that TiDB is a distributed SQL database compatible with MySQL. ## Step 3: Provide the answer Based on the provided text, TiDB is a distributed SQL database compatible with MySQL. The final answer is: TiDB is a distributed SQL database compatible with MySQL.

参见

文档内容是否有帮助?