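Jina AI Embeddings
This document describes how to use Jina AI embedding models with Auto Embedding in TiDB Cloud to perform semantic search from text queries.
Available models
If you bring your own Jina AI API key (BYOK), all Jina AI models are available with the jina_ai/ prefix. For example: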
jina-embeddings-v4
- Name: jina_ai/jina-embeddings-v4
- Dimensions: 2048
- Distance metrics: Cosine, L2
- Maximum input text tokens: 32,768
- Price: Charged by Jina AI
- Hosted by TiDB Cloud: ❌
- Bring your own key (BYOK) supported: ✅
jina-embeddings-v3
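- Name: jina_ai/jina-embeddings-v3
- Dimensions: 1024
- Distance metrics: Cosine, L2
- Maximum input text tokens: 8,192
- Price: Charged by Jina AI
- Hosted by TiDB Cloud: ❌
- Bring your own key (BYOK) supported: ✅
For the full list of available models, see the Jina AI Documentation.
SQL usage example
To use Jina AI models, you must specify a Jina AI API key as follows: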
SET @@GLOBAL.TIDB_EXP_EMBED_JINA_AI_API_KEY = 'your-jina-ai-api-key-here';

CREATE TABLE sample (
  `id`        INT,
  `content`   TEXT,
  `embedding` VECTOR(2048) GENERATED ALWAYS AS (EMBED_TEXT(
                "jina_ai/jina-embeddings-v4",
                `content`
              )) STORED
);

INSERT INTO sample
  (`id`, `content`)
VALUES
  (1, "Java: Object-oriented language for cross-platform development."),
  (2, "Java coffee: Bold Indonesian beans with low acidity."),
  (3, "Java island: Densely populated, home to Jakarta."),
  (4, "Java's syntax is used in Android apps."),
  (5, "Dark roast Java beans enhance espresso blends.");

SELECT `id`, `content` FROM sample
ORDER BY
  VEC_EMBED_COSINE_DISTANCE(
    embedding,
    "How to start learning Java programming?"
  )
LIMIT 2;
Result:
+------+----------------------------------------------------------------+
| id   | content                                                        |
+------+----------------------------------------------------------------+
|    1 | Java: Object-oriented language for cross-platform development. |
|    4 | Java's syntax is used in Android apps.                         |
+------+----------------------------------------------------------------+
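To see why these two rows rank first, it can help to return the computed distance next to each row. The following is a minimal sketch that reuses the `sample` table and the `VEC_EMBED_COSINE_DISTANCE()` call from the query above. The `distance` alias is an illustrative name, and the sketch assumes that the function can be used in the select list as well as in `ORDER BY`.

```sql
-- Sketch: same search as above, but also return the distance value.
-- Assumes the `sample` table and the API key from the previous steps.
SELECT
  `id`,
  `content`,
  VEC_EMBED_COSINE_DISTANCE(
    embedding,
    "How to start learning Java programming?"
  ) AS distance  -- illustrative alias, not part of the original example
FROM sample
ORDER BY distance  -- smaller cosine distance means a closer match
LIMIT 2;
```

The actual distance values depend on the model output, so they are not shown here.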
Options
All Jina AI options are supported through the additional_json_options parameter of the EMBED_TEXT() function.
Example: Specify the "downstream task" for better performance
CREATE TABLE sample (
  `id`        INT,
  `content`   TEXT,
  `embedding` VECTOR(2048) GENERATED ALWAYS AS (EMBED_TEXT(
                "jina_ai/jina-embeddings-v4",
                `content`,
                '{"task": "retrieval.passage", "task@search": "retrieval.query"}'
              )) STORED
);
Example: Use a different dimension
CREATE TABLE sample (
  `id`        INT,
  `content`   TEXT,
  `embedding` VECTOR(768) GENERATED ALWAYS AS (EMBED_TEXT(
                "jina_ai/jina-embeddings-v3",
                `content`,
                '{"dimensions":768}'
              )) STORED
);
For all available options, see the Jina AI Documentation.