Jina AI Embeddings
This document describes how to use Jina AI embedding models with Auto Embedding in TiDB Cloud to perform semantic searches with text queries.
Available models
Jina AI provides high-performance, multimodal, and multilingual long-context embeddings for search, RAG, and agent applications.
All Jina AI models are available for use with the jina_ai/ prefix if you bring your own Jina AI API key (BYOK). For example:
jina-embeddings-v4
- Name: jina_ai/jina-embeddings-v4
- Dimensions: 2048
- Distance metric: Cosine, L2
- Maximum input text tokens: 32,768
- Price: Charged by Jina AI
- Hosted by TiDB Cloud: ❌
- Bring Your Own Key: ✅
jina-embeddings-v3
- Name: jina_ai/jina-embeddings-v3
- Dimensions: 1024
- Distance metric: Cosine, L2
- Maximum input text tokens: 8,192
- Price: Charged by Jina AI
- Hosted by TiDB Cloud: ❌
- Bring Your Own Key: ✅
For a full list of available models, see Jina AI Documentation.
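The model details above determine the size of the vector column you declare later. As a minimal sketch, the listed values can be kept in a small lookup table so that the column type is always derived from the model name. The names JinaModel, MODELS, and vector_column_type are hypothetical helpers introduced here for illustration and are not part of pytidb or TiDB Cloud.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JinaModel:
    name: str        # full model name, including the jina_ai/ prefix
    dimensions: int  # default embedding size listed above
    max_tokens: int  # maximum input text tokens listed above


# Values copied from the model list in this document.
MODELS = {
    "jina_ai/jina-embeddings-v4": JinaModel("jina_ai/jina-embeddings-v4", 2048, 32768),
    "jina_ai/jina-embeddings-v3": JinaModel("jina_ai/jina-embeddings-v3", 1024, 8192),
}


def vector_column_type(model_name: str) -> str:
    """Return the VECTOR(n) column type matching a model's default dimensions."""
    return f"VECTOR({MODELS[model_name].dimensions})"


print(vector_column_type("jina_ai/jina-embeddings-v4"))  # VECTOR(2048)
```

Deriving the column type this way avoids a mismatch between the model and the declared vector size when you switch models.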
Usage example
This example shows how to create a vector table, insert documents, and run a similarity search using Jina AI embedding models.
Step 1: Connect to the database
Connect using the TiDB Client:
from pytidb import TiDBClient

tidb_client = TiDBClient.connect(
    host="{gateway-region}.prod.aws.tidbcloud.com",
    port=4000,
    username="{prefix}.root",
    password="{password}",
    database="{database}",
    ensure_db=True,
)
Connect using the MySQL command-line client:
mysql -h {gateway-region}.prod.aws.tidbcloud.com \
    -P 4000 \
    -u {prefix}.root \
    -p{password} \
    -D {database}
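To keep credentials out of source code, one option is to read the connection settings from environment variables and pass them to TiDBClient.connect() as keyword arguments. The sketch below only builds the argument dictionary. The variable names TIDB_HOST, TIDB_PORT, TIDB_USERNAME, TIDB_PASSWORD, and TIDB_DATABASE, and the connection_params helper, are assumptions made for this example and are not defined by pytidb.

```python
import os


def connection_params(env=os.environ) -> dict:
    """Build keyword arguments for TiDBClient.connect() from a mapping of settings.

    The environment variable names are hypothetical; use whatever names fit your setup.
    """
    return {
        "host": env["TIDB_HOST"],
        "port": int(env.get("TIDB_PORT", "4000")),  # 4000 is the port used in this document
        "username": env["TIDB_USERNAME"],
        "password": env["TIDB_PASSWORD"],
        "database": env["TIDB_DATABASE"],
        "ensure_db": True,
    }


# Demonstration with a plain dict instead of the real environment.
example_env = {
    "TIDB_HOST": "gateway01.example.prod.aws.tidbcloud.com",  # placeholder host
    "TIDB_USERNAME": "prefix.root",
    "TIDB_PASSWORD": "secret",
    "TIDB_DATABASE": "test",
}
params = connection_params(example_env)
print(params["host"], params["port"])

# With pytidb installed, the same dictionary could be used as:
# tidb_client = TiDBClient.connect(**params)
```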
Step 2: Configure the API key
Create your API key from the Jina AI Platform and bring your own key (BYOK) to use the embedding service.
Configure the API key for the Jina AI embedding provider using the TiDB Client:
tidb_client.configure_embedding_provider(
provider="jina_ai",
api_key="{your-jina-api-key}",
)
Set the API key for the Jina AI embedding provider using SQL:
SET @@GLOBAL.TIDB_EXP_EMBED_JINA_AI_API_KEY = "{your-jina-api-key}";
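Because the API key is a secret, you may prefer to load it at run time rather than paste it into a script. The following sketch reads the key from an environment variable, rejects an empty value, and masks it for logging. The variable name JINA_API_KEY and both helper functions are assumptions for illustration only.

```python
import os


def load_jina_api_key(env=os.environ, var: str = "JINA_API_KEY") -> str:
    """Return the API key stored under `var`, or raise if it is missing or blank."""
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set")
    return key


def mask(key: str) -> str:
    """Keep the first four characters and replace the rest with asterisks."""
    return key[:4] + "*" * max(len(key) - 4, 0)


# Demonstration with a plain dict and a made-up key.
key = load_jina_api_key({"JINA_API_KEY": "jina_example_key"})
print(mask(key))

# With a connected client, the key could then be passed on as:
# tidb_client.configure_embedding_provider(provider="jina_ai", api_key=key)
```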
Step 3: Create a vector table
Create a table with a vector field that uses the jina_ai/jina-embeddings-v4 model to generate 2048-dimensional vectors:
from pytidb.schema import TableModel, Field
from pytidb.embeddings import EmbeddingFunction
from pytidb.datatype import TEXT

class Document(TableModel):
    __tablename__ = "sample_documents"
    id: int = Field(primary_key=True)
    content: str = Field(sa_type=TEXT)
    # The embedding is generated automatically from the content field.
    embedding: list[float] = EmbeddingFunction(
        model_name="jina_ai/jina-embeddings-v4"
    ).VectorField(source_field="content")

table = tidb_client.create_table(schema=Document, if_exists="overwrite")
Create the same table using SQL:
CREATE TABLE sample_documents (
    `id` INT PRIMARY KEY,
    `content` TEXT,
    `embedding` VECTOR(2048) GENERATED ALWAYS AS (EMBED_TEXT(
        "jina_ai/jina-embeddings-v4",
        `content`
    )) STORED
);
Step 4: Insert data into the table
Use the table.insert() or table.bulk_insert() API to add data:
documents = [
    Document(id=1, content="Java: Object-oriented language for cross-platform development."),
    Document(id=2, content="Java coffee: Bold Indonesian beans with low acidity."),
    Document(id=3, content="Java island: Densely populated, home to Jakarta."),
    Document(id=4, content="Java's syntax is used in Android apps."),
    Document(id=5, content="Dark roast Java beans enhance espresso blends."),
]
table.bulk_insert(documents)
Insert data using the INSERT INTO statement:
INSERT INTO sample_documents (id, content)
VALUES
    (1, "Java: Object-oriented language for cross-platform development."),
    (2, "Java coffee: Bold Indonesian beans with low acidity."),
    (3, "Java island: Densely populated, home to Jakarta."),
    (4, "Java's syntax is used in Android apps."),
    (5, "Dark roast Java beans enhance espresso blends.");
Step 5: Search for similar documents
Use the table.search() API to perform vector search:
results = table.search("How to start learning Java programming?") \
    .limit(2) \
    .to_list()
print(results)
Use the VEC_EMBED_COSINE_DISTANCE function to perform vector search based on the cosine distance metric:
SELECT
    `id`,
    `content`,
    VEC_EMBED_COSINE_DISTANCE(embedding, "How to start learning Java programming?") AS _distance
FROM sample_documents
ORDER BY _distance ASC
LIMIT 2;
Result:
+------+----------------------------------------------------------------+
| id   | content                                                        |
+------+----------------------------------------------------------------+
| 1    | Java: Object-oriented language for cross-platform development. |
| 4    | Java's syntax is used in Android apps.                         |
+------+----------------------------------------------------------------+
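To make the ranking in the query above concrete, the sketch below computes cosine distance (1 minus cosine similarity) in plain Python and returns the rows with the smallest distance, mirroring ORDER BY _distance ASC LIMIT 2. The three-dimensional vectors are invented for illustration and are not real Jina AI embeddings. The cosine_distance and top_k functions are hypothetical helpers, not part of pytidb.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, rows, k):
    """Return the k (id, vector) rows closest to the query, nearest first."""
    return sorted(rows, key=lambda row: cosine_distance(query, row[1]))[:k]


# Toy vectors only: rows 1 and 4 point in roughly the same direction as the query.
rows = [
    (1, [0.9, 0.1, 0.0]),
    (2, [0.1, 0.9, 0.1]),
    (3, [0.0, 0.2, 0.9]),
    (4, [0.8, 0.2, 0.1]),
]
query = [1.0, 0.1, 0.0]

nearest_ids = [row_id for row_id, _ in top_k(query, rows, 2)]
print(nearest_ids)  # [1, 4]
```

A smaller distance means the vectors point in more similar directions, which is why the query sorts in ascending order.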
Options
All Jina AI options are supported via the additional_json_options parameter of the EMBED_TEXT() function.
Example: Specify a downstream task for better performance
CREATE TABLE sample (
    `id` INT,
    `content` TEXT,
    `embedding` VECTOR(2048) GENERATED ALWAYS AS (EMBED_TEXT(
        "jina_ai/jina-embeddings-v4",
        `content`,
        '{"task": "retrieval.passage", "task@search": "retrieval.query"}'
    )) STORED
);
Example: Use an alternative dimension
CREATE TABLE sample (
    `id` INT,
    `content` TEXT,
    `embedding` VECTOR(768) GENERATED ALWAYS AS (EMBED_TEXT(
        "jina_ai/jina-embeddings-v3",
        `content`,
        '{"dimensions":768}'
    )) STORED
);
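In the example above, the VECTOR(768) column size matches the "dimensions" value in the options. A simple pre-flight check, sketched below, can catch a mismatch before the table is created. The dimensions_match helper is hypothetical. When the options do not set "dimensions", it reports no conflict, because the model default would then apply and this check does not know it.

```python
import json
import re


def dimensions_match(column_type: str, options_json: str) -> bool:
    """Return False only if the options request a size that differs from VECTOR(n)."""
    found = re.fullmatch(r"VECTOR\((\d+)\)", column_type.strip(), flags=re.IGNORECASE)
    if found is None:
        raise ValueError(f"not a sized vector type: {column_type!r}")
    declared = int(found.group(1))
    requested = json.loads(options_json).get("dimensions")
    # No explicit request means nothing to compare against here.
    return requested is None or requested == declared


print(dimensions_match("VECTOR(768)", '{"dimensions":768}'))   # True
print(dimensions_match("VECTOR(1024)", '{"dimensions":768}'))  # False
```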
For all available options, see Jina AI Documentation.