Datasets

Install VecDB

pip install vecdb

Login and create dataset

Python
import relevanceai as rai
rai.login()

ds = rai.Dataset(
    id = "hello_world"
)

Different ways to insert data into your dataset:

Automatically vectorizes the text field:

Python
ds.insert(
    documents=[
        {"_id": "doc1", "text": "tiger", "category": "land"},
        {"_id": "doc2", "text": "lion", "category": "land"},
        {"_id": "doc3", "text": "tigr", "category": "land"},
        {"_id": "doc4", "text": "car", "category": "land"},
    ]
)

Now its ready for search:

Python
ds.search(text="vroom", page_size=1)

Changing models

Lets change the model to “text-embedding-ada-002”:

Python
ds.insert(
    documents=[
        {"_id": "doc1", "text": "tiger", "category": "land"},
        {"_id": "doc2", "text": "lion", "category": "land"},
        {"_id": "doc3", "text": "tigr", "category": "land"},
        {"_id": "doc4", "text": "car", "category": "land"},
    ],
    encoders=[{
        "model_name":"text-embedding-ada-002",
        "field" : "text"
    }]
)

We support these models [‘image_text’, ‘text_image’, ‘all-mpnet-base-v2’, ‘clip-vit-b-32-image’, ‘clip-vit-b-32-text’, ‘clip-vit-l-14-image’, ‘clip-vit-l-14-text’, ‘sentence-transformers’, ‘text-embedding-ada-002’, ‘cohere-small’, ‘cohere-large’, ‘cohere-multilingual-22-12’]:

Python
ds.search(text="vroom", model="text-embedding-ada-002")

Bring your own vectors:

Python
ds.insert(
    documents=[
        {"_id": "doc1", "text": "tiger", "category": "land", "text_vector_" : [0.1, 0.1, 0.1]},
        {"_id": "doc2", "text": "lion", "category": "land", "text_vector_" : [0.2, 0.2, 0.2]},
        {"_id": "doc3", "text": "tigr", "category": "land", "text_vector_" : [0.3, 0.3, 0.3]},
        {"_id": "doc4", "text": "car", "category": "land", "text_vector_" : [0.4, 0.4, 0.4]},
    ]
)

Its ready for search:

Python
ds.search(
    vector=[0.1, 0.1, 0.1]
)

Turn the search into a step

Note this only works for the hosted embeddings. To do this for your own embeddings it’ll have to be done outside of the chain.

Python
from relevanceai.params import StringParam

query = StringParam("query")
ds.search(
    text=query,
    return_as_step=True
)

You can list your datasets

Python
from relevanceai import rai
rai.list_datasets()

Note: This search can’t be added to a chain yet. This search can do a lot of things from searching with vectors.

Python
ds.db.search(...)