
SQLite as a Vector Store with SQLiteVec

This notebook covers how to get started with the SQLiteVec vector store.

SQLite-Vec is an SQLite extension designed for vector search, emphasizing local-first operations and easy integration into applications without external servers. It is the successor to SQLite-VSS by the same author. It is written in zero-dependency C and designed to be easy to build and use.

This notebook shows how to use the SQLiteVec vector database.
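To make the idea of local vector search concrete, here is a minimal sketch that uses only Python's standard library. It is not how sqlite-vec works internally (the extension provides its own SQL-level support), and the table name `items`, the helper functions, and the toy three-dimensional vectors are all assumptions made for illustration.

```python
import math
import sqlite3
import struct


def pack_vector(vec):
    """Serialize a list of floats into a compact float32 BLOB."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack_vector(blob):
    """Inverse of pack_vector: each float32 value occupies 4 bytes."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(conn, query_vec, k=2):
    """Brute-force scan: score every stored row and keep the k closest."""
    rows = conn.execute("SELECT text, embedding FROM items").fetchall()
    scored = [(l2_distance(query_vec, unpack_vector(blob)), text) for text, blob in rows]
    return sorted(scored)[:k]


# Hypothetical toy data; real embeddings have hundreds of dimensions.
toy_data = {
    "cat": [1.0, 0.0, 0.0],
    "kitten": [0.9, 0.1, 0.0],
    "car": [0.0, 0.0, 1.0],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (text TEXT, embedding BLOB)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(text, pack_vector(vec)) for text, vec in toy_data.items()],
)

results = nearest(conn, [1.0, 0.0, 0.0], k=2)
print([text for _, text in results])
```

Everything lives in one SQLite database and no server process is involved, which is the local-first property described above. A dedicated extension avoids the Python-side scan that this sketch performs.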

Setup

To use this integration, install langchain-community with pip install -qU langchain-community.

# You need to install sqlite-vec as a dependency.
%pip install --upgrade --quiet sqlite-vec

Credentials

SQLiteVec does not require any credentials to use as the vector store is a simple SQLite file.
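As a rough illustration of what "a simple SQLite file" means in practice, the sketch below uses only the standard-library sqlite3 module. The temporary path and the `notes` table are hypothetical and have nothing to do with the schema SQLiteVec creates.

```python
import os
import sqlite3
import tempfile

# Hypothetical path for illustration; the examples in this notebook use /tmp/vec.db.
db_path = os.path.join(tempfile.mkdtemp(), "example.db")

# Opening a connection creates the file. No user name, password, or API key is involved.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE notes (body TEXT)")
conn.execute("INSERT INTO notes VALUES ('hello')")
conn.commit()
conn.close()

# Any process with read access to the file can reopen it and see the same data.
reopened = sqlite3.connect(db_path)
rows = reopened.execute("SELECT body FROM notes").fetchall()
reopened.close()

print(os.path.exists(db_path), rows)
```

Because access control is just file permissions, it is worth choosing the location of db_file deliberately rather than leaving sensitive data in a world-readable directory.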

Initialization

from langchain_community.embeddings.sentence_transformer import (
    SentenceTransformerEmbeddings,
)
from langchain_community.vectorstores import SQLiteVec

embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
vector_store = SQLiteVec(
    table="state_union", db_file="/tmp/vec.db", embedding=embedding_function
)

Manage vector store

Add items to vector store

vector_store.add_texts(texts=["Ketanji Brown Jackson is awesome", "foo", "bar"])

Update items in vector store

Not supported yet

Delete items from vector store

Not supported yet

Query vector store

Query directly

data = vector_store.similarity_search("Ketanji Brown Jackson", k=4)

Query by turning into retriever

Not supported yet

Usage for retrieval-augmented generation

Refer to the documentation on sqlite-vec at https://alexgarcia.xyz/sqlite-vec/ for more information on how to use it for retrieval-augmented generation.

API reference

For detailed documentation of all SQLiteVec features and configurations, head to the API reference: https://api.python.langchain.com/en/latest/vectorstores/langchain_community.vectorstores.sqlitevec.SQLiteVec.html

Other examples

from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings.sentence_transformer import (
    SentenceTransformerEmbeddings,
)
from langchain_community.vectorstores import SQLiteVec
from langchain_text_splitters import CharacterTextSplitter

# load the document
loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()

# split it into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
texts = [doc.page_content for doc in docs]


# create the open-source embedding function
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")


# load it into sqlite-vec in a table named state_union.
# the db_file parameter is the name of the file you want
# as your sqlite database.
db = SQLiteVec.from_texts(
    texts=texts,
    embedding=embedding_function,
    table="state_union",
    db_file="/tmp/vec.db",
)

# query it
query = "What did the president say about Ketanji Brown Jackson"
data = db.similarity_search(query)

# print results
data[0].page_content
'Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.'
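The chunking step in the example above decides what each stored vector represents, so it helps to see the general idea in isolation. The sketch below is a simplified approximation of separator-based chunking and is not the CharacterTextSplitter implementation; the function name `split_into_chunks`, the default separator, and the sample text are assumptions.

```python
def split_into_chunks(text, chunk_size=1000, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut mid-paragraph.
    """
    chunks = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # The piece still fits, or it starts a new chunk on its own.
            current = candidate
        else:
            # Adding the piece would overflow: close the chunk and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Hypothetical sample text with three paragraphs.
sample = "alpha beta\n\ngamma\n\ndelta epsilon zeta"
chunks = split_into_chunks(sample, chunk_size=18)
print(chunks)
```

Smaller chunks give more precise matches but less context per result, while larger chunks do the opposite, so chunk_size is worth tuning against the queries you expect.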

Example using existing SQLite connection

from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings.sentence_transformer import (
    SentenceTransformerEmbeddings,
)
from langchain_community.vectorstores import SQLiteVec
from langchain_text_splitters import CharacterTextSplitter

# load the document
loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()

# split it into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
texts = [doc.page_content for doc in docs]


# create the open-source embedding function
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
connection = SQLiteVec.create_connection(db_file="/tmp/vec.db")

db1 = SQLiteVec(
    table="state_union", embedding=embedding_function, connection=connection
)

db1.add_texts(["Ketanji Brown Jackson is awesome"])
# query it again
query = "What did the president say about Ketanji Brown Jackson"
data = db1.similarity_search(query)

# print results
data[0].page_content
'Ketanji Brown Jackson is awesome'
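One reason an API might accept an existing connection, sketched here with plain sqlite3: every new connection to an in-memory database gets its own empty database, so components that must see the same data have to share one connection object. The `docs` table and the helper functions are hypothetical, and this sketch does not load the sqlite-vec extension.

```python
import sqlite3


def add_text(conn, body):
    """Insert one row using the connection supplied by the caller."""
    conn.execute("INSERT INTO docs (body) VALUES (?)", (body,))


def count_rows(conn):
    """Count rows visible through the supplied connection."""
    return conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]


# One connection shared by every helper: all of them see the same table.
shared = sqlite3.connect(":memory:")
shared.execute("CREATE TABLE docs (body TEXT)")
add_text(shared, "first")
add_text(shared, "second")
shared_count = count_rows(shared)

# A second in-memory connection is a separate, empty database.
separate = sqlite3.connect(":memory:")
tables_in_separate = separate.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(shared_count, tables_in_separate)
```

With a file-backed database, as in the example above, separate connections do reach the same data, but passing one connection around still lets the caller control its configuration and lifetime in a single place.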
