NanoPQ (Product Quantization)

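Product Quantization (PQ) is a quantization algorithm for approximate k-nearest-neighbor (k-NN) search. It compresses database vectors, which helps semantic search over large datasets. In a nutshell, each embedding is split into M subspaces, and the sub-vectors in each subspace are clustered. After clustering, each sub-vector is represented by the centroid of the cluster it falls into.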
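
The steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the nanopq implementation: the helper names (`split`, `kmeans`, `train`, `encode`, `approx_dist`) and the toy vectors are made up for this example, and the k-means step uses a deterministic farthest-point initialization so the run is reproducible.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split(vec, m):
    """Split a vector into m equal-length sub-vectors (one per subspace)."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def kmeans(points, k, iters=20):
    """Lloyd's k-means with farthest-point initialization (an assumption
    made here for determinism); returns k centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[nearest].append(p)
        for c, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def train(vectors, m, ks):
    """Learn one codebook of ks centroids for each of the m subspaces."""
    parts = [split(v, m) for v in vectors]
    return [kmeans([p[i] for p in parts], ks) for i in range(m)]

def encode(vec, codebooks):
    """Replace each sub-vector with the index of its nearest centroid."""
    return [
        min(range(len(book)), key=lambda c: sq_dist(sub, book[c]))
        for sub, book in zip(split(vec, len(codebooks)), codebooks)
    ]

def approx_dist(query, code, codebooks):
    """Approximate squared distance from a raw query to an encoded vector."""
    subs = split(query, len(codebooks))
    return sum(sq_dist(s, book[c]) for s, book, c in zip(subs, codebooks, code))

# Toy data: four 4-dimensional vectors forming two obvious groups.
vectors = [
    [0.0, 0.1, 5.0, 5.1],
    [0.1, 0.0, 5.1, 5.0],
    [9.0, 9.1, 0.0, 0.1],
    [9.1, 9.0, 0.1, 0.0],
]
codebooks = train(vectors, m=2, ks=2)
codes = [encode(v, codebooks) for v in vectors]

query = [0.0, 0.0, 5.0, 5.0]
ranked = sorted(
    range(len(vectors)),
    key=lambda i: approx_dist(query, codes[i], codebooks),
)
print(codes)   # one centroid index per subspace, per vector
print(ranked)  # vector indices, closest to the query first
```

Each stored vector shrinks from four floats to two small integers, and the query is compared against centroids rather than against the original vectors. That is the source of both the compression and the approximation.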

This notebook shows how to use a retriever that relies, under the hood, on Product Quantization as implemented by the nanopq package.

%pip install -qU langchain-community langchain-openai nanopq

from langchain_community.embeddings.spacy_embeddings import SpacyEmbeddings
from langchain_community.retrievers import NanoPQRetriever

Create New Retriever with Texts

retriever = NanoPQRetriever.from_texts(
    ["Great world", "great words", "world", "planets of the world"],
    SpacyEmbeddings(model_name="en_core_web_sm"),
    clusters=2,
    subspace=2,
)

Use Retriever

We can now use the retriever!

retriever.invoke("earth")

M: 2, Ks: 2, metric : <class 'numpy.uint8'>, code_dtype: l2
iter: 20, seed: 123
Training the subspace: 0 / 2
Training the subspace: 1 / 2
Encoding the subspace: 0 / 2
Encoding the subspace: 1 / 2

[Document(page_content='world'),
 Document(page_content='Great world'),
 Document(page_content='great words'),
 Document(page_content='planets of the world')]
