Pinecone
Pinecone is a vector database with broad functionality.
In this walkthrough, we demonstrate the SelfQueryRetriever with a Pinecone vector store.
Creating a Pinecone index
First, we need to create a Pinecone vector store and seed it with some data. We have created a small demo set of documents that contain movie summaries.
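To use Pinecone, you must have the pinecone package installed, and you must have an API key and an environment. See the installation instructions.
Note: The self-query retriever requires you to have the lark package installed.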
%pip install --upgrade --quiet lark
%pip install --upgrade --quiet pinecone-notebooks pinecone-client==3.2.2
# Connect to Pinecone and get an API key.
from pinecone_notebooks.colab import Authenticate
Authenticate()
import os
api_key = os.environ["PINECONE_API_KEY"]
We want to use OpenAIEmbeddings, so we need to get an OpenAI API key.
import getpass
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
from pinecone import Pinecone, ServerlessSpec
api_key = os.getenv("PINECONE_API_KEY") or "PINECONE_API_KEY"
index_name = "langchain-self-retriever-demo"
pc = Pinecone(api_key=api_key)
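Note that the `or "PINECONE_API_KEY"` fallback above leaves a placeholder string in `api_key` when the environment variable is unset, so a missing key would only surface later, when a request is made. A minimal sketch of a fail-fast alternative follows. The helper name `require_env` and the variable `DEMO_API_KEY` are hypothetical and are not part of Pinecone or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank.

    Hypothetical helper for illustration only.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demo with a made-up variable so the example runs without real credentials.
os.environ["DEMO_API_KEY"] = "not-a-real-key"
print(require_env("DEMO_API_KEY"))
```

In the walkthrough itself, one could call `require_env("PINECONE_API_KEY")` in place of the `os.getenv(...) or ...` expression.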
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore
embeddings = OpenAIEmbeddings()
# create new index
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
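The index above is created with `dimension=1536` and `metric="cosine"`. The dimension has to equal the length of the vectors produced by the embedding model. We assume here that 1536 is the output size of the default OpenAIEmbeddings model, so check this if you pass a different model. The sketch below illustrates, in plain Python and on toy vectors, what the cosine metric computes and how a dimension mismatch could be caught before upserting. The function names are hypothetical, and this is not how Pinecone implements the metric internally.

```python
import math
from typing import Sequence

# Dimension used when creating the index in this walkthrough.
EXPECTED_DIMENSION = 1536


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def check_dimension(vector: Sequence[float], expected: int = EXPECTED_DIMENSION) -> None:
    """Raise if a vector would not fit an index of the expected dimension."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")


# Toy 2-dimensional vectors, for illustration only.
same_direction = cosine_similarity([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)

# A vector of the right length passes the check silently.
check_dimension([0.0] * EXPECTED_DIMENSION)
```

Cosine similarity depends only on direction, not on vector length, which is one reason it is a common choice for text embeddings.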
docs = [
    Document(
        page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        metadata={"year": 1993, "rating": 7.7, "genre": ["action", "science fiction"]},
    ),
    Document(
        page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    ),
    Document(
        page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    ),
    Document(
        page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    ),
    Document(
        page_content="Toys come alive and have a blast doing so",
        metadata={"year": 1995, "genre": "animated"},
    ),
    Document(
        page_content="Three men walk into the Zone, three men walk out of the Zone",
        metadata={
            "year": 1979,
            "director": "Andrei Tarkovsky",
            "genre": ["science fiction", "thriller"],
            "rating": 9.9,
        },
    ),
]
vectorstore = PineconeVectorStore.from_documents(
    docs, embeddings, index_name="langchain-self-retriever-demo"
)
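The metadata stored with each document is what the self-query retriever filters on: it turns a natural-language question into a query string plus a structured filter over fields such as `year`, `rating`, and `genre`. To make that concrete, the sketch below evaluates a few Pinecone-style filter operators (`$eq`, `$gt`, `$lt`) in memory against the demo metadata. It is illustrative only and is not Pinecone's or LangChain's implementation. The short labels, `matches`, and `select` are hypothetical names, and treating `$eq` on a list-valued field as "the list contains the value" is an assumption about the intended semantics.

```python
from typing import Any, Dict, List

# Metadata copied from the demo documents, keyed by a hypothetical short label.
movies: Dict[str, Dict[str, Any]] = {
    "dinosaurs": {"year": 1993, "rating": 7.7, "genre": ["action", "science fiction"]},
    "dream": {"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    "psychologist": {"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    "women": {"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    "toys": {"year": 1995, "genre": "animated"},
    "zone": {
        "year": 1979,
        "director": "Andrei Tarkovsky",
        "genre": ["science fiction", "thriller"],
        "rating": 9.9,
    },
}


def matches(metadata: Dict[str, Any], flt: Dict[str, Dict[str, Any]]) -> bool:
    """Return True if the metadata satisfies every condition in the filter."""
    for field, condition in flt.items():
        if field not in metadata:
            return False  # a document without the field never matches
        value = metadata[field]
        for op, target in condition.items():
            if op == "$eq":
                # Assumption: a list-valued field matches if it contains the target.
                ok = target in value if isinstance(value, list) else value == target
            elif op == "$gt":
                ok = value > target
            elif op == "$lt":
                ok = value < target
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not ok:
                return False
    return True


def select(flt: Dict[str, Dict[str, Any]]) -> List[str]:
    """Labels of all demo movies whose metadata matches the filter."""
    return [label for label, metadata in movies.items() if matches(metadata, flt)]


high_rated = select({"rating": {"$gt": 8.5}})
sci_fi = select({"genre": {"$eq": "science fiction"}})
print(high_rated)  # ['psychologist', 'zone']
print(sci_fi)      # ['dinosaurs', 'zone']
```

Note that the demo metadata is not uniform: some documents have no `rating` or `director`, and `genre` is a list in some documents and a plain string in another, so a filter on a field silently excludes documents that lack it.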