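Image captions

By default, the loader uses the pre-trained Salesforce BLIP image captioning model.

This notebook shows how to use ImageCaptionLoader to generate a queryable index of image captions.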
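
For illustration, the default checkpoint can also be passed explicitly. This is a minimal sketch: it assumes the `blip_processor` and `blip_model` keyword arguments and the `Salesforce/blip-image-captioning-base` checkpoint name match the installed `langchain_community` version, and `build_caption_loader` is a hypothetical helper name. Check the API reference before relying on it.

```python
# Assumed default checkpoint name; verify against your installed version.
BLIP_CHECKPOINT = "Salesforce/blip-image-captioning-base"


def build_caption_loader(image_urls):
    """Return an ImageCaptionLoader with the BLIP checkpoint spelled out.

    The import is deferred so that defining this helper does not require
    langchain_community to be installed yet.
    """
    from langchain_community.document_loaders import ImageCaptionLoader

    return ImageCaptionLoader(
        images=image_urls,
        blip_processor=BLIP_CHECKPOINT,  # assumed keyword argument
        blip_model=BLIP_CHECKPOINT,  # assumed keyword argument
    )
```

Calling `build_caption_loader(urls).load()` would download the model weights on first use, so expect the first run to take a while.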

%pip install -qU transformers langchain_openai langchain_chroma

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

Prepare a list of image URLs from Wikimedia

from langchain_community.document_loaders import ImageCaptionLoader

list_image_urls = [
    "https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Ara_ararauna_Luc_Viatour.jpg/1554px-Ara_ararauna_Luc_Viatour.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/thumb/0/0c/1928_Model_A_Ford.jpg/640px-1928_Model_A_Ford.jpg",
]

Create the loader

loader = ImageCaptionLoader(images=list_image_urls)
list_docs = loader.load()
list_docs
[Document(metadata={'image_path': 'https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Ara_ararauna_Luc_Viatour.jpg/1554px-Ara_ararauna_Luc_Viatour.jpg'}, page_content='an image of a bird flying in the air [SEP]'),
Document(metadata={'image_path': 'https://upload.wikimedia.org/wikipedia/commons/thumb/0/0c/1928_Model_A_Ford.jpg/640px-1928_Model_A_Ford.jpg'}, page_content='an image of a vintage car parked on the street [SEP]')]
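
The captions in the output above end with a literal `[SEP]` marker, which looks like a separator token left over from decoding. If that is unwanted downstream, it can be stripped after loading. The sketch below is one possible approach, not part of the loader itself: `clean_caption` and `captions_by_filename` are hypothetical helper names, and `SimpleNamespace` objects stand in for the two `Document` results so the example runs without downloading the model.

```python
from pathlib import PurePosixPath
from types import SimpleNamespace
from urllib.parse import urlparse

SEP = "[SEP]"


def clean_caption(text: str) -> str:
    """Remove a trailing "[SEP]" marker and surrounding whitespace."""
    text = text.strip()
    if text.endswith(SEP):
        text = text[: -len(SEP)].rstrip()
    return text


def captions_by_filename(docs) -> dict:
    """Map each image's file name (last URL path segment) to its cleaned caption."""
    result = {}
    for doc in docs:
        name = PurePosixPath(urlparse(doc.metadata["image_path"]).path).name
        result[name] = clean_caption(doc.page_content)
    return result


# Stand-ins for the two documents shown above (same metadata and text).
sample_docs = [
    SimpleNamespace(
        metadata={
            "image_path": "https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Ara_ararauna_Luc_Viatour.jpg/1554px-Ara_ararauna_Luc_Viatour.jpg"
        },
        page_content="an image of a bird flying in the air [SEP]",
    ),
    SimpleNamespace(
        metadata={
            "image_path": "https://upload.wikimedia.org/wikipedia/commons/thumb/0/0c/1928_Model_A_Ford.jpg/640px-1928_Model_A_Ford.jpg"
        },
        page_content="an image of a vintage car parked on the street [SEP]",
    ),
]

captions = captions_by_filename(sample_docs)
```

With real results, `captions_by_filename(list_docs)` should work the same way, since it only reads `metadata["image_path"]` and `page_content`.
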
import requests
from PIL import Image

Image.open(requests.get(list_image_urls[0], stream=True).raw).convert("RGB")