Skip to main content

OpenClip

OpenClip 是 OpenAI 的 CLIP 的一个源实现。

这些多模态嵌入可以用于嵌入图像或文本。

%pip install --upgrade --quiet  langchain-experimental
%pip install --upgrade --quiet  pillow open_clip_torch torch matplotlib

我们可以查看可用的 CLIP 嵌入模型和检查点列表:

import open_clip

open_clip.list_pretrained()

下面,我测试了一个更大但性能更强的模型,基于表格(这里):

model_name = "ViT-g-14"
checkpoint = "laion2b_s34b_b88k"

但是,您也可以选择一个更小、性能较低的模型:

model_name = "ViT-B-32"
checkpoint = "laion2b_s34b_b79k"

模型 model_namecheckpointlangchain_experimental.open_clip.py 中设置。

对于文本,使用与其他嵌入模型相同的方法 embed_documents

对于图像,使用 embed_image 并简单地传递图像的 URI 列表。

<!--IMPORTS:[{"imported": "OpenCLIPEmbeddings", "source": "langchain_experimental.open_clip", "docs": "https://python.langchain.com/api_reference/experimental/open_clip/langchain_experimental.open_clip.open_clip.OpenCLIPEmbeddings.html", "title": "OpenClip"}]-->
import numpy as np
from langchain_experimental.open_clip import OpenCLIPEmbeddings
from PIL import Image

# Image URIs
uri_dog = "/Users/rlm/Desktop/test/dog.jpg"
uri_house = "/Users/rlm/Desktop/test/house.jpg"

# Embe images or text
clip_embd = OpenCLIPEmbeddings(model_name="ViT-g-14", checkpoint="laion2b_s34b_b88k")
img_feat_dog = clip_embd.embed_image([uri_dog])
img_feat_house = clip_embd.embed_image([uri_house])
text_feat_dog = clip_embd.embed_documents(["dog"])
text_feat_house = clip_embd.embed_documents(["house"])

完整性检查

让我们重现 OpenClip Colab 中显示的结果 这里

import os
from collections import OrderedDict

import IPython.display
import matplotlib.pyplot as plt
import skimage

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

descriptions = {
"page": "a page of text about segmentation",
"chelsea": "a facial photo of a tabby cat",
"astronaut": "a portrait of an astronaut with the American flag",
"rocket": "a rocket standing on a launchpad",
"motorcycle_right": "a red motorcycle standing in a garage",
"camera": "a person looking at a camera on a tripod",
"horse": "a black-and-white silhouette of a horse",
"coffee": "a cup of coffee on a saucer",
}

original_images = []
images = []
image_uris = [] # List to store image URIs
texts = []
plt.figure(figsize=(16, 5))

# Loop to display and prepare images and assemble URIs
for filename in [
filename
for filename in os.listdir(skimage.data_dir)
if filename.endswith(".png") or filename.endswith(".jpg")
]:
name = os.path.splitext(filename)[0]
if name not in descriptions:
continue

image_path = os.path.join(skimage.data_dir, filename)
image = Image.open(image_path).convert("RGB")

plt.subplot(2, 4, len(images) + 1)
plt.imshow(image)
plt.title(f"{filename}\n{descriptions[name]}")
plt.xticks([])
plt.yticks([])

original_images.append(image)
images.append(image) # Origional code does preprocessing here
texts.append(descriptions[name])
image_uris.append(image_path) # Add the image URI to the list

plt.tight_layout()