Multimodal Indexing

Ducky's multimodal indexing allows you to combine image content with text to create rich, searchable documents. This feature automatically extracts visual information from images and makes both visual and textual content discoverable through natural language queries.

What is Multimodal Indexing

Multimodal indexing processes images and text together, creating a unified representation that you can search with natural language. Typical use cases include:

Visual content search: Find images based on their visual content
Combined content analysis: Search across both text descriptions and image content
Document processing: Index documents that contain both text and images

Supported Image Formats

Image formats: JPEG, PNG, GIF, WebP
Input methods: URL links or Base64-encoded data
URL timeout: 10 seconds for fetching images from URLs
Resolution: No specific limits (processed at original resolution)
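Because only the four formats above are accepted, it can help to check a file before sending it. The sketch below identifies the format from the file's leading bytes (the standard signatures for JPEG, PNG, GIF, and WebP) rather than trusting the extension. The helper name `sniff_image_mime_type` is hypothetical and is not part of the Ducky SDK.

```python
from typing import Optional


def sniff_image_mime_type(data: bytes) -> Optional[str]:
    """Return the MIME type for a supported image format, or None.

    Only the leading bytes are inspected, so passing the first 12 bytes
    of a file is enough.
    """
    if data.startswith(b"\xff\xd8\xff"):  # JPEG start-of-image marker
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):  # PNG 8-byte signature
        return "image/png"
    if data.startswith((b"GIF87a", b"GIF89a")):  # both GIF versions
        return "image/gif"
    # WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None
```

A return value of `None` means the file is not in one of the listed formats and should be converted before indexing.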

Indexing Images with Text

from duckyai import DuckyAI
import base64

ducky = DuckyAI(api_key="your-api-key")

# Index with image URL and text content
result = ducky.documents.index_multimodal(
    index_name="product-catalog",
    doc_id="smartphone-001",
    image={
        "url": "https://example.com/images/smartphone.jpg",
        "mime_type": "image/jpeg"
    },
    content="""
        Latest smartphone model with advanced camera system.
        Features include 108MP main camera, 5G connectivity,
        and 6.7-inch OLED display with 120Hz refresh rate.
    """,
    title="Premium Smartphone - Model X1",
    metadata={
        "category": "electronics",
        "brand": "TechCorp",
        "price": 899,
        "in_stock": True
    }
)

print(f"Indexed document: {result.doc_id}")

import { Ducky } from "duckyai-ts";

const ducky = new Ducky({
  apiKey: process.env.DUCKY_API_KEY ?? "",
});

// Index with image URL and text content
const result = await ducky.documents.indexMultimodal({
  indexName: "product-catalog",
  docId: "smartphone-001",
  image: {
    url: "https://example.com/images/smartphone.jpg",
    mimeType: "image/jpeg"
  },
  content: `
    Latest smartphone model with advanced camera system.
    Features include 108MP main camera, 5G connectivity,
    and 6.7-inch OLED display with 120Hz refresh rate.
  `,
  title: "Premium Smartphone - Model X1",
  metadata: {
    category: "electronics",
    brand: "TechCorp",
    price: 899,
    in_stock: true
  }
});

console.log(`Indexed document: ${result.docId}`);

Using Base64-Encoded Images

For local images or when you have image data directly available:

# Read and encode local image
with open("product-image.jpg", "rb") as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode('utf-8')

result = ducky.documents.index_multimodal(
    index_name="product-catalog",
    doc_id="laptop-002",
    image={
        "base64": f"data:image/jpeg;base64,{encoded_image}",
        "mime_type": "image/jpeg"
    },
    content="""
        High-performance laptop designed for professionals.
        Intel i7 processor, 32GB RAM, 1TB SSD storage.
        Perfect for development, design, and data analysis.
    """,
    title="Professional Laptop - Pro Series",
    metadata={
        "category": "computers",
        "brand": "TechCorp",
        "price": 1599
    }
)

import { readFileSync } from "fs";

// Read and encode local image
const imageBuffer = readFileSync("product-image.jpg");
const encodedImage = imageBuffer.toString('base64');

const result = await ducky.documents.indexMultimodal({
  indexName: "product-catalog",
  docId: "laptop-002",
  image: {
    base64: `data:image/jpeg;base64,${encodedImage}`,
    mimeType: "image/jpeg"
  },
  content: `
    High-performance laptop designed for professionals.
    Intel i7 processor, 32GB RAM, 1TB SSD storage.
    Perfect for development, design, and data analysis.
  `,
  title: "Professional Laptop - Pro Series",
  metadata: {
    category: "computers",
    brand: "TechCorp",
    price: 1599
  }
});
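The read-and-encode steps above can be wrapped in a small helper so that the data URI prefix and the MIME type field always agree. This is a minimal sketch: `build_base64_image` is a hypothetical name, the payload shape simply mirrors the examples above, and the MIME type is guessed from the file extension with Python's standard `mimetypes` module unless you pass it explicitly.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Dict, Optional


def build_base64_image(path: str, mime_type: Optional[str] = None) -> Dict[str, str]:
    """Build an image payload shaped like the Base64 examples above."""
    if mime_type is None:
        # Guess from the file extension; pass mime_type explicitly if this fails.
        mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"Cannot determine MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {
        "base64": f"data:{mime_type};base64,{encoded}",
        "mime_type": mime_type,
    }
```

The result can then be passed as the `image` argument, for example `image=build_base64_image("product-image.jpg")`.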

Image-Only Indexing

You can also index images without accompanying text content:

# Index image without text content
result = ducky.documents.index_multimodal(
    index_name="image-gallery",
    doc_id="sunset-photo-001",
    image={
        "url": "https://example.com/photos/sunset.jpg",
        "mime_type": "image/jpeg"
    },
    title="Beautiful Sunset at the Beach",
    metadata={
        "photographer": "John Doe",
        "location": "Malibu Beach",
        "date": "2024-01-15",
        "tags": ["sunset", "beach", "nature"]
    }
)

// Index image without text content
const result = await ducky.documents.indexMultimodal({
  indexName: "image-gallery",
  docId: "sunset-photo-001",
  image: {
    url: "https://example.com/photos/sunset.jpg",
    mimeType: "image/jpeg"
  },
  title: "Beautiful Sunset at the Beach",
  metadata: {
    photographer: "John Doe",
    location: "Malibu Beach",
    date: "2024-01-15",
    tags: ["sunset", "beach", "nature"]
  }
});

Searching Multimodal Content

Once indexed, you can search your multimodal documents using natural language queries that will match both visual and textual content:

# Search for products based on visual and text content
results = ducky.documents.retrieve(
    index_name="product-catalog",
    query="black smartphone with multiple cameras",
    top_k=5,
    rerank=True
)

for doc in results.documents:
    print(f"Product: {doc.title}")
    print(f"Content: {doc.content_chunks[0][:100]}...")
    print(f"Metadata: {doc.metadata}")
    print("---")

// Search for products based on visual and text content
const results = await ducky.documents.retrieve({
  indexName: "product-catalog",
  query: "black smartphone with multiple cameras",
  topK: 5,
  rerank: true
});

results.documents.forEach(doc => {
  console.log(`Product: ${doc.title}`);
  console.log(`Content: ${doc.contentChunks[0].substring(0, 100)}...`);
  console.log(`Metadata:`, doc.metadata);
  console.log("---");
});

Processing Time

Synchronous Response, Asynchronous Processing

Multimodal indexing returns immediately with a document ID, but the visual processing happens asynchronously:

Immediate: Document is created and text content is indexed
Background: Image analysis and visual feature extraction occur
Search availability: Full multimodal search capabilities become available once processing completes (typically 1-30 seconds depending on image size and complexity)

Checking Processing Status

Documents are searchable with text-based queries immediately. Queries that depend on visual content start matching only after background processing completes.
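This page does not describe a dedicated status call, so one way to wait for visual search to become available is to poll with a check of your own. The sketch below is a generic polling loop; `wait_until` and its parameters are hypothetical and not part of the Ducky SDK, and the check itself is supplied by the caller.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Call check() every `interval` seconds until it returns True.

    Returns True as soon as check() succeeds, or False once another wait
    would run past `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)
```

For example, `check` could run `ducky.documents.retrieve` with a query that describes the image and report whether the expected document is among the results. How you identify that document depends on the fields your SDK version returns, and a matching result is only an indirect signal that processing has finished.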

Best Practices

Image Quality and Size

# Use a high-quality source image; record its dimensions in metadata if useful
result = ducky.documents.index_multimodal(
    index_name="product-catalog",
    image={
        "url": "https://example.com/high-quality-image.jpg",
        "mime_type": "image/jpeg"  # Always specify MIME type when known
    },
    content="Detailed product description...",
    metadata={
        "image_width": 1920,
        "image_height": 1080,
        "quality": "high"
    }
)

Content Organization

# Use descriptive content and metadata
result = ducky.documents.index_multimodal(
    index_name="recipe-collection",
    image={
        "url": "https://example.com/recipes/pasta-dish.jpg"
    },
    content="""
        Creamy mushroom pasta with fresh herbs and parmesan cheese.
        Cooking time: 25 minutes. Serves 4 people.
        Ingredients include penne pasta, mushrooms, cream, and fresh basil.
    """,
    title="Creamy Mushroom Pasta Recipe",
    metadata={
        "cuisine": "italian",
        "difficulty": "easy",
        "cook_time": 25,
        "servings": 4,
        "dietary": ["vegetarian"]
    }
)

Batch Processing

# Process multiple multimodal documents
multimodal_docs = [
    {
        "index_name": "product-catalog",
        "doc_id": "item-001",
        "image": {"url": "https://example.com/item1.jpg"},
        "content": "Product description 1...",
        "title": "Product 1"
    },
    {
        "index_name": "product-catalog",
        "doc_id": "item-002",
        "image": {"url": "https://example.com/item2.jpg"},
        "content": "Product description 2...",
        "title": "Product 2"
    }
]

for doc in multimodal_docs:
    result = ducky.documents.index_multimodal(**doc)
    print(f"Indexed: {result.doc_id}")
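The loop above stops at the first failure and indexes one document at a time. A possible variant, sketched below, runs a bounded number of requests in parallel and collects failures instead of aborting. It assumes the client can be called safely from multiple threads, which this page does not confirm, and `index_batch` is a hypothetical helper rather than an SDK function. Keep `max_workers` small, since concurrent multimodal requests are limited per API key.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, List, Tuple


def index_batch(
    index_fn: Callable[..., Any],
    docs: List[Dict[str, Any]],
    max_workers: int = 4,
) -> Tuple[List[Any], List[Tuple[Any, Exception]]]:
    """Index docs with at most `max_workers` requests in flight.

    Returns (succeeded_doc_ids, [(failed_doc_id, exception), ...]).
    Results arrive in completion order, not input order.
    """
    succeeded: List[Any] = []
    failed: List[Tuple[Any, Exception]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the doc_id it was submitted for.
        futures = {pool.submit(index_fn, **doc): doc.get("doc_id") for doc in docs}
        for future in as_completed(futures):
            doc_id = futures[future]
            try:
                future.result()
            except Exception as exc:
                failed.append((doc_id, exc))
            else:
                succeeded.append(doc_id)
    return succeeded, failed
```

With the list defined above, a call would look like `ok, errors = index_batch(ducky.documents.index_multimodal, multimodal_docs)`.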

Common Use Cases

E-commerce Product Catalogs

# Index product with image and specifications
ducky.documents.index_multimodal(
    index_name="products",
    image={"url": "https://store.com/products/chair.jpg"},
    content="Ergonomic office chair with lumbar support and adjustable height",
    metadata={
        "category": "furniture",
        "price": 299,
        "color": "black",
        "material": "mesh"
    }
)

Content Management

// Index articles with hero images
await ducky.documents.indexMultimodal({
  indexName: "blog-posts",
  image: { url: "https://blog.com/images/hero.jpg" },
  content: "Complete guide to modern web development practices...",
  metadata: {
    author: "Jane Smith",
    publishDate: "2024-01-15",
    category: "technology"
  }
});

Educational Content

# Index educational materials with diagrams
ducky.documents.index_multimodal(
    index_name="learning-materials",
    image={"url": "https://edu.com/diagrams/solar-system.png"},
    content="The solar system consists of the Sun and celestial bodies...",
    metadata={
        "subject": "astronomy",
        "grade_level": "middle-school",
        "topic": "solar-system"
    }
)

Error Handling

from duckyai.models import APIError

try:
    result = ducky.documents.index_multimodal(
        index_name="my-index",
        image={"url": "https://example.com/image.jpg"},
        content="Document content..."
    )
except APIError as e:
    if e.status_code == 422:
        print("Invalid image format or unsupported image type")
    elif e.status_code == 400:
        print("Bad request - check your parameters")
    else:
        print(f"API Error: {e.status_code} - {e.message}")

import * as errors from "duckyai-ts/models/errors";

try {
  const result = await ducky.documents.indexMultimodal({
    indexName: "my-index",
    image: { url: "https://example.com/image.jpg" },
    content: "Document content..."
  });
} catch (error) {
  if (error instanceof errors.DuckyError) {
    if (error.statusCode === 422) {
      console.log("Invalid image format or unsupported image type");
    } else if (error.statusCode === 400) {
      console.log("Bad request - check your parameters");
    } else {
      console.log(`API Error: ${error.statusCode} - ${error.message}`);
    }
  } else {
    // Not an API error (for example, a network failure): rethrow instead of swallowing it
    throw error;
  }
}

Limitations

Image formats: Only JPEG, PNG, GIF, and WebP are supported
URL timeout: Image URLs must respond within 10 seconds
Processing time: Visual analysis takes 1-30 seconds depending on image size and complexity
Concurrent processing: Limited concurrent multimodal indexing requests per API key
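Given the concurrency limit, a client may want to retry a rejected request after a pause. The sketch below shows generic exponential backoff. This page does not say which error signals that the limit was hit, so the caller supplies an `is_retryable` predicate; `call_with_backoff` and all of its parameters are hypothetical, not part of the Ducky SDK.

```python
import time
from typing import Any, Callable


def call_with_backoff(
    fn: Callable[[], Any],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fn(), retrying retryable failures with doubling delays.

    Delays are base_delay, 2x, 4x, ... capped at max_delay. The last
    failure, or any non-retryable one, is re-raised unchanged.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            is_last_attempt = attempt == max_attempts - 1
            if is_last_attempt or not is_retryable(exc):
                raise
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

To use it, wrap the indexing call in a zero-argument function (for example a `lambda`) and pass a predicate that inspects the exception your SDK version raises when a request is rejected for concurrency reasons.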

Multimodal indexing opens up powerful new possibilities for content discovery, allowing your users to find information using natural language that spans both visual and textual content.
