Document Upserts

Ducky's document indexing system supports upsert operations, so you can create new documents and update existing ones through the same API endpoint. When you provide a doc_id that already exists in the target index, the system updates that document instead of creating a duplicate.

How Document Upserts Work

The upsert behavior is built into the /v1/documents/index-text endpoint. Here's how it determines whether to create or update:

Document lookup: The system searches for an existing document using the combination of doc_id, index_name, and project_id
Create or update: If no document exists, a new one is created. If one exists, it's updated
Async processing: Content processing happens asynchronously regardless of create/update
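
The create-or-update decision described above can be modelled with a small in-memory sketch. This only illustrates the lookup logic and is not Ducky's actual implementation; the `InMemoryIndex` class and its `upsert` method are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class InMemoryIndex:
    """Toy stand-in for a document store (hypothetical, for illustration only)."""

    # Documents are keyed by the combination (project_id, index_name, doc_id).
    docs: dict[tuple[str, str, str], dict[str, Any]] = field(default_factory=dict)

    def upsert(self, project_id: str, index_name: str, doc_id: str, **fields: Any) -> str:
        key = (project_id, index_name, doc_id)
        action = "updated" if key in self.docs else "created"
        # In both cases the stored record is replaced wholesale with the new fields.
        self.docs[key] = dict(fields)
        return action


store = InMemoryIndex()
first = store.upsert("proj-1", "knowledge-base", "user-guide-v1", title="User Guide v1.0")
second = store.upsert("proj-1", "knowledge-base", "user-guide-v1", title="User Guide v1.1")
# The same doc_id in a different index is a different document.
other = store.upsert("proj-1", "blog-posts", "user-guide-v1", title="Same ID, other index")
print(first, second, other)
```

Because the key includes the index name and project, reusing a doc_id in another index creates a separate document rather than updating the first one.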

Creating vs Updating Documents

First Time Indexing (Create)

from duckyai import DuckyAI

ducky = DuckyAI(api_key="<DUCKYAI_API_KEY>")

# First time - creates new document
result = ducky.documents.index(
    index_name="knowledge-base",
    doc_id="user-guide-v1",
    content="Welcome to our platform! This guide will help you get started...",
    title="User Guide v1.0",
    metadata={"version": "1.0", "category": "documentation"}
)

import { Ducky } from "duckyai-ts";

const ducky = new Ducky({
  apiKey: process.env["DUCKY_API_KEY"] ?? "",
});

async function run() {
  // First time - creates new document
  const result = await ducky.documents.index({
    indexName: "knowledge-base",
    docId: "user-guide-v1",
    content: "Welcome to our platform! This guide will help you get started...",
    title: "User Guide v1.0",
    metadata: { version: "1.0", category: "documentation" }
  });
  
  console.log(result);
}

run();

Updating Existing Document (Upsert)

# Later - updates existing document with same doc_id
result = ducky.documents.index(
    index_name="knowledge-base",
    doc_id="user-guide-v1",  # Same doc_id
    content="Welcome to our platform! This updated guide includes new features...",
    title="User Guide v1.1",  # Updated title
    metadata={"version": "1.1", "category": "documentation", "updated": "2024-01-15"}
)

async function updateDocument() {
  // Later - updates existing document with same doc_id
  const result = await ducky.documents.index({
    indexName: "knowledge-base",
    docId: "user-guide-v1",  // Same doc_id
    content: "Welcome to our platform! This updated guide includes new features...",
    title: "User Guide v1.1",  // Updated title
    metadata: { version: "1.1", category: "documentation", updated: "2024-01-15" }
  });
  
  console.log(result);
}

updateDocument();

What Gets Updated

When updating an existing document, the system performs a complete replacement of:

Content: The entire document content is replaced
Title: Completely replaced with the new title
Metadata: Completely replaced (not merged) with the new metadata
URL: Completely replaced with the new source URL

Important: Complete Replacement, Not Merging

# Original document
ducky.documents.index(
    index_name="products",
    doc_id="product-123",
    title="Product Name",
    metadata={"price": 99.99, "category": "electronics", "tags": ["popular"]}
)

# Update - this REPLACES all metadata, doesn't merge
ducky.documents.index(
    index_name="products",
    doc_id="product-123",
    title="Updated Product Name",
    metadata={"price": 149.99, "category": "electronics"}
    # Note: "tags" field is now gone, not merged
)

// Original document
await ducky.documents.index({
  indexName: "products",
  docId: "product-123",
  title: "Product Name",
  metadata: { price: 99.99, category: "electronics", tags: ["popular"] }
});

// Update - this REPLACES all metadata, doesn't merge
await ducky.documents.index({
  indexName: "products",
  docId: "product-123",
  title: "Updated Product Name",
  metadata: { price: 149.99, category: "electronics" }
  // Note: "tags" field is now gone, not merged
});

Async Processing

Document updates are processed asynchronously:

Immediate response: The API returns immediately with the doc_id
Background processing: Content is chunked and indexed in the background

# The response is immediate, but processing happens in background
result = ducky.documents.index(
    index_name="my-index",
    doc_id="doc-123",
    content="Updated content..."
)

print(f"Document {result.doc_id} queued for processing")
# Processing happens asynchronously - document will be searchable once fully indexed

// The response is immediate, but processing happens in background
const result = await ducky.documents.index({
  indexName: "my-index",
  docId: "doc-123",
  content: "Updated content..."
});

console.log(`Document ${result.docId} queued for processing`);
// Processing happens asynchronously - document will be searchable once fully indexed
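
Because indexing finishes in the background, code that needs the new content to be searchable may have to wait. This page does not describe how to check indexing status, so the sketch below is a generic polling helper only: `wait_until` and the `is_ready` callable are hypothetical, and what `is_ready` actually checks (for example a retrieval or a search) is left to the caller.

```python
import time
from typing import Callable


def wait_until(is_ready: Callable[[], bool], timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll is_ready() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in check that reports "ready" on the third poll.
attempts = {"count": 0}


def fake_check() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


ready = wait_until(fake_check, timeout=5.0, interval=0.01)
print(ready, attempts["count"])
```

The helper returns False on timeout instead of raising, so the caller can decide whether a slow index is an error.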

Common Use Cases

1. Content Updates

# Update blog post content
ducky.documents.index(
    index_name="blog-posts",
    doc_id="post-how-to-use-api",
    content="Updated blog post content with new examples...",
    title="How to Use Our API - Updated",
    metadata={"last_updated": "2024-01-15", "author": "John Doe"}
)

// Update blog post content
await ducky.documents.index({
  indexName: "blog-posts",
  docId: "post-how-to-use-api",
  content: "Updated blog post content with new examples...",
  title: "How to Use Our API - Updated",
  metadata: { last_updated: "2024-01-15", author: "John Doe" }
});

2. Metadata Updates

# Update product information
ducky.documents.index(
    index_name="products",
    doc_id="product-abc-123",
    content="Product description remains the same...",
    metadata={
        "price": 199.99,  # Updated price
        "in_stock": True,  # Updated availability
        "category": "electronics"
    }
)

// Update product information
await ducky.documents.index({
  indexName: "products",
  docId: "product-abc-123",
  content: "Product description remains the same...",
  metadata: {
    price: 199.99,  // Updated price
    in_stock: true,  // Updated availability
    category: "electronics"
  }
});

3. File Updates

# Update document by uploading new file version
with open("updated-manual.pdf", "rb") as file:
    result = ducky.documents.index_file(
        index_name="manuals",
        doc_id="user-manual-v2",  # Same doc_id updates existing
        file={
            "file_name": "updated-manual.pdf",
            "content": file
        },
        title="User Manual v2.1",
        metadata={"version": "2.1", "updated": "2024-01-15"}
    )

import { openAsBlob } from "node:fs";

// Update document by uploading new file version
const result = await ducky.documents.indexFile({
  indexName: "manuals",
  docId: "user-manual-v2",  // Same doc_id updates existing
  file: await openAsBlob("updated-manual.pdf"),
  title: "User Manual v2.1",
  metadata: { version: "2.1", updated: "2024-01-15" }
});

Best Practices

1. Use Meaningful Document IDs

# Good - descriptive and unique
doc_id = "user-guide-getting-started"
doc_id = "product-SKU-ABC123"
doc_id = "policy-privacy-v2"

# Avoid - generic or unclear
doc_id = "doc1"
doc_id = "file"
doc_id = "content"

// Good - descriptive and unique
const goodDocIds = [
  "user-guide-getting-started",
  "product-SKU-ABC123",
  "policy-privacy-v2",
];

// Avoid - generic or unclear
const unclearDocIds = ["doc1", "file", "content"];
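
Upserts only avoid duplicates when the same source always maps to the same doc_id, so it can help to derive IDs deterministically. The following is a minimal sketch of one way to do that; `make_doc_id` is a hypothetical helper, and lowercasing every part is a design choice of this example, not a Ducky requirement.

```python
import re


def make_doc_id(*parts: str) -> str:
    """Join parts into a lowercase, hyphen-separated identifier."""
    slugs = []
    for part in parts:
        # Collapse every run of non-alphanumeric characters into one hyphen.
        slug = re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-")
        if slug:
            slugs.append(slug)
    if not slugs:
        raise ValueError("at least one non-empty part is required")
    return "-".join(slugs)


print(make_doc_id("User Guide", "Getting Started"))  # user-guide-getting-started
print(make_doc_id("product", "SKU ABC123"))          # product-sku-abc123
```

Indexing the same source twice with an ID built this way updates the existing document instead of creating a second one.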

2. Handle Metadata Carefully

Since metadata is completely replaced, preserve existing fields you want to keep:

# If you need to preserve existing metadata, retrieve it first
existing_doc = ducky.documents.get(
    index_name="my-index",
    doc_id="doc-123"
)

# Merge with new metadata; fall back to an empty dict if none is stored
updated_metadata = dict(existing_doc.metadata or {})
updated_metadata.update({"new_field": "new_value"})

# Update with merged metadata
ducky.documents.index(
    index_name="my-index",
    doc_id="doc-123",
    content="Updated content",
    metadata=updated_metadata
)

// If you need to preserve existing metadata, retrieve it first
const existingDoc = await ducky.documents.get({
  indexName: "my-index",
  docId: "doc-123"
});

// Merge with new metadata
const updatedMetadata = { ...existingDoc.metadata, new_field: "new_value" };

// Update with merged metadata
await ducky.documents.index({
  indexName: "my-index",
  docId: "doc-123",
  content: "Updated content",
  metadata: updatedMetadata
});
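
The merge step can be pulled into a small pure function so it is easy to test separately from the API calls. This is a sketch under the assumption that metadata is a flat dictionary; `merge_metadata` is a hypothetical helper and not part of either SDK.

```python
from typing import Any, Iterable, Mapping, Optional


def merge_metadata(
    existing: Optional[Mapping[str, Any]],
    changes: Mapping[str, Any],
    remove: Iterable[str] = (),
) -> dict[str, Any]:
    """Return a new dict: existing fields, overridden by changes, minus removed keys."""
    merged = dict(existing or {})  # shallow copy; nested values are shared
    merged.update(changes)
    for key in remove:
        merged.pop(key, None)
    return merged


existing = {"price": 99.99, "category": "electronics", "tags": ["popular"]}
merged = merge_metadata(existing, {"price": 149.99}, remove=["category"])
print(merged)  # {'price': 149.99, 'tags': ['popular']}
```

The result is what you would pass as the metadata argument of the next index call. The input dictionary is left unchanged.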

3. Batch Updates

For multiple document updates, use batch operations:

# Update multiple documents efficiently
updates = [
    {
        "index_name": "products",
        "doc_id": "product-1",
        "content": "Updated product 1 description",
        "metadata": {"price": 99.99}
    },
    {
        "index_name": "products", 
        "doc_id": "product-2",
        "content": "Updated product 2 description",
        "metadata": {"price": 149.99}
    }
]

ducky.documents.batch_index(documents=updates)

// Update multiple documents efficiently
const updates = [
  {
    indexName: "products",
    docId: "product-1",
    content: "Updated product 1 description",
    metadata: { price: 99.99 }
  },
  {
    indexName: "products",
    docId: "product-2",
    content: "Updated product 2 description",
    metadata: { price: 149.99 }
  }
];

await ducky.documents.batchIndex({ documents: updates });
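
When the list of updates is large, it may be practical to send it in several smaller batches. This page does not state a maximum batch size, so the size used below is arbitrary; `chunked` is a hypothetical helper that only splits the list and makes no API calls.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive slices of items, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


updates = [
    {"index_name": "products", "doc_id": f"product-{n}", "content": f"Description {n}"}
    for n in range(1, 6)
]
batches = list(chunked(updates, size=2))
print([len(batch) for batch in batches])  # [2, 2, 1]
```

Each batch could then be passed as the documents argument of the batch call shown above.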

Summary

Document upserts in Ducky let you create and update indexed content through a single call:

Automatic behavior: The same endpoint handles both create and update operations
Complete replacement: Updates replace all fields rather than merging them
Async processing: Updates are processed in the background
Consistent API: Works the same way in the Python and TypeScript SDKs

Use document upserts to keep your indexed content fresh and up-to-date without worrying about duplicate documents or complex update logic.