Document Upserts
How to update existing documents with upsert operations
Ducky's document indexing system supports upsert operations, allowing you to both create new documents and update existing ones using the same API endpoint. When you provide a doc_id that already exists, the system will automatically update the document instead of creating a duplicate.
How Document Upserts Work
The upsert behavior is built into the /v1/documents/index-text endpoint. Here's how it determines whether to create or update:
- Document lookup: The system searches for an existing document using the combination of doc_id, index_name, and project_id.
- Create or update: If no document exists, a new one is created. If one exists, it is updated.
- Async processing: Content is processed asynchronously whether the document is created or updated.
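The lookup-then-write steps above can be pictured with a small in-memory model. This is only a sketch, not Ducky's implementation: the UpsertStore class and its dictionary storage are hypothetical stand-ins, and only the three-part key (doc_id, index_name, project_id) is taken from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (project_id, index_name, doc_id)


@dataclass
class UpsertStore:
    """Hypothetical in-memory model of upsert semantics (not the Ducky backend)."""

    docs: Dict[Key, Dict[str, Any]] = field(default_factory=dict)

    def index(
        self,
        project_id: str,
        index_name: str,
        doc_id: str,
        content: str,
        metadata: Optional[Dict[str, Any]] = None,
    ) -> str:
        key = (project_id, index_name, doc_id)
        # Step 1: look up by the full three-part key.
        action = "updated" if key in self.docs else "created"
        # Step 2: write the record; an existing entry is overwritten.
        self.docs[key] = {"content": content, "metadata": dict(metadata or {})}
        return action


store = UpsertStore()
first = store.index("proj-1", "knowledge-base", "user-guide-v1", "v1 text", {"version": "1.0"})
second = store.index("proj-1", "knowledge-base", "user-guide-v1", "v1.1 text", {"version": "1.1"})
other = store.index("proj-1", "blog-posts", "user-guide-v1", "different index")
print(first, second, other)  # created updated created
```

Note that the same doc_id in a different index counts as a separate document, because the index name is part of the key.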
Creating vs Updating Documents
First Time Indexing (Create)
from duckyai import DuckyAI
ducky = DuckyAI(api_key="<DUCKYAI_API_KEY>")
# First time - creates new document
result = ducky.documents.index(
index_name="knowledge-base",
doc_id="user-guide-v1",
content="Welcome to our platform! This guide will help you get started...",
title="User Guide v1.0",
metadata={"version": "1.0", "category": "documentation"}
)
import { Ducky } from "duckyai-ts";
const ducky = new Ducky({
apiKey: process.env["DUCKY_API_KEY"] ?? "",
});
async function run() {
// First time - creates new document
const result = await ducky.documents.index({
indexName: "knowledge-base",
docId: "user-guide-v1",
content: "Welcome to our platform! This guide will help you get started...",
title: "User Guide v1.0",
metadata: { version: "1.0", category: "documentation" }
});
console.log(result);
}
Updating Existing Document (Upsert)
# Later - updates existing document with same doc_id
result = ducky.documents.index(
index_name="knowledge-base",
doc_id="user-guide-v1", # Same doc_id
content="Welcome to our platform! This updated guide includes new features...",
title="User Guide v1.1", # Updated title
metadata={"version": "1.1", "category": "documentation", "updated": "2024-01-15"}
)
async function updateDocument() {
// Later - updates existing document with same doc_id
const result = await ducky.documents.index({
indexName: "knowledge-base",
docId: "user-guide-v1", // Same doc_id
content: "Welcome to our platform! This updated guide includes new features...",
title: "User Guide v1.1", // Updated title
metadata: { version: "1.1", category: "documentation", updated: "2024-01-15" }
});
console.log(result);
}
What Gets Updated
When updating an existing document, the system performs a complete replacement of:
- Content: The entire document content is replaced
- Title: Completely replaced with the new title
- Metadata: Completely replaced (not merged) with the new metadata
- URL: Completely replaced with the new source URL
Important: Complete Replacement, Not Merging
# Original document
ducky.documents.index(
index_name="products",
doc_id="product-123",
title="Product Name",
metadata={"price": 99.99, "category": "electronics", "tags": ["popular"]}
)
# Update - this REPLACES all metadata, doesn't merge
ducky.documents.index(
index_name="products",
doc_id="product-123",
title="Updated Product Name",
metadata={"price": 149.99, "category": "electronics"}
# Note: "tags" field is now gone, not merged
)
// Original document
await ducky.documents.index({
indexName: "products",
docId: "product-123",
title: "Product Name",
metadata: { price: 99.99, category: "electronics", tags: ["popular"] }
});
// Update - this REPLACES all metadata, doesn't merge
await ducky.documents.index({
indexName: "products",
docId: "product-123",
title: "Updated Product Name",
metadata: { price: 149.99, category: "electronics" }
// Note: "tags" field is now gone, not merged
});
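Because replacement silently drops fields, it can help to check which metadata keys an update would remove before sending it. The helper below is a hypothetical, SDK-independent sketch that works on plain dictionaries; dropped_keys is not part of any Ducky SDK.

```python
from typing import Any, Dict, Set


def dropped_keys(old: Dict[str, Any], new: Dict[str, Any]) -> Set[str]:
    """Return the keys in `old` that a full replacement with `new` would remove."""
    return set(old) - set(new)


old_metadata = {"price": 99.99, "category": "electronics", "tags": ["popular"]}
new_metadata = {"price": 149.99, "category": "electronics"}

print(dropped_keys(old_metadata, new_metadata))  # {'tags'}
```

An empty result means the update keeps every existing key, although values may still change.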
Async Processing
Document updates are processed asynchronously:
- Immediate response: The API returns immediately with the doc_id
- Background processing: Content is chunked and indexed in the background
# The response is immediate, but processing happens in background
result = ducky.documents.index(
index_name="my-index",
doc_id="doc-123",
content="Updated content..."
)
print(f"Document {result.doc_id} queued for processing")
# Processing happens asynchronously - document will be searchable once fully indexed
// The response is immediate, but processing happens in background
const result = await ducky.documents.index({
indexName: "my-index",
docId: "doc-123",
content: "Updated content..."
});
console.log(`Document ${result.docId} queued for processing`);
// Processing happens asynchronously - document will be searchable once fully indexed
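If your code needs to wait until an update is searchable, one option is to poll with backoff. This page does not describe a status endpoint, so the sketch below takes a caller-supplied check function (for example, one that runs a search and looks for the new content). Both wait_until and that check are hypothetical helpers, not SDK calls.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout: float = 30.0,
    initial_delay: float = 0.5,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` with exponential backoff until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        if check():
            return True
        # Give up if the next wait would run past the deadline.
        if time.monotonic() + delay > deadline:
            return False
        sleep(delay)
        delay = min(delay * 2, max_delay)


# Demo with a stand-in check that succeeds on the third attempt.
attempts = {"count": 0}


def fake_check() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


ready = wait_until(fake_check, sleep=lambda seconds: None)  # no real sleeping in the demo
print(ready, attempts["count"])  # True 3
```

The sleep parameter is injectable so the helper can be tested without real delays. In production code, leave it at the default.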
Common Use Cases
1. Content Updates
# Update blog post content
ducky.documents.index(
index_name="blog-posts",
doc_id="post-how-to-use-api",
content="Updated blog post content with new examples...",
title="How to Use Our API - Updated",
metadata={"last_updated": "2024-01-15", "author": "John Doe"}
)
// Update blog post content
await ducky.documents.index({
indexName: "blog-posts",
docId: "post-how-to-use-api",
content: "Updated blog post content with new examples...",
title: "How to Use Our API - Updated",
metadata: { last_updated: "2024-01-15", author: "John Doe" }
});
2. Metadata Updates
# Update product information
ducky.documents.index(
index_name="products",
doc_id="product-abc-123",
content="Product description remains the same...",
metadata={
"price": 199.99, # Updated price
"in_stock": True, # Updated availability
"category": "electronics"
}
)
// Update product information
await ducky.documents.index({
indexName: "products",
docId: "product-abc-123",
content: "Product description remains the same...",
metadata: {
price: 199.99, // Updated price
in_stock: true, // Updated availability
category: "electronics"
}
});
3. File Updates
# Update document by uploading new file version
with open("updated-manual.pdf", "rb") as file:
    result = ducky.documents.index_file(
        index_name="manuals",
        doc_id="user-manual-v2",  # Same doc_id updates existing
        file={
            "file_name": "updated-manual.pdf",
            "content": file
        },
        title="User Manual v2.1",
        metadata={"version": "2.1", "updated": "2024-01-15"}
    )
import { openAsBlob } from "node:fs";
// Update document by uploading new file version
const result = await ducky.documents.indexFile({
indexName: "manuals",
docId: "user-manual-v2", // Same doc_id updates existing
file: await openAsBlob("updated-manual.pdf"),
title: "User Manual v2.1",
metadata: { version: "2.1", updated: "2024-01-15" }
});
Best Practices
1. Use Meaningful Document IDs
# Good - descriptive and unique
doc_id = "user-guide-getting-started"
doc_id = "product-SKU-ABC123"
doc_id = "policy-privacy-v2"
# Avoid - generic or unclear
doc_id = "doc1"
doc_id = "file"
doc_id = "content"
// Good - descriptive and unique
const goodDocIds = [
  "user-guide-getting-started",
  "product-SKU-ABC123",
  "policy-privacy-v2",
];
// Avoid - generic or unclear
const unclearDocIds = ["doc1", "file", "content"];
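Upserts depend on the same source always mapping to the same doc_id, so it can be useful to derive IDs deterministically instead of typing them by hand. The sketch below is one possible approach. make_doc_id is a hypothetical helper, and lowercasing is a choice made for this example rather than a Ducky requirement.

```python
import re


def make_doc_id(*parts: str) -> str:
    """Join parts into a lowercase, hyphen-separated ID, e.g. 'user-guide-getting-started'."""
    slugs = []
    for part in parts:
        # Collapse every run of non-alphanumeric characters into a single hyphen.
        slug = re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-")
        if slug:
            slugs.append(slug)
    if not slugs:
        raise ValueError("at least one non-empty part is required")
    return "-".join(slugs)


print(make_doc_id("User Guide", "Getting Started"))  # user-guide-getting-started
print(make_doc_id("product", "SKU ABC123"))          # product-sku-abc123
```

Re-running an import job with the same inputs then produces the same IDs, so existing documents are updated rather than duplicated.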
2. Handle Metadata Carefully
Since metadata is completely replaced, preserve existing fields you want to keep:
# If you need to preserve existing metadata, retrieve it first
existing_doc = ducky.documents.get(
index_name="my-index",
doc_id="doc-123"
)
# Merge with new metadata (fall back to an empty dict if none is stored)
updated_metadata = dict(existing_doc.metadata or {})
updated_metadata.update({"new_field": "new_value"})
# Update with merged metadata
ducky.documents.index(
index_name="my-index",
doc_id="doc-123",
content="Updated content",
metadata=updated_metadata
)
// If you need to preserve existing metadata, retrieve it first
const existingDoc = await ducky.documents.get({
indexName: "my-index",
docId: "doc-123"
});
// Merge with new metadata
const updatedMetadata = { ...existingDoc.metadata, new_field: "new_value" };
// Update with merged metadata
await ducky.documents.index({
indexName: "my-index",
docId: "doc-123",
content: "Updated content",
metadata: updatedMetadata
});
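The read-merge-write pattern above can be wrapped in a small pure function, which keeps the merge rules in one place and is easy to test without calling the API. merge_metadata is a hypothetical helper, not an SDK function. It assumes metadata is a flat dictionary, or None when the document has none.

```python
from typing import Any, Dict, Iterable, Optional


def merge_metadata(
    existing: Optional[Dict[str, Any]],
    changes: Dict[str, Any],
    remove: Iterable[str] = (),
) -> Dict[str, Any]:
    """Return a new dict: `existing` overlaid with `changes`, minus any keys in `remove`."""
    merged = dict(existing or {})  # copy, so the caller's dict is never mutated
    merged.update(changes)
    for key in remove:
        merged.pop(key, None)
    return merged


current = {"price": 99.99, "category": "electronics", "tags": ["popular"]}
merged = merge_metadata(current, {"price": 149.99}, remove=["tags"])
print(merged)  # {'price': 149.99, 'category': 'electronics'}
```

The result can be passed as the metadata argument of the index call shown above. This is a shallow merge, so nested dictionaries are replaced as whole values.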
3. Batch Updates
For multiple document updates, use batch operations:
# Update multiple documents efficiently
updates = [
{
"index_name": "products",
"doc_id": "product-1",
"content": "Updated product 1 description",
"metadata": {"price": 99.99}
},
{
"index_name": "products",
"doc_id": "product-2",
"content": "Updated product 2 description",
"metadata": {"price": 149.99}
}
]
ducky.documents.batch_index(documents=updates)
// Update multiple documents efficiently
const updates = [
{
index_name: "products",
doc_id: "product-1",
content: "Updated product 1 description",
metadata: { price: 99.99 }
},
{
index_name: "products",
doc_id: "product-2",
content: "Updated product 2 description",
metadata: { price: 149.99 }
}
];
await ducky.documents.batchIndex({ documents: updates });
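For large update sets, you may want to send several smaller batches rather than one very large request. This page does not state a batch size limit, so the size used below is arbitrary, and chunked is a hypothetical helper. The commented-out call reuses batch_index from the example above.

```python
from typing import Any, Dict, Iterator, List


def chunked(items: List[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of `items`, each containing at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


updates = [
    {"index_name": "products", "doc_id": f"product-{n}", "content": f"Updated product {n} description"}
    for n in range(1, 6)
]

batches = list(chunked(updates, 2))
for batch in batches:
    # ducky.documents.batch_index(documents=batch)  # call taken from the example above
    print([doc["doc_id"] for doc in batch])
```

With five updates and a batch size of two, this produces three batches of two, two, and one document.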
Summary
Document upserts in Ducky let you create and update documents through a single call:
- Automatic behavior: Same endpoint for create and update operations
- Complete replacement: Updates replace all fields rather than merging them
- Async processing: Updates are processed in the background
- Consistent API: Works the same way in the Python and TypeScript SDKs
Use document upserts to keep your indexed content fresh and up-to-date without worrying about duplicate documents or complex update logic.