Asynchronous Operations
Understanding async processing for indexing, deletion, and file uploads
Ducky processes most operations asynchronously so that API calls return quickly and resources are used efficiently. Knowing which operations finish in the background, and roughly how long they take, is essential for building reliable applications.
Why Operations Are Asynchronous
When you index documents, upload files, or delete indexes, Ducky performs complex operations behind the scenes including content analysis, semantic processing, and search optimization. These operations can take anywhere from seconds to minutes depending on the size and complexity, so Ducky returns immediately while processing continues in the background.
Note: Document retrieval is not asynchronous. Search queries return their results directly, typically within milliseconds.
Document Indexing
How It Works
Document indexing follows a two-phase approach:
Phase 1: Immediate Response
from duckyai import DuckyAI

ducky = DuckyAI(api_key="<DUCKYAI_API_KEY>")

# This returns immediately
result = ducky.documents.index(
    index_name="knowledge-base",
    doc_id="large-document",
    content="Very long document content...",
    title="Large Document"
)
print(f"Document {result.doc_id} queued for processing")
# Document is now queued for background processing

import { Ducky } from "duckyai-ts";

const ducky = new Ducky({
  apiKey: process.env["DUCKY_API_KEY"] ?? "",
});

// This returns immediately
const result = await ducky.documents.index({
  indexName: "knowledge-base",
  docId: "large-document",
  content: "Very long document content...",
  title: "Large Document"
});
console.log(`Document ${result.docId} queued for processing`);
// Document is now queued for background processing
Phase 2: Background Processing
After the API returns, Ducky performs complex processing operations to make your document searchable. This involves multiple steps including content analysis, semantic processing, and search optimization, which is why documents may take some time to be ready for retrieval.
Processing Time Expectations
Processing time depends on document complexity, content length, and current system load. Most documents are ready for search within seconds to minutes after indexing.
File Upload Processing
File uploads are particularly resource-intensive and are always processed asynchronously.
File Size Limits
- Maximum file size: 60 MB
- Supported formats: PDF, text files, UTF-8 encoded documents
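Because an upload returns immediately, it can be worth checking the size limit locally before sending a file. The following is a minimal sketch, not part of the Ducky SDK: `check_upload_size` and `MAX_UPLOAD_BYTES` are hypothetical names, and the limit is assumed to mean 60 * 1024 * 1024 bytes, since this page does not say whether the figure is in binary or decimal megabytes.

```python
import os

# Assumption: the 60 MB limit is interpreted as 60 * 1024 * 1024 bytes.
# Adjust this constant if the service counts decimal megabytes instead.
MAX_UPLOAD_BYTES = 60 * 1024 * 1024


def check_upload_size(path: str, max_bytes: int = MAX_UPLOAD_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds max_bytes."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(
            f"{path} is {size} bytes, which exceeds the {max_bytes}-byte limit"
        )
    return size
```

Calling this helper before `index_file` lets an oversized file fail fast on the client rather than after the upload has been attempted.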
Processing Flow
# Large file upload - returns immediately
with open("large-manual.pdf", "rb") as file:
    result = ducky.documents.index_file(
        index_name="manuals",
        doc_id="user-manual",
        file={
            "file_name": "large-manual.pdf",
            "content": file
        }
    )
print(f"File {result.doc_id} queued for processing")
# File is now being processed in the background

import { openAsBlob } from "node:fs";

// Large file upload - returns immediately
const result = await ducky.documents.indexFile({
  indexName: "manuals",
  docId: "user-manual",
  file: await openAsBlob("large-manual.pdf")
});
console.log(`File ${result.docId} queued for processing`);
// File is now being processed in the background
PDF Processing
PDF files require additional processing because each page is treated as a separate document and goes through its own content extraction and indexing. Most PDFs are ready within 3 minutes, though individual pages may become searchable sooner as they are processed.
Document Deletion
Document deletion is also asynchronous and involves cleanup across multiple systems.
Deletion Process
# Delete document - returns immediately
ducky.documents.delete(
    index_name="knowledge-base",
    doc_id="document-to-delete"
)
print("Document deletion queued")
# Document cleanup happens in background

// Delete document - returns immediately
await ducky.documents.delete({
  indexName: "knowledge-base",
  docId: "document-to-delete"
});
console.log("Document deletion queued");
// Document cleanup happens in background
Document deletion involves removing multiple forms of data across different systems, which is why it takes time to complete. Most documents are fully removed within seconds.
Index Deletion
Index deletion removes all documents within an index, which means the processing time depends on how many documents need to be deleted. Large indexes with thousands of documents will take longer to process than smaller ones.
# Index deletion - returns immediately but processing time varies by size
ducky.indexes.delete(index_name="knowledge-base")
print("Index deletion started - time depends on number of documents")
# Larger indexes will take longer to fully delete

// Index deletion - returns immediately but processing time varies by size
await ducky.indexes.delete({
  indexName: "knowledge-base"
});
console.log("Index deletion started - time depends on number of documents");
// Larger indexes will take longer to fully delete
Best Practices for Async Operations
1. Design for Asynchronous Processing
import time

# Good - Don't assume immediate availability
result = ducky.documents.index(
    index_name="my-index",
    doc_id="new-doc",
    content="Document content"
)

# Wait before trying to retrieve
time.sleep(5)  # Give processing time

# Then search for the document
results = ducky.documents.retrieve(
    index_name="my-index",
    query="document content",
    top_k=1
)

// Good - Don't assume immediate availability
const result = await ducky.documents.index({
  indexName: "my-index",
  docId: "new-doc",
  content: "Document content"
});

// Wait before trying to retrieve
await new Promise(resolve => setTimeout(resolve, 5000));

// Then search for the document
const results = await ducky.documents.retrieve({
  indexName: "my-index",
  query: "document content",
  topK: 1
});
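A fixed pause works for small documents, but processing time varies with content and system load, so polling with a growing delay is often more robust. The sketch below is an illustration, not a Ducky SDK feature: `wait_until` is a hypothetical helper, and the readiness check is passed in as a callable because this page does not describe the shape of the `retrieve` response. The 180-second default simply mirrors the "within 3 minutes" figure given for PDFs, and elapsed time is approximated by the sum of the pauses rather than measured with a clock.

```python
import time
from typing import Callable


def wait_until(
    is_ready: Callable[[], bool],
    timeout: float = 180.0,
    initial_delay: float = 1.0,
    max_delay: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_ready() with exponential backoff.

    Returns True as soon as is_ready() returns True, or False once the
    total time spent sleeping reaches the timeout budget.
    """
    waited = 0.0
    delay = initial_delay
    while True:
        if is_ready():
            return True
        if waited >= timeout:
            return False
        # Never sleep past the remaining budget.
        pause = min(delay, timeout - waited)
        sleep(pause)
        waited += pause
        # Double the delay each round, capped at max_delay.
        delay = min(delay * 2, max_delay)
```

In practice `is_ready` would wrap a call such as `ducky.documents.retrieve(...)` and return True once the new document appears in the results. The same helper can confirm a deletion by returning True once the document no longer appears.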
2. Handle Large Operations Appropriately
import os

# For large file uploads
def upload_large_file(file_path, index_name, doc_id):
    with open(file_path, "rb") as file:
        result = ducky.documents.index_file(
            index_name=index_name,
            doc_id=doc_id,
            # Send only the base name rather than the full local path
            file={"file_name": os.path.basename(file_path), "content": file}
        )
    print(f"Large file {doc_id} queued - processing time depends on file size")
    return result

import { openAsBlob } from "node:fs";

// For large file uploads
async function uploadLargeFile(filePath: string, indexName: string, docId: string) {
  const file = await openAsBlob(filePath);
  const result = await ducky.documents.indexFile({
    indexName,
    docId,
    file
  });
  console.log(`Large file ${docId} queued - processing time depends on file size`);
  return result;
}
3. Plan for Large Operations
- Index deletion: Consider timing based on index size
- Large file uploads: Larger files take longer to process
- Multiple operations: Spread out large operations to manage processing load
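One simple way to spread out several large operations is to run them in sequence with a pause between each. The sketch below assumes nothing about Ducky itself: `run_staggered` is a hypothetical helper, each operation is any zero-argument callable (for example a `lambda` wrapping an upload call), and the 2-second default is an arbitrary placeholder, since this page does not state a recommended spacing.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def run_staggered(
    operations: Iterable[Callable[[], T]],
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Run each operation in order, pausing delay_seconds between consecutive calls."""
    results: List[T] = []
    for position, operation in enumerate(operations):
        if position > 0:
            # Pause between operations, not before the first one.
            sleep(delay_seconds)
        results.append(operation())
    return results
```

Each call still returns as soon as the request is queued, so the pause only limits how quickly new work is submitted. It does not wait for the background processing to finish.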
Summary
Understanding Ducky's asynchronous nature helps you:
- Set proper expectations for processing times
- Design resilient applications that handle async operations
- Plan operations around processing requirements
Remember: Most operations return immediately, but actual processing happens in the background. Documents become searchable within seconds to minutes, with larger files and indexes taking longer. Always design your applications to handle this asynchronous behavior gracefully.