Working with Files
How to upload and index PDF files and text documents
Ducky supports uploading and indexing files directly, making it easy to add PDFs and text documents to your search index without manual content extraction.
Supported File Types
- PDF files - Automatically extracts text content
- Text files - UTF-8 encoded documents (.txt, .md, etc.)
- Maximum file size: 60MB
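Checking these limits locally before uploading gives a clearer error than a rejected request. The sketch below is a minimal pre-flight check, assuming that 60MB means 60 * 1024 * 1024 bytes and that .pdf, .txt, and .md are the extensions you want to allow (the list above says "etc."). validate_upload is a hypothetical helper, not part of the SDK.

```python
from pathlib import Path

# Assumptions: "60MB" read as 60 * 1024 * 1024 bytes, and only these extensions
# are allowed; adjust both to match what the service actually accepts.
MAX_FILE_SIZE_BYTES = 60 * 1024 * 1024
TEXT_EXTENSIONS = {".txt", ".md"}
PDF_EXTENSION = ".pdf"


def validate_upload(path: str) -> str:
    """Return "pdf" or "text" if the file looks uploadable, else raise ValueError."""
    file_path = Path(path)
    size = file_path.stat().st_size
    if size > MAX_FILE_SIZE_BYTES:
        raise ValueError(f"{file_path.name} is {size} bytes, over the 60MB limit")

    suffix = file_path.suffix.lower()
    if suffix == PDF_EXTENSION:
        # Well-formed PDF files begin with the "%PDF-" header.
        with file_path.open("rb") as f:
            if f.read(5) != b"%PDF-":
                raise ValueError(f"{file_path.name} does not start with a PDF header")
        return "pdf"

    if suffix in TEXT_EXTENSIONS:
        # Text uploads must be UTF-8; decoding raises UnicodeDecodeError otherwise.
        try:
            file_path.read_bytes().decode("utf-8")
        except UnicodeDecodeError as exc:
            raise ValueError(f"{file_path.name} is not valid UTF-8") from exc
        return "text"

    raise ValueError(f"{file_path.name}: unsupported extension {suffix!r}")
```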
Uploading Files
Python SDK
```python
from duckyai import DuckyAI

ducky = DuckyAI(api_key="your-api-key")

# Upload a PDF file
with open("user-manual.pdf", "rb") as file:
    result = ducky.documents.index_file(
        index_name="documentation",
        doc_id="user-manual-v2",
        file={
            "file_name": "user-manual.pdf",
            "content": file
        },
        title="User Manual v2.0",
        metadata={"version": "2.0", "type": "manual"}
    )
print(f"File uploaded: {result.doc_id}")

# Upload a text file
with open("policy.txt", "rb") as file:
    result = ducky.documents.index_file(
        index_name="policies",
        doc_id="privacy-policy",
        file={
            "file_name": "policy.txt",
            "content": file
        },
        title="Privacy Policy"
    )
```
TypeScript SDK
```typescript
import { Ducky } from "duckyai-ts";
import { openAsBlob } from "node:fs"; // fs.openAsBlob is available since Node.js 19.8

// Top-level await requires running this file as an ES module.
const ducky = new Ducky({
  apiKey: process.env.DUCKY_API_KEY ?? "",
});

// Upload a PDF file
const pdfResult = await ducky.documents.indexFile({
  indexName: "documentation",
  docId: "user-manual-v2",
  file: await openAsBlob("user-manual.pdf"),
  title: "User Manual v2.0",
  metadata: { version: "2.0", type: "manual" }
});
console.log(`File uploaded: ${pdfResult.docId}`);

// Upload a text file
const textResult = await ducky.documents.indexFile({
  indexName: "policies",
  docId: "privacy-policy",
  file: await openAsBlob("policy.txt"),
  title: "Privacy Policy"
});
```
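Each Python example above repeats the same open-then-index_file pattern. A small wrapper can remove that repetition. This is a hypothetical convenience function, not part of the SDK: it only reuses the call shape shown above (index_name, doc_id, and a file dict with file_name and content) and forwards extra keyword arguments such as title or metadata unchanged. Defaulting doc_id to the file name without its extension is an illustrative choice.

```python
from pathlib import Path
from typing import Any, Optional


def index_path(ducky: Any, index_name: str, path: str,
               doc_id: Optional[str] = None, **kwargs: Any) -> Any:
    """Open a local file and pass it to ducky.documents.index_file.

    Hypothetical wrapper: extra keyword arguments (title, metadata, ...) are
    forwarded as-is. The file stays open only for the duration of the call.
    """
    file_path = Path(path)
    with file_path.open("rb") as content:
        return ducky.documents.index_file(
            index_name=index_name,
            # Fall back to the file name without its extension, e.g. "policy".
            doc_id=doc_id or file_path.stem,
            file={"file_name": file_path.name, "content": content},
            **kwargs,
        )


# Usage:
# result = index_path(ducky, "policies", "policy.txt",
#                     doc_id="privacy-policy", title="Privacy Policy")
```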
How File Processing Works
Automatic Content Extraction
When you upload a file, Ducky automatically:
- Extracts text content from PDFs or reads UTF-8 text files
- Processes content asynchronously in the background
- Makes content searchable once processing completes
Processing Time
- Text files: Ready within seconds
- PDF files: Processing time depends on file size and complexity
- Large files: May take several minutes to become fully searchable
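Because processing is asynchronous, a script that uploads a file and searches immediately may not find it yet. The sketch below is a generic polling loop with exponential backoff. wait_until is a hypothetical helper, and the check callable is yours to write (for example, a search for a phrase you know is in the new file); this page does not describe a processing-status endpoint, so none is assumed.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float = 300.0,
               initial_delay_s: float = 1.0, max_delay_s: float = 30.0) -> bool:
    """Call check() until it returns True, backing off exponentially between tries.

    Returns True once check() succeeds, or False if timeout_s elapses first.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        if check():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Sleep no longer than the time left, then double the delay (capped).
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)


# Usage sketch: is_searchable is a hypothetical function you write, e.g. one that
# runs ducky.documents.retrieve(...) for a phrase from the new file and reports
# whether that document came back.
# ready = wait_until(is_searchable, timeout_s=600)  # large PDFs may need minutes
```

A short timeout is enough for text files; give large PDFs a longer one.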
PDF Processing
For PDF files, Ducky:
- Extracts text from all pages
- Maintains document structure where possible
- Handles multi-page documents automatically
File Updates
You can update a file by uploading the new version with the same doc_id:
```python
# Update an existing file
with open("user-manual-v3.pdf", "rb") as file:
    result = ducky.documents.index_file(
        index_name="documentation",
        doc_id="user-manual-v2",  # Same doc_id updates the existing file
        file={
            "file_name": "user-manual-v3.pdf",
            "content": file
        },
        title="User Manual v3.0",
        metadata={"version": "3.0", "type": "manual"}
    )
```
```typescript
// Update an existing file
const updateResult = await ducky.documents.indexFile({
  indexName: "documentation",
  docId: "user-manual-v2", // Same doc_id updates the existing file
  file: await openAsBlob("user-manual-v3.pdf"),
  title: "User Manual v3.0",
  metadata: { version: "3.0", type: "manual" }
});
```
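Since every upload goes through the processing described earlier, re-uploading an unchanged file under the same doc_id presumably just repeats that work. One way to avoid it is to keep a local record of content hashes. In this sketch the manifest is a plain dict you persist yourself (for example as JSON); file_sha256 and changed_hash are hypothetical helpers, and nothing is asked of Ducky.

```python
import hashlib
from typing import Dict, Optional


def file_sha256(path: str) -> str:
    """Hash a file in 1 MiB chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_hash(doc_id: str, path: str, manifest: Dict[str, str]) -> Optional[str]:
    """Return the file's hash if it differs from manifest[doc_id], else None."""
    current = file_sha256(path)
    return None if manifest.get(doc_id) == current else current


# Usage sketch:
# new_hash = changed_hash("user-manual-v2", "user-manual-v3.pdf", manifest)
# if new_hash:
#     ...  # call ducky.documents.index_file(...) as shown above
#     manifest["user-manual-v2"] = new_hash  # record only after the upload succeeds
```

The manifest is updated only after a successful upload, so a failed upload is retried on the next run.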
Best Practices
File Naming and Organization
```python
# Good - descriptive doc_ids
doc_id = "user-manual-2024"
doc_id = "privacy-policy-latest"
doc_id = "product-spec-v2-1"

# Include version info in metadata
metadata = {
    "version": "2.1",
    "document_type": "specification",
    "last_updated": "2024-01-15"
}
```
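To keep doc_ids in this style when uploading many files, you can derive them from file names or titles. The sketch below lowercases the name, turns runs of other characters into hyphens, and drops a known file extension while keeping version dots such as "v2.1" as "v2-1". make_doc_id and version_metadata are hypothetical helpers; the set of stripped extensions is an assumption.

```python
import re
from datetime import date
from pathlib import Path
from typing import Dict

# Assumption: only these suffixes are treated as file extensions and stripped.
FILE_SUFFIXES = {".pdf", ".txt", ".md"}


def make_doc_id(name: str) -> str:
    """Turn a file name or title into a lowercase, hyphen-separated doc_id."""
    path = Path(name)
    # Drop a known extension, but keep dots that belong to versions like "v2.1".
    base = path.stem if path.suffix.lower() in FILE_SUFFIXES else name
    return re.sub(r"[^a-z0-9]+", "-", base.lower()).strip("-")


def version_metadata(version: str, document_type: str, updated: date) -> Dict[str, str]:
    """Build versioned metadata with an ISO 8601 last_updated date."""
    return {
        "version": version,
        "document_type": document_type,
        "last_updated": updated.isoformat(),
    }


# make_doc_id("User Manual 2024.pdf")    -> "user-manual-2024"
# make_doc_id("Privacy Policy (Latest)") -> "privacy-policy-latest"
# make_doc_id("Product Spec v2.1")       -> "product-spec-v2-1"
```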
Handling Large Files
```python
# For large files, consider breaking them into sections
# if they contain distinct topics

# Upload individual chapters
with open("chapter1.pdf", "rb") as file:
    ducky.documents.index_file(
        index_name="textbook",
        doc_id="textbook-chapter-1",
        file={"file_name": "chapter1.pdf", "content": file},
        title="Chapter 1: Introduction",
        metadata={"chapter": 1, "subject": "mathematics"}
    )
```
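For Markdown sources, splitting by topic can be automated by cutting at level-1 headings. This sketch returns (doc_id, title, body) tuples you could upload one by one. split_markdown_sections is a hypothetical helper, and the "-section-N" doc_id suffix is an illustrative naming choice. Uploading a section from memory assumes index_file accepts a binary stream such as io.BytesIO as "content"; the examples on this page only show open files, so treat that part as unconfirmed.

```python
import re
from typing import List, Tuple

# A level-1 Markdown heading: "# " at the start of a line ("## ..." does not match).
HEADING = re.compile(r"#\s+(.+)")


def split_markdown_sections(text: str, base_doc_id: str) -> List[Tuple[str, str, str]]:
    """Split Markdown at level-1 headings into (doc_id, title, body) tuples.

    Text before the first heading becomes a section titled "Introduction".
    Limitation: a "# " line inside a fenced code block is also treated as a heading.
    """
    sections: List[Tuple[str, str]] = []
    title, lines = "Introduction", []

    def flush() -> None:
        body = "\n".join(lines).strip()
        if body:  # skip headings that have no text under them
            sections.append((title, body))

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    flush()
    return [(f"{base_doc_id}-section-{i}", t, body)
            for i, (t, body) in enumerate(sections, start=1)]


# Usage sketch (assumes "content" may be an in-memory stream):
# for doc_id, title, body in split_markdown_sections(text, "textbook"):
#     ducky.documents.index_file(
#         index_name="textbook", doc_id=doc_id, title=title,
#         file={"file_name": f"{doc_id}.md",
#               "content": io.BytesIO(body.encode("utf-8"))},
#     )
```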
File Metadata
Use metadata to organize and filter your files:
```python
# Categorize by document type
metadata = {
    "document_type": "manual",
    "department": "engineering",
    "confidentiality": "public",
    "file_format": "pdf"
}

# Then filter when searching
results = ducky.documents.retrieve(
    index_name="documents",
    query="installation process",
    metadata_filter={"document_type": "manual"}
)
```
```typescript
// Categorize by document type
const metadata = {
  documentType: "manual",
  department: "engineering",
  confidentiality: "public",
  fileFormat: "pdf"
};

// Then filter when searching
const results = await ducky.documents.retrieve({
  indexName: "documents",
  query: "installation process",
  metadataFilter: { documentType: "manual" }
});
```
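Note that the Python and TypeScript examples above spell the same key differently (document_type vs documentType). If metadata keys are stored and matched exactly as written, which seems likely but is not stated here, a filter in one spelling will not match documents tagged in the other. A small normalizer lets you settle on one convention, such as the snake_case used in the Python examples, for both uploaded metadata and filter keys. snake_case_keys is a hypothetical helper.

```python
import re
from typing import Any, Dict

# Matches an uppercase letter that directly follows a lowercase letter or digit.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])([A-Z])")


def snake_case_keys(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of metadata whose camelCase keys are rewritten in snake_case.

    Keys that are already snake_case pass through unchanged; values are untouched.
    """
    return {_CAMEL_BOUNDARY.sub(r"_\1", key).lower(): value
            for key, value in metadata.items()}


# snake_case_keys({"documentType": "manual", "fileFormat": "pdf"})
# -> {"document_type": "manual", "file_format": "pdf"}
```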
Common Use Cases
Documentation Libraries
```python
# Upload company documentation
files = ["handbook.pdf", "policies.pdf", "procedures.pdf"]
for filename in files:
    with open(filename, "rb") as file:
        ducky.documents.index_file(
            index_name="company-docs",
            doc_id=filename.replace(".pdf", ""),
            file={"file_name": filename, "content": file}
        )
```
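With a longer list, one failed upload would stop the loop above partway through. The sketch below scans a folder for supported files and keeps going when a single upload raises, returning what succeeded and what failed. upload_directory is a hypothetical helper; you pass in the actual upload call, so the SDK is not assumed beyond the index_file call shape already shown. The extension set is an assumption mirroring the supported file types list.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Assumption: mirrors the "Supported File Types" list; extend as needed.
SUPPORTED_SUFFIXES = {".pdf", ".txt", ".md"}


def upload_directory(folder: str,
                     upload: Callable[[Path, str], None]) -> Tuple[List[str], Dict[str, str]]:
    """Call upload(path, doc_id) for every supported file in folder (not recursive).

    One failed file does not stop the batch. Returns the doc_ids that succeeded
    and a {doc_id: error message} dict for those that failed.
    """
    succeeded: List[str] = []
    failed: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue
        # "handbook.pdf" -> "handbook", as in the loop above. Files that share a
        # stem (handbook.pdf and handbook.md) would share a doc_id, so the later
        # upload would update the earlier document.
        doc_id = path.stem
        try:
            upload(path, doc_id)
            succeeded.append(doc_id)
        except Exception as exc:  # record the error and continue with the next file
            failed[doc_id] = str(exc)
    return succeeded, failed


# Usage sketch, reusing the index_file call shape from the loop above:
# def upload(path, doc_id):
#     with path.open("rb") as f:
#         ducky.documents.index_file(
#             index_name="company-docs",
#             doc_id=doc_id,
#             file={"file_name": path.name, "content": f},
#         )
#
# ok, errors = upload_directory("company-docs/", upload)
```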
Knowledge Bases
```python
# Create searchable knowledge base from PDF manuals
with open("technical-manual.pdf", "rb") as file:
    ducky.documents.index_file(
        index_name="technical-knowledge",
        doc_id="tech-manual-2024",
        file={"file_name": "technical-manual.pdf", "content": file},
        metadata={"category": "technical", "audience": "engineers"}
    )
```
File uploads make it easy to get your existing documents into Ducky without manual content copying. The automatic processing handles the technical details, so you can focus on organizing and searching your content.