Advanced Metadata Usage
Master complex filtering, data types, and metadata best practices
Metadata in Ducky allows you to attach structured information to your documents, enabling powerful filtering and organization capabilities. This guide covers advanced patterns for using metadata effectively in your applications.
Introduction
Metadata is key-value data attached to documents that helps you:
- Organize content by categories, tags, or hierarchies
- Filter search results based on specific criteria
- Implement business logic like permissions, workflows, or content states
- Track document properties like creation dates, authors, or versions
Advanced usage comes down to two things: designing a consistent metadata schema before you index, and combining comparison and logical operators at query time so that searches return only the documents that meet your criteria.
Metadata Basics
Supported Data Types
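Ducky supports four metadata value types: strings, numbers, booleans, and arrays of strings. The examples below index one document per type, first in Python and then in TypeScript: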
from duckyai import DuckyAI
ducky = DuckyAI(api_key="<DUCKYAI_API_KEY>")
# String values
ducky.documents.index(
index_name="content",
doc_id="doc1",
content="Document content",
metadata={
"category": "technology",
"author": "John Doe",
"status": "published"
}
)
# Number values
ducky.documents.index(
index_name="content",
doc_id="doc2",
content="Document content",
metadata={
"price": 29.99,
"rating": 4.5,
"view_count": 1250
}
)
# Boolean values
ducky.documents.index(
index_name="content",
doc_id="doc3",
content="Document content",
metadata={
"is_featured": True,
"is_public": False,
"requires_login": True
}
)
# Array of strings
ducky.documents.index(
index_name="content",
doc_id="doc4",
content="Document content",
metadata={
"tags": ["python", "tutorial", "beginner"],
"departments": ["engineering", "product"],
"permissions": ["read", "write"]
}
)
import { Ducky } from "duckyai-ts";
const ducky = new Ducky({
apiKey: process.env["DUCKY_API_KEY"] ?? "",
});
// String values
await ducky.documents.index({
indexName: "content",
docId: "doc1",
content: "Document content",
metadata: {
category: "technology",
author: "John Doe",
status: "published"
}
});
// Number values
await ducky.documents.index({
indexName: "content",
docId: "doc2",
content: "Document content",
metadata: {
price: 29.99,
rating: 4.5,
view_count: 1250
}
});
// Boolean values
await ducky.documents.index({
indexName: "content",
docId: "doc3",
content: "Document content",
metadata: {
is_featured: true,
is_public: false,
requires_login: true
}
});
// Array of strings
await ducky.documents.index({
indexName: "content",
docId: "doc4",
content: "Document content",
metadata: {
tags: ["python", "tutorial", "beginner"],
departments: ["engineering", "product"],
permissions: ["read", "write"]
}
});
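A small local check can catch unsupported values before a request is sent. The sketch below is an illustration only: `metadata_value_type` is a hypothetical helper, not part of the Ducky SDK, and it encodes just the four types listed above. Treating an empty list as a string array is an assumption.

```python
def metadata_value_type(value):
    """Return a label for a supported metadata value, or raise TypeError.

    Hypothetical helper: it mirrors the four types shown in this guide
    and is not part of the Ducky SDK.
    """
    # bool is a subclass of int in Python, so it must be tested first.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (int, float)):
        return "number"
    # Assumption: an empty list counts as an (empty) array of strings.
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return "string_array"
    raise TypeError(f"unsupported metadata value: {value!r}")


if __name__ == "__main__":
    for sample in ("technology", 29.99, True, ["python", "tutorial"]):
        print(repr(sample), "->", metadata_value_type(sample))
```

Nested objects and mixed-type lists fall through to the `TypeError`, which matches the validation errors described later in this guide.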
Field Naming Rules
- No forward slashes: Field names cannot contain "/" characters
- Reserved prefixes: Avoid fields starting with "ducky_" (reserved for internal use)
- Recommended naming: Use snake_case or camelCase consistently
# Good field names
metadata = {
"user_role": "admin",
"created_date": "2024-01-15",
"content_type": "article"
}
# Avoid these
metadata = {
"user/role": "admin", # Contains "/"
"ducky_internal": "value", # Reserved prefix
}
// Good field names
const metadata = {
userRole: "admin",
createdDate: "2024-01-15",
contentType: "article"
};
// Avoid these
const invalidMetadata = {
"user/role": "admin", // Contains "/"
"ducky_internal": "value", // Reserved prefix
};
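The two hard naming rules above can be checked locally before indexing. This is a minimal sketch; `field_name_problems` is a hypothetical helper name, and it covers only the slash and reserved-prefix rules, not the casing recommendation.

```python
RESERVED_PREFIX = "ducky_"  # reserved for internal use, per the rules above


def field_name_problems(metadata):
    """Map each offending field name to a short reason.

    Hypothetical helper, not part of the Ducky SDK. An empty result
    means no field name breaks the two rules checked here.
    """
    problems = {}
    for name in metadata:
        if "/" in name:
            problems[name] = "contains '/'"
        elif name.startswith(RESERVED_PREFIX):
            problems[name] = "uses the reserved 'ducky_' prefix"
    return problems


if __name__ == "__main__":
    print(field_name_problems({"user/role": "admin", "ducky_internal": "value"}))
```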
Common Validation Errors
# This will cause validation errors
try:
    ducky.documents.index(
        index_name="content",
        doc_id="doc1",
        content="Content",
        metadata={
            "nested": {"objects": "not supported"},  # Nested objects not allowed
            "invalid/field": "value",  # Forward slash not allowed
            "mixed_array": ["string", 123, True]  # Mixed-type arrays not supported
        }
    )
except Exception as e:
    print(f"Validation error: {e}")
// This will cause validation errors
try {
  await ducky.documents.index({
    indexName: "content",
    docId: "doc1",
    content: "Content",
    metadata: {
      nested: { objects: "not supported" }, // Nested objects not allowed
      "invalid/field": "value", // Forward slash not allowed
      mixed_array: ["string", 123, true] // Mixed-type arrays not supported
    }
  });
} catch (error) {
  console.log(`Validation error: ${error}`);
}
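One way to avoid the nested-object error is to flatten nested dictionaries into prefixed top-level keys before indexing. The sketch below assumes an underscore separator and a hypothetical helper name, `flatten_metadata`. It does not detect key collisions, for example an existing `user_name` field next to a nested `user.name`.

```python
def flatten_metadata(metadata, separator="_"):
    """Flatten nested dicts into top-level keys such as 'user_name'.

    Hypothetical helper, not part of the Ducky SDK. Later keys silently
    overwrite earlier ones if two paths flatten to the same name.
    """
    flat = {}
    for key, value in metadata.items():
        if isinstance(value, dict):
            # Recurse so that deeper levels are flattened as well.
            for sub_key, sub_value in flatten_metadata(value, separator).items():
                flat[f"{key}{separator}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


if __name__ == "__main__":
    nested = {"user": {"name": "John", "role": "admin"}, "status": "published"}
    print(flatten_metadata(nested))
```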
Advanced Filtering
Comparison Operators
Comparison operators filter documents by metadata value. The examples below use $eq, $ne, $gt, $gte, $lte, $in, and $nin; a bare value is shorthand for $eq:
# Exact match (simplified syntax)
results = ducky.documents.retrieve(
index_name="content",
query="search term",
top_k=10,
metadata_filter={
"category": "technology" # Equivalent to {"$eq": "technology"}
}
)
# Explicit equality
results = ducky.documents.retrieve(
index_name="content",
query="search term",
top_k=10,
metadata_filter={
"category": {"$eq": "technology"}
}
)
# Not equal
results = ducky.documents.retrieve(
index_name="content",
query="search term",
top_k=10,
metadata_filter={
"status": {"$ne": "draft"}
}
)
# Numerical comparisons
results = ducky.documents.retrieve(
index_name="content",
query="search term",
top_k=10,
metadata_filter={
"price": {"$gte": 20.0, "$lte": 100.0}, # Between 20 and 100
"rating": {"$gt": 4.0} # Greater than 4.0
}
)
# Array membership
results = ducky.documents.retrieve(
index_name="content",
query="search term",
top_k=10,
metadata_filter={
"tags": {"$in": ["python", "javascript"]}, # Contains python OR javascript
"departments": {"$nin": ["deprecated", "old"]} # Does NOT contain deprecated or old
}
)
// Exact match (simplified syntax)
const results = await ducky.documents.retrieve({
indexName: "content",
query: "search term",
topK: 10,
metadataFilter: {
category: "technology" // Equivalent to {"$eq": "technology"}
}
});
// Explicit equality
const results = await ducky.documents.retrieve({
indexName: "content",
query: "search term",
topK: 10,
metadataFilter: {
category: { "$eq": "technology" }
}
});
// Not equal
const results = await ducky.documents.retrieve({
indexName: "content",
query: "search term",
topK: 10,
metadataFilter: {
status: { "$ne": "draft" }
}
});
// Numerical comparisons
const results = await ducky.documents.retrieve({
indexName: "content",
query: "search term",
topK: 10,
metadataFilter: {
price: { "$gte": 20.0, "$lte": 100.0 }, // Between 20 and 100
rating: { "$gt": 4.0 } // Greater than 4.0
}
});
// Array membership
const results = await ducky.documents.retrieve({
indexName: "content",
query: "search term",
topK: 10,
metadataFilter: {
tags: { "$in": ["python", "javascript"] }, // Contains python OR javascript
departments: { "$nin": ["deprecated", "old"] } // Does NOT contain deprecated or old
}
});
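To reason about what a filter should match, or to unit-test filter-building code without calling the API, it can help to model the operators locally. The sketch below is such a model and not Ducky's implementation. Two behaviors are assumptions: a document that lacks the filtered field never matches, and on array fields `$in` matches when any element is in the list while `$nin` matches when none is.

```python
import operator

# Local model of the comparison operators shown above (assumed semantics).
_COMPARATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def _as_list(value):
    """Wrap scalars so scalar and array fields can share one code path."""
    return value if isinstance(value, list) else [value]


def condition_matches(value, condition):
    """Check one field value against one condition."""
    if not isinstance(condition, dict):
        condition = {"$eq": condition}  # a bare value is shorthand for $eq
    for op, operand in condition.items():
        if op in _COMPARATORS:
            if not _COMPARATORS[op](value, operand):
                return False
        elif op == "$in":
            if not any(item in operand for item in _as_list(value)):
                return False
        elif op == "$nin":
            if any(item in operand for item in _as_list(value)):
                return False
        else:
            raise ValueError(f"unsupported operator: {op}")
    return True


def matches(metadata, metadata_filter):
    """True when every field condition holds (conditions are ANDed)."""
    return all(
        field in metadata and condition_matches(metadata[field], condition)
        for field, condition in metadata_filter.items()
    )


if __name__ == "__main__":
    doc = {"category": "technology", "price": 29.99, "tags": ["python", "tutorial"]}
    print(matches(doc, {"price": {"$gte": 20.0, "$lte": 100.0}}))
```

Verify these assumptions against real query results before relying on the model.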
Logical Operators
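Conditions on different fields in the same filter are combined with AND. To accept any of several alternatives for one field, nest $or inside that field's condition: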
# AND logic (multiple conditions must ALL be true)
results = ducky.documents.retrieve(
index_name="content",
query="search term",
top_k=10,
metadata_filter={
"category": "technology",
"status": "published",
"is_featured": True
}
)
# OR logic (at least one condition must be true)
results = ducky.documents.retrieve(
index_name="content",
query="search term",
top_k=10,
metadata_filter={
"category": {
"$or": [
{"$eq": "technology"},
{"$eq": "science"},
{"$eq": "engineering"}
]
}
}
)
# Complex nested logic
results = ducky.documents.retrieve(
index_name="content",
query="search term",
top_k=10,
metadata_filter={
"status": "published",
"category": {
"$or": [
{"$eq": "technology"},
{"$eq": "science"}
]
},
"rating": {"$gte": 4.0}
}
)
// AND logic (multiple conditions must ALL be true)
const results = await ducky.documents.retrieve({
indexName: "content",
query: "search term",
topK: 10,
metadataFilter: {
category: "technology",
status: "published",
is_featured: true
}
});
// OR logic (at least one condition must be true)
const results = await ducky.documents.retrieve({
indexName: "content",
query: "search term",
topK: 10,
metadataFilter: {
category: {
"$or": [
{ "$eq": "technology" },
{ "$eq": "science" },
{ "$eq": "engineering" }
]
}
}
});
// Complex nested logic
const results = await ducky.documents.retrieve({
indexName: "content",
query: "search term",
topK: 10,
metadataFilter: {
status: "published",
category: {
"$or": [
{ "$eq": "technology" },
{ "$eq": "science" }
]
},
rating: { "$gte": 4.0 }
}
});
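When the list of acceptable values comes from user input, the `$or` clause can be generated instead of written by hand. A minimal sketch follows; `any_of` is a hypothetical helper that only builds the dictionary shape used in the examples above.

```python
def any_of(values):
    """Build {"$or": [{"$eq": v}, ...]} for a list of acceptable values.

    Hypothetical helper, not part of the Ducky SDK.
    """
    values = list(values)
    if not values:
        raise ValueError("any_of() needs at least one value")
    return {"$or": [{"$eq": value} for value in values]}


if __name__ == "__main__":
    metadata_filter = {
        "status": "published",
        "category": any_of(["technology", "science"]),
        "rating": {"$gte": 4.0},
    }
    print(metadata_filter)
```

The resulting dictionary can be passed as `metadata_filter` to `ducky.documents.retrieve` in the same way as the hand-written filters above.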
Complex Query Examples
# Find high-rated technology articles for premium users
results = ducky.documents.retrieve(
index_name="content",
query="artificial intelligence",
top_k=10,
metadata_filter={
"category": "technology",
"rating": {"$gte": 4.5},
"access_level": {
"$or": [
{"$eq": "premium"},
{"$eq": "enterprise"}
]
},
"tags": {"$in": ["ai", "machine-learning", "deep-learning"]},
"is_featured": True
}
)
# Find recent documents excluding drafts, with numerical scoring
results = ducky.documents.retrieve(
index_name="content",
query="product updates",
top_k=20,
metadata_filter={
"created_year": {"$gte": 2024},
"status": {"$ne": "draft"},
"priority_score": {"$gt": 75},
"departments": {
"$or": [
{"$in": ["product", "engineering"]},
{"$in": ["marketing", "sales"]}
]
}
}
)
// Find high-rated technology articles for premium users
const results = await ducky.documents.retrieve({
indexName: "content",
query: "artificial intelligence",
topK: 10,
metadataFilter: {
category: "technology",
rating: { "$gte": 4.5 },
access_level: {
"$or": [
{ "$eq": "premium" },
{ "$eq": "enterprise" }
]
},
tags: { "$in": ["ai", "machine-learning", "deep-learning"] },
is_featured: true
}
});
// Find recent documents excluding drafts, with numerical scoring
const results = await ducky.documents.retrieve({
indexName: "content",
query: "product updates",
topK: 20,
metadataFilter: {
created_year: { "$gte": 2024 },
status: { "$ne": "draft" },
priority_score: { "$gt": 75 },
departments: {
"$or": [
{ "$in": ["product", "engineering"] },
{ "$in": ["marketing", "sales"] }
]
}
}
});
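Filters like these are often assembled from optional search-form inputs. The sketch below shows one way to include only the conditions a user actually supplied. The function name, its parameters, and the field names (`category`, `rating`, `tags`, `status`) are illustrative assumptions taken from the examples in this guide.

```python
def build_filter(category=None, min_rating=None, tags=None, exclude_status=None):
    """Assemble a metadata filter from optional inputs, skipping unset ones.

    Hypothetical helper: the field names are the ones used in this
    guide's examples and should be replaced with your own schema.
    """
    metadata_filter = {}
    if category is not None:
        metadata_filter["category"] = category
    if min_rating is not None:
        metadata_filter["rating"] = {"$gte": min_rating}
    if tags:  # skip both None and an empty list
        metadata_filter["tags"] = {"$in": list(tags)}
    if exclude_status is not None:
        metadata_filter["status"] = {"$ne": exclude_status}
    return metadata_filter


if __name__ == "__main__":
    print(build_filter(category="technology", min_rating=4.5, tags=["ai"]))
```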
Best Practices
Efficient Metadata Design
# Good: Structured, consistent metadata
good_metadata = {
"content_type": "article", # Consistent categorization
"publish_date": "2024-01-15", # Standardized date format
"author_id": "user_123", # Use IDs for relationships
"tags": ["python", "tutorial"], # Normalized, lowercase tags
"priority": 85, # Numerical for comparisons
"is_public": True # Boolean for binary states
}
# Avoid: Inconsistent, hard-to-filter metadata
avoid_metadata = {
"Type": "ARTICLE", # Inconsistent casing
"date": "Jan 15, 2024", # Non-standard date format
"author": "John Doe", # Full names instead of IDs
"tags": ["Python", "TUTORIAL"], # Inconsistent casing
"priority": "high", # String instead of number
"visibility": "public" # String instead of boolean
}
// Good: Structured, consistent metadata
const goodMetadata = {
contentType: "article", // Consistent categorization
publishDate: "2024-01-15", // Standardized date format
authorId: "user_123", // Use IDs for relationships
tags: ["python", "tutorial"], // Normalized, lowercase tags
priority: 85, // Numerical for comparisons
isPublic: true // Boolean for binary states
};
// Avoid: Inconsistent, hard-to-filter metadata
const avoidMetadata = {
Type: "ARTICLE", // Inconsistent casing
date: "Jan 15, 2024", // Non-standard date format
author: "John Doe", // Full names instead of IDs
tags: ["Python", "TUTORIAL"], // Inconsistent casing
priority: "high", // String instead of number
visibility: "public" // String instead of boolean
};
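Much of the gap between the "avoid" and "good" shapes above can be closed with a normalization step at index time. The sketch below converts the casing, date, tag, and visibility fields from the "avoid" example. It is an illustration under stated assumptions: the input keys are those of the example, dates are assumed to arrive in the `Jan 15, 2024` form, and month-name parsing depends on an English or default C locale. Mapping author names to IDs and priority labels to numbers needs application-specific lookup tables and is left out.

```python
from datetime import datetime


def normalize_metadata(raw):
    """Convert the loosely formatted example record into a consistent shape.

    Hypothetical helper, not part of the Ducky SDK.
    """
    return {
        "content_type": raw["Type"].lower(),
        # "%b %d, %Y" parses dates such as "Jan 15, 2024" into ISO 8601.
        "publish_date": datetime.strptime(raw["date"], "%b %d, %Y").date().isoformat(),
        # Lowercase, de-duplicate, and sort tags for stable output.
        "tags": sorted({tag.lower() for tag in raw["tags"]}),
        "is_public": raw["visibility"] == "public",
    }


if __name__ == "__main__":
    raw = {
        "Type": "ARTICLE",
        "date": "Jan 15, 2024",
        "tags": ["Python", "TUTORIAL"],
        "visibility": "public",
    }
    print(normalize_metadata(raw))
```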
Performance Tips
# Efficient: Use specific, selective filters
efficient_filter = {
"category": "technology", # Highly selective
"status": "published", # Filters out many documents
"rating": {"$gte": 4.5} # Numerical comparison
}
# Less efficient: Broad, non-selective filters
broad_filter = {
"has_content": True, # Matches almost all documents
"tags": {"$nin": ["deprecated"]} # Excludes very few documents
}
# Optimize array searches
# Good: Search for specific values
tags_filter = {
"tags": {"$in": ["python", "javascript"]}
}
# Better: Use boolean flags for common filters
metadata_with_flags = {
"tags": ["python", "web", "tutorial"],
"is_beginner_friendly": True, # Boolean flag for common filter
"is_advanced": False,
"has_code_examples": True
}
// Efficient: Use specific, selective filters
const efficientFilter = {
category: "technology", // Highly selective
status: "published", // Filters out many documents
rating: { "$gte": 4.5 } // Numerical comparison
};
// Less efficient: Broad, non-selective filters
const broadFilter = {
has_content: true, // Matches almost all documents
tags: { "$nin": ["deprecated"] } // Excludes very few documents
};
// Optimize array searches
// Good: Search for specific values
const tagsFilter = {
tags: { "$in": ["python", "javascript"] }
};
// Better: Use boolean flags for common filters
const metadataWithFlags = {
tags: ["python", "web", "tutorial"],
isBeginnerFriendly: true, // Boolean flag for common filter
isAdvanced: false,
hasCodeExamples: true
};
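Whether a condition is selective depends on your data, so it can be worth measuring on a sample of your own metadata. The sketch below computes the fraction of sampled documents that an exact-match condition would keep; a lower fraction means a more selective filter. `selectivity` is a hypothetical helper and runs entirely locally.

```python
def selectivity(sample, field, value):
    """Fraction of sampled metadata dicts where `field` equals `value`.

    Hypothetical helper, not part of the Ducky SDK. Returns 0.0 for an
    empty sample.
    """
    if not sample:
        return 0.0
    hits = sum(1 for metadata in sample if metadata.get(field) == value)
    return hits / len(sample)


if __name__ == "__main__":
    sample = [
        {"category": "technology", "has_content": True},
        {"category": "science", "has_content": True},
        {"category": "cooking", "has_content": True},
        {"category": "travel", "has_content": True},
    ]
    print(selectivity(sample, "category", "technology"))
    print(selectivity(sample, "has_content", True))
```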
Common Mistakes to Avoid
# Mistake 1: Using strings for numerical comparisons
# Bad
metadata = {"rating": "4.5"} # String - can't use $gt, $lt
metadata_filter = {"rating": {"$gt": "4.0"}} # String comparison doesn't work as expected
# Good
metadata = {"rating": 4.5} # Number
metadata_filter = {"rating": {"$gt": 4.0}} # Numerical comparison
# Mistake 2: Inconsistent data types
# Bad
metadata1 = {"priority": "high"}
metadata2 = {"priority": 85}
metadata3 = {"priority": True}
# Good - consistent numerical priorities
metadata1 = {"priority": 90}
metadata2 = {"priority": 85}
metadata3 = {"priority": 95}
# Mistake 3: Overly complex metadata structures
# Bad - trying to nest objects
metadata = {
"user": {
"name": "John",
"role": "admin"
}
}
# Good - flatten the structure
metadata = {
"user_name": "John",
"user_role": "admin"
}
// Mistake 1: Using strings for numerical comparisons
// Bad
const badRating = { rating: "4.5" }; // String - can't use $gt, $lt
const badRatingFilter = { rating: { "$gt": "4.0" } }; // String comparison doesn't work as expected
// Good
const goodRating = { rating: 4.5 }; // Number
const goodRatingFilter = { rating: { "$gt": 4.0 } }; // Numerical comparison
// Mistake 2: Inconsistent data types
// Bad
const mixed1 = { priority: "high" };
const mixed2 = { priority: 85 };
const mixed3 = { priority: true };
// Good - consistent numerical priorities
const consistent1 = { priority: 90 };
const consistent2 = { priority: 85 };
const consistent3 = { priority: 95 };
// Mistake 3: Overly complex metadata structures
// Bad - trying to nest objects
const nestedMetadata = {
  user: {
    name: "John",
    role: "admin"
  }
};
// Good - flatten the structure
const flatMetadata = {
  userName: "John",
  userRole: "admin"
};
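Mistake 2 is easy to miss when documents are indexed by different code paths. A batch can be scanned for fields whose values do not share one type before it is sent. The sketch below is a local check with a hypothetical helper name; it treats integers and floats as the same "number" type, which is an assumption about how you want to compare them.

```python
from collections import defaultdict


def _type_label(value):
    """Label a value; bool is tested first because it subclasses int."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return type(value).__name__


def mixed_type_fields(documents):
    """Return {field: sorted type labels} for fields with more than one type.

    Hypothetical helper, not part of the Ducky SDK. `documents` is an
    iterable of metadata dicts.
    """
    seen = defaultdict(set)
    for metadata in documents:
        for field, value in metadata.items():
            seen[field].add(_type_label(value))
    return {field: sorted(types) for field, types in seen.items() if len(types) > 1}


if __name__ == "__main__":
    batch = [{"priority": "high"}, {"priority": 85}, {"priority": True}]
    print(mixed_type_fields(batch))
```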