Core concepts
High-level concepts in Ducky and how they relate to each other
This guide provides an overview of essential components and terms to help you navigate and effectively use Ducky’s retrieval system. Familiarizing yourself with these concepts will help you make the most of Ducky's capabilities.
[Figure: Core concepts illustration]
Project
A Project is the top-level structure in Ducky, designed to organize and manage your data. Within each project, you can create and configure multiple indexes for specific use cases. Projects help you keep related data and settings organized, enabling efficient management across different applications. API keys are project specific.
Index
An Index is the primary storage unit within a project: it is where documents are stored, organized, and made available for searching. Indexes are designed for fast retrieval across the search types Ducky supports, keyword and semantic.
Document
A Document is the primary unit of content stored within an index in Ducky. Documents represent the individual pieces of information that users will search and retrieve. Each document can be a text file, an article, or any piece of structured content, and is divided into smaller chunks to optimize search accuracy and speed.
Key Characteristics of a Document:
- Content: The main body of text that contains the searchable information.
- Metadata: Additional fields attached to a document, such as author, date, or tags. Metadata is useful for filtering and refining search results based on specific criteria.
- Chunks: Documents are split into manageable chunks to improve retrieval precision. Each chunk can be searched individually, allowing Ducky to return the most relevant portions of a document in response to a query.
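The three characteristics above can be pictured as a small data structure. The sketch below is illustrative only: the `Document` class and the `filter_by_metadata` helper are hypothetical names chosen for this example, not Ducky's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical in-memory model of a document (not Ducky's real schema)."""
    content: str                                   # main searchable text
    metadata: dict = field(default_factory=dict)   # e.g. author, date, tags
    chunks: list = field(default_factory=list)     # smaller pieces of content


def filter_by_metadata(documents, **criteria):
    """Keep only documents whose metadata matches every given key/value pair."""
    return [
        doc for doc in documents
        if all(doc.metadata.get(key) == value for key, value in criteria.items())
    ]


docs = [
    Document("Ducks migrate in autumn.", {"author": "ana", "tag": "birds"}),
    Document("Indexes speed up search.", {"author": "ben", "tag": "search"}),
]
matches = filter_by_metadata(docs, author="ben")
print([doc.content for doc in matches])
```

Filtering on metadata narrows the candidate set before or after a search, which is why fields such as author or tags are worth attaching at indexing time.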
Chunk
A Chunk is a small, manageable portion of a document. Ducky divides documents into chunks to enable more precise search results and improve retrieval efficiency. By working at this granular level, Ducky ensures that even large documents can be accurately searched and retrieved in parts, rather than as a whole.
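To make the idea concrete, here is a minimal sketch of one common chunking strategy: fixed-size word windows with a small overlap. How Ducky actually chunks documents is not described here, so the `chunk_words` function and its `chunk_size` and `overlap` parameters are assumptions made for illustration.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
```

Smaller chunks tend to give more precise matches, while larger chunks keep more surrounding context; the overlap is a simple way to soften that trade-off.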
Keyword Search
Keyword Search allows for straightforward, term-based retrieval, where results are based on exact or near-exact matches to the terms in a query. This type of search is useful for cases where specific terminology or keywords are required to narrow down results precisely.
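As a rough illustration of term-based matching, the sketch below scores each text by the fraction of query terms it contains. This is a deliberately simple stand-in, not Ducky's scoring method; production keyword search usually relies on more refined ranking functions. The names `tokenize`, `keyword_score`, and `keyword_search` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(query, text):
    """Fraction of distinct query terms that appear in the text (0.0 to 1.0)."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    text_terms = set(tokenize(text))
    return len(query_terms & text_terms) / len(query_terms)


def keyword_search(query, texts):
    """Return (score, text) pairs with a non-zero score, best match first."""
    scored = [(keyword_score(query, text), text) for text in texts]
    hits = [pair for pair in scored if pair[0] > 0]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


texts = ["Reset your API key in settings", "Billing and invoices", "API rate limits"]
results = keyword_search("api key", texts)
print(results)
```

Note that a text sharing no terms with the query is never returned, however related its meaning, which is the gap semantic search is meant to close.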
Semantic Search
Semantic Search enhances retrieval by focusing on the meaning of the query rather than exact terms. Using embeddings to represent document content, semantic search can identify and retrieve results that are conceptually similar to the query, even if exact keywords are missing. This allows for more flexible, natural search experiences.
Embedding
An Embedding is a numerical (vector) representation of a document's content that captures its meaning and context. Ducky uses embeddings to power semantic search, enabling it to understand and retrieve documents based on their conceptual relevance to a query. Embeddings are fundamental for achieving high-quality results in retrieval-augmented generation (RAG) applications.
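The sketch below shows how embeddings can be compared to rank content by meaning. It assumes cosine similarity as the comparison measure, which is a common choice but is not confirmed here as what Ducky uses. The three-dimensional vectors are hand-written toy values standing in for real model output, and every name in the block is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Toy embeddings: real ones come from a model and have many more dimensions.
chunk_embeddings = {
    "How to reset a password": [0.9, 0.1, 0.0],
    "Recovering account access": [0.8, 0.3, 0.1],
    "Quarterly pricing changes": [0.0, 0.2, 0.9],
}
# Pretend embedding of the query "I forgot my login".
query_embedding = [0.85, 0.2, 0.05]

ranked = sorted(
    chunk_embeddings,
    key=lambda text: cosine_similarity(query_embedding, chunk_embeddings[text]),
    reverse=True,
)
print(ranked)
```

The query shares no keywords with the password and account chunks, yet their vectors point in a similar direction, so they rank well above the unrelated pricing chunk.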
Alpha Value
The Alpha Value is a parameter that allows you to balance between keyword and semantic search in retrieval. By adjusting the alpha value, you can customize the weight given to each type of search, providing flexibility in how results are surfaced based on your application's needs.
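One common way to implement this kind of balance is a weighted sum of the two scores. The sketch below makes two assumptions that are not confirmed for Ducky: that alpha lies between 0 and 1, and that higher alpha means more weight on semantic search (the direction could equally be reversed). The `hybrid_score` and `rank` functions and the example scores are hypothetical.

```python
def hybrid_score(keyword_score, semantic_score, alpha):
    """Blend two scores in [0, 1].

    ASSUMPTION: alpha=1.0 is purely semantic and alpha=0.0 is purely keyword.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * semantic_score + (1.0 - alpha) * keyword_score


# Made-up scores for two candidate results.
candidates = {
    "exact term match": {"keyword": 0.9, "semantic": 0.4},
    "paraphrase": {"keyword": 0.1, "semantic": 0.8},
}


def rank(alpha):
    """Order candidate names by blended score, best first."""
    return sorted(
        candidates,
        key=lambda name: hybrid_score(
            candidates[name]["keyword"], candidates[name]["semantic"], alpha
        ),
        reverse=True,
    )


print(rank(0.2))  # keyword-leaning
print(rank(0.8))  # semantic-leaning
```

With a low alpha the exact term match ranks first; with a high alpha the paraphrase overtakes it. Trying a few values against representative queries is a practical way to choose a setting.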
Retrieval
Retrieval is the process of finding and returning relevant documents from an index based on a given query. Ducky’s retrieval system combines keyword and semantic search, giving you the flexibility to tailor results to specific requirements. Retrieval is the foundation of Ducky’s functionality, enabling effective document search and discovery.
Reranking
Reranking is a final step in the retrieval process that refines the order of results. After an initial set of documents is retrieved, reranking applies additional logic to prioritize the most relevant content at the top. This step helps ensure that the best results appear first, providing a polished and optimized retrieval experience.
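The two-stage pattern described above can be sketched as a cheap first pass followed by a second scorer that reorders the shortlist. The logic Ducky applies when reranking is not specified here, so both scorers below are simple stand-ins (rerankers in practice are often model-based), and the `retrieve` and `rerank` names are hypothetical.

```python
def retrieve(query_terms, chunks, top_k):
    """First stage: shortlist chunks by how many query terms they share."""
    def overlap(chunk):
        return len(set(query_terms) & set(chunk.lower().split()))
    return sorted(chunks, key=overlap, reverse=True)[:top_k]


def rerank(query_terms, candidates):
    """Second stage: reorder the shortlist by query-term density.

    Density (matching words / total words) is a stand-in for a more
    expensive relevance scorer applied only to the shortlist.
    """
    def density(chunk):
        words = chunk.lower().split()
        hits = sum(1 for word in words if word in query_terms)
        return hits / len(words) if words else 0.0
    return sorted(candidates, key=density, reverse=True)


query_terms = ["reset", "password"]
chunks = [
    "to reset your password open settings then choose security and follow the prompts",
    "password reset link",
    "pricing plans overview",
    "password rules",
]
shortlist = retrieve(query_terms, chunks, top_k=3)
reranked = rerank(query_terms, shortlist)
print(reranked)
```

Running the second scorer only on the shortlist keeps the expensive step small, which is the usual reason to separate retrieval from reranking.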