Balancing keyword and semantic search
Modify the alpha value to control the balance
The case for semantic search
While keyword search excels at finding exact matches, it often falls short when users don’t know the precise terms to use or when the query’s meaning is nuanced. Semantic search bridges this gap by interpreting the intent and context behind a query. Consider the following two documents:
- Farmers are adopting sustainable practices such as using natural pest control to grow apples with minimal environmental impact.
- The technology company Apple has pledged to eliminate plastic packaging from its product lines by 2025.
In a keyword-based system, a query containing the word "apple" would match both documents indiscriminately, because the system only matches the term without considering context. A user searching for "environmentally friendly farming techniques" might not find the first document because none of the query's exact terms appear in it, while a search for "Apple sustainability" could incorrectly surface the first document due to the shared keyword "apple."
Semantic search, on the other hand, understands that "apples" in the first document refers to fruit and sustainable farming, while "Apple" in the second document pertains to the tech industry and sustainability goals. By grasping the context and meaning of the terms, semantic search ensures that each query retrieves results aligned with its intent, significantly improving relevance.
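The keyword-side behavior described above can be illustrated with a minimal sketch. It assumes a very naive matcher (lowercased tokens with a crude plural-stripping rule so that "apples" and "apple" compare equal); this is an illustration only, not how Ducky's keyword search is implemented, and the function names are hypothetical.

```python
import re

DOCUMENTS = [
    "Farmers are adopting sustainable practices such as using natural pest "
    "control to grow apples with minimal environmental impact.",
    "The technology company Apple has pledged to eliminate plastic packaging "
    "from its product lines by 2025.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase, split on non-alphanumerics, and crudely strip a plural 's'."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens}


def keyword_matches(query: str) -> list[int]:
    """Return indexes of documents that share at least one token with the query."""
    query_tokens = tokenize(query)
    return [i for i, doc in enumerate(DOCUMENTS) if query_tokens & tokenize(doc)]


# The bare term matches both documents, regardless of which "apple" is meant.
print(keyword_matches("apple"))  # [0, 1]

# A paraphrased query shares no tokens with either document, so nothing matches.
print(keyword_matches("environmentally friendly farming techniques"))  # []
```

A semantic system would instead compare the meaning of the query and the documents, which is what lets it separate the fruit from the company.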
The case for keyword search
Keyword search might seem limited compared to semantic search, but it remains a critical tool in many scenarios due to its precision, simplicity, and efficiency. It is particularly effective when exact matches are paramount, or when working with highly structured data. Consider the following use cases:
- In legal or technical documents, precision is critical. Searching for the exact phrase "force majeure clause" or "ISO 27001 compliance" ensures that only documents containing these terms are retrieved, avoiding ambiguity.
- In datasets like catalogs, logs, or metadata-heavy databases, specific keywords (e.g., "SKU12345" or "Error 404") often serve as unique identifiers. A keyword search quickly and reliably finds these matches without interpreting context.
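As a rough illustration of why exact matching suits identifiers, the sketch below looks up whole terms in a handful of made-up records. The records and the `exact_matches` helper are hypothetical and only show the idea; they are not part of Ducky.

```python
import re

# Hypothetical sample records, for illustration only.
RECORDS = [
    "SKU12345 blue ceramic mug",
    "SKU123456 red ceramic mug",
    "GET /missing returned Error 404",
    "GET /home returned 200 OK",
]


def exact_matches(term: str, records: list[str]) -> list[int]:
    """Return indexes of records containing `term` as a whole term.

    The lookarounds reject matches that continue into another word
    character, so "SKU12345" does not match inside "SKU123456".
    """
    pattern = re.compile(rf"(?<!\w){re.escape(term)}(?!\w)")
    return [i for i, record in enumerate(records) if pattern.search(record)]


print(exact_matches("SKU12345", RECORDS))   # [0]
print(exact_matches("Error 404", RECORDS))  # [2]
```

No interpretation of context is involved: a record either contains the identifier or it does not.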
Hybrid search as a complementary tool
Hybrid search in Ducky combines the precision of keyword-based search with the flexibility of semantic search to deliver highly relevant results. This dual approach ensures that users can retrieve documents based on exact matches, inferred meaning, or a mix of both.
Hybrid search leverages both retrieval mechanisms:
- Keyword search: Focuses on exact matches between the search query and the text in the documents. This is ideal for scenarios where precision is paramount, such as legal or technical searches.
- Semantic search: Interprets the meaning and intent behind the query and documents, even if they don’t share exact terms. This approach works well for less structured queries or cases where context is crucial.
The role of alpha
Ducky uses the alpha parameter to determine the balance between keyword and semantic search in a query. This flexible mechanism allows developers to tune search behavior dynamically.
- alpha = 0: Performs pure keyword search. Results are based strictly on term matches in the documents.
- alpha = 1: Performs pure semantic search. Results focus entirely on the meaning and context of the query and documents.
- 0 < alpha < 1: Blends the two approaches. Results combine keyword precision and semantic flexibility.
The alpha value gives you control over how Ducky interprets and processes search queries, enabling customization for different use cases.
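One way to picture the effect of alpha is as a weighted average of two scores. The sketch below assumes linear interpolation, which is a common convention in hybrid search systems; this page does not state how Ducky combines scores internally, so treat the formula and the `blend_scores` name as assumptions.

```python
def blend_scores(keyword_score: float, semantic_score: float, alpha: float) -> float:
    """Blend two relevance scores, assuming linear interpolation.

    alpha = 0 returns the keyword score unchanged, alpha = 1 returns the
    semantic score unchanged, and values in between mix the two.
    Both scores are assumed to be normalized to the same range.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return (1.0 - alpha) * keyword_score + alpha * semantic_score


# A document with a strong keyword match but a weak semantic match.
for alpha in (0.0, 0.5, 1.0):
    print(alpha, round(blend_scores(0.9, 0.2, alpha), 3))
```

Under this assumption, raising alpha steadily lowers the ranking of documents that match on terms alone and raises those that match on meaning.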
Practical examples
Imagine you have the following three documents indexed in Ducky:
[
  {
    "content": "Farmers are implementing sustainable methods to grow apples, including crop rotation, natural pest control, and water conservation techniques. These practices not only improve soil health but also ensure the long-term viability of apple orchards.",
    "metadata": {"domain": "agriculture/sustainability"}
  },
  {
    "content": "Apple Inc. has committed to becoming carbon neutral across its entire business, manufacturing supply chain, and product life cycle by 2030. The company is focusing on renewable energy, recyclable materials, and reducing waste in packaging as part of its sustainability plan.",
    "metadata": {"domain": "corporate sustainability"}
  },
  {
    "content": "The Apple iPhone 14 Pro Max is an impressive device with a stunning display, exceptional camera performance, and industry-leading battery life. However, its high price and lack of significant innovation compared to previous models have been points of criticism.",
    "metadata": {"domain": "consumer electronics"}
  }
]
| alpha value | query | best matched content |
|---|---|---|
| 0 | iphone 14 pro | The Apple iPhone 14 Pro Max is an impressive device with a stunning display, exceptional camera performance, and industry-leading battery life. However, its high price and lack of significant innovation compared to previous models have been points of criticism. |
| 1 | how to grow apples more sustainably? | Farmers are implementing sustainable methods to grow apples, including crop rotation, natural pest control, and water conservation techniques. These practices not only improve soil health but also ensure the long-term viability of apple orchards. |
| 1 | what is apple doing to help the environment | Apple Inc. has committed to becoming carbon neutral across its entire business, manufacturing supply chain, and product life cycle by 2030. The company is focusing on renewable energy, recyclable materials, and reducing waste in packaging as part of its sustainability plan. |
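The alpha = 0 row can be approximated offline with a toy term-overlap scorer. This is only a sketch: it uses the first sentence of each document, counts shared tokens as a stand-in for keyword relevance, and does not call Ducky or reproduce its actual ranking function.

```python
import re

# First sentence of each of the three documents above.
DOCS = [
    "Farmers are implementing sustainable methods to grow apples, including "
    "crop rotation, natural pest control, and water conservation techniques.",
    "Apple Inc. has committed to becoming carbon neutral across its entire "
    "business, manufacturing supply chain, and product life cycle by 2030.",
    "The Apple iPhone 14 Pro Max is an impressive device with a stunning "
    "display, exceptional camera performance, and industry-leading battery life.",
]


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_scores(query: str) -> list[int]:
    """Count how many query tokens appear in each document."""
    query_tokens = tokens(query)
    return [len(query_tokens & tokens(doc)) for doc in DOCS]


scores = overlap_scores("iphone 14 pro")
best = scores.index(max(scores))
print(scores)  # [0, 0, 3]
print(best)    # 2, the iPhone review
```

Only the third document contains the literal terms, which is why a pure keyword search favors it for this query.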
Best practices for hybrid search
- Experiment with alpha: Try a few values across the range (e.g., 0.2, 0.5, 0.8), then refine in smaller steps to find the right balance for your data and users’ needs.
- Analyze search quality: Use test queries and evaluate the relevance of results. Adjust alpha based on user feedback or specific use cases.
- Leverage context: If your data is highly structured, prioritize keyword search. If queries are conversational or abstract, lean toward semantic search.
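The first two practices can be combined into a small evaluation loop: run a fixed set of test queries at several alpha values and compare how often the expected document comes back first. The harness below is a sketch; `fake_search`, the document IDs, and the test cases are hypothetical stand-ins, and in practice you would replace `fake_search` with a function that calls your own search integration.

```python
from typing import Callable, Sequence

# A search function takes (query, alpha) and returns ranked document IDs.
SearchFn = Callable[[str, float], Sequence[str]]


def top1_accuracy(search_fn: SearchFn, cases: Sequence[tuple[str, str]], alpha: float) -> float:
    """Fraction of test queries whose expected document is ranked first."""
    hits = 0
    for query, expected_id in cases:
        results = search_fn(query, alpha)
        if results and results[0] == expected_id:
            hits += 1
    return hits / len(cases)


def sweep_alpha(search_fn: SearchFn, cases: Sequence[tuple[str, str]],
                alphas: Sequence[float] = (0.0, 0.2, 0.5, 0.8, 1.0)) -> dict[float, float]:
    """Evaluate top-1 accuracy at each candidate alpha value."""
    return {alpha: top1_accuracy(search_fn, cases, alpha) for alpha in alphas}


def fake_search(query: str, alpha: float) -> list[str]:
    """Hypothetical stand-in for a real search call, with hard-coded behavior."""
    if query == "SKU12345":
        # Identifier lookups succeed only when keyword matching has enough weight.
        return ["catalog-1"] if alpha <= 0.5 else ["catalog-2"]
    # Conversational queries succeed only when semantic matching has enough weight.
    return ["faq-7"] if alpha >= 0.5 else ["faq-3"]


TEST_CASES = [("SKU12345", "catalog-1"), ("how do I reset my password", "faq-7")]
results = sweep_alpha(fake_search, TEST_CASES)
best_alpha = max(results, key=results.get)
print(results)
print(best_alpha)
```

With a realistic set of test queries, the same loop gives a repeatable way to compare alpha values before and after changes to your data.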
Get in touch or see our roadmap if you need help