Skip to content
Low Level Design Mastery Logo
LowLevelDesign Mastery

Design a Search Index

Design a high-speed text search engine with inverted indexing, Tries, and custom ranking.

Design a search index system that supports efficient full-text search operations, including autocomplete functionality using a Trie data structure and relevance-based ranking of search results using various ranking algorithms. The system should handle large-scale document indexing, support real-time autocomplete suggestions, and provide flexible ranking strategies that can be swapped at runtime. The design must be scalable, maintainable, and easily extensible to support new ranking algorithms and search features.

In this problem, you’ll design a system that can ingest thousands of documents and provide near-instant results for complex queries, ranking them by how well they match user intent.


Design a system that allows users to index text documents and search through them using keywords, with support for real-time autocomplete and relevance-based result sorting.

Functional Requirements:

  • Indexing: Add documents to the system and tokenize their content.
  • Full-Text Search: Find all documents containing specific words.
  • Autocomplete: Provide suggestions as users type (prefix matching).
  • Ranking: Score results based on relevance (e.g., TF-IDF).
  • Normalization: Handle case-insensitivity and ignore stop words.
  • Concurrency: Support simultaneous indexing and querying.

Non-Functional Requirements:

  • Low Latency: Search and autocomplete should respond in milliseconds.
  • Memory Efficiency: Use optimized structures for millions of terms.
  • Scalability: The design should support incremental indexing.
  • Extensibility: Easy to add new ranking strategies or languages.

The engine coordinates between the InvertedIndex (Search), the Trie (Autocomplete), and the RankingEngine.

Diagram
classDiagram
    class SearchIndex {
        -InvertedIndex invertedIndex
        -Trie autocompleteTrie
        -RankingStrategy ranker
        +addDocument(doc)
        +search(query)
    }
    
    class InvertedIndex {
        -Map~Term, List~DocID~~ table
        +getMatches(term)
    }
    
    class Trie {
        -TrieNode root
        +insert(term)
        +getSuggestions(prefix)
    }

    class RankingStrategy {
        <<interface>>
        +score(doc, query) double
    }

    SearchIndex --> InvertedIndex
    SearchIndex --> Trie
    SearchIndex --> RankingStrategy

Diagram

Storing 1 million documents where each document has 1000 words creates a massive map.

Solution: Use a Posting List with document compression. Instead of storing the whole document, store just the DocID and the Frequency of the term in that doc. This allows the ranking engine to calculate relevance without re-reading the document text.

Users expect suggestions after just 2-3 characters.

Solution: Use a Trie (Prefix Tree). While the Inverted Index is great for whole words, the Trie is optimized for character-by-character navigation. Each node in the Trie can also store a “cache” of the top 5 most popular words under that branch to avoid deep traversals during autocomplete.

A document that mentions “Apple” once isn’t as relevant as one that mentions it 50 times.

Solution: Use the Strategy Pattern for the RankingEngine. Implement strategies like TF-IDF (Term Frequency-Inverse Document Frequency) which rewards terms that are common in one document but rare across the whole library, providing high-quality, relevant results.


By solving this problem, you’ll master:

  • Advanced Data Structures - Combining Tries and Inverted Indexes.
  • Relevance Logic - Building ranking algorithms for text data.
  • Performance Engineering - Optimizing millisecond-level lookups.
  • Text Processing - Implementing tokenization and normalization pipelines.

Ready to see the full implementation? Open the interactive playground to access:

  • 🎯 Step-by-step guidance through the 8-step LLD approach
  • 📊 Interactive UML builder to visualize your design
  • 💻 Complete Code Solutions in Python, Java, C++, TypeScript, JavaScript, C#
  • 🤖 AI-powered review of your design and code

After mastering the Search Index, try these similar problems: