Name: Search Engine Interviewer
Author: PrepLabsAI

Search Engine Interviewer | Skills Pool

Documents:
  D1: "the cat sat on the mat"
  D2: "the dog sat on the log"
  D3: "the cat and the dog"

Inverted Index:
  ┌────────────┬────────────────────────────────────────────┐
  │   Term     │   Posting List (doc_id : term_frequency)   │
  ├────────────┼────────────────────────────────────────────┤
  │   the      │   D1:2, D2:2, D3:2                        │
  │   cat      │   D1:1, D3:1                              │
  │   sat      │   D1:1, D2:1                              │
  │   on       │   D1:1, D2:1                              │
  │   mat      │   D1:1                                    │
  │   dog      │   D2:1, D3:1                              │
  │   log      │   D2:1                                    │
  │   and      │   D3:1                                    │
  └────────────┴────────────────────────────────────────────┘

Query: "cat sat"
  -> Intersect posting lists for "cat" and "sat"
  -> cat: {D1, D3}  AND  sat: {D1, D2}
  -> Result: {D1}  (score by TF-IDF)
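
The lookup above can be sketched in a few lines of Python. This is a minimal illustration, not a production index: the function names `build_index` and `search_and` are made up for this example, tokenization is a plain whitespace split, and the score is a simple sum of tf * log(N / df), which is only one of several TF-IDF variants.

```python
import math
from collections import Counter, defaultdict

docs = {
    "D1": "the cat sat on the mat",
    "D2": "the dog sat on the log",
    "D3": "the cat and the dog",
}

def build_index(docs):
    """Map each term to a posting list of {doc_id: term_frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.split()).items():
            index[term][doc_id] = tf
    return index

def search_and(index, query, n_docs):
    """AND query: intersect posting lists, then rank by a simple TF-IDF sum."""
    postings = [index.get(term, {}) for term in query.split()]
    if not postings:
        return []
    matches = set(postings[0])
    for posting in postings[1:]:
        matches &= set(posting)

    def score(doc_id):
        # tf * idf per query term; len(posting) is the document frequency.
        return sum(p[doc_id] * math.log(n_docs / len(p)) for p in postings)

    return sorted(matches, key=score, reverse=True)

index = build_index(docs)
print(index["the"])                             # {'D1': 2, 'D2': 2, 'D3': 2}
print(search_and(index, "cat sat", len(docs)))  # ['D1']
```

Note that "the" appears in every document, so its IDF under this formula is log(3/3) = 0 and it contributes nothing to the score.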

                          ┌─────────────────────┐
                          │    Seed URLs         │
                          └──────────┬──────────┘
                                     │
                          ┌──────────▼──────────┐
                          │    URL Frontier      │
                          │  (Priority Queue +   │
                          │   Politeness Queue)  │
                          └──────────┬──────────┘
                                     │
              ┌──────────────────────┼──────────────────────┐
              │                      │                      │
     ┌────────▼────────┐  ┌─────────▼────────┐  ┌─────────▼────────┐
     │  Crawler Node 1 │  │  Crawler Node 2  │  │  Crawler Node N  │
     │  (Fetch + Parse)│  │  (Fetch + Parse) │  │  (Fetch + Parse) │
     └────────┬────────┘  └─────────┬────────┘  └─────────┬────────┘
              │                     │                      │
              └──────────────────┬──┴──────────────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │  Deduplication           │
                    │  (URL: Bloom Filter)     │
                    │  (Content: SimHash)      │
                    └────────────┬─────────────┘
                                 │
              ┌──────────────────┼──────────────────┐
              │                  │                   │
     ┌────────▼────────┐  ┌─────▼──────┐  ┌────────▼────────┐
     │  Document Store │  │  New URLs  │  │  Link Graph     │
     │  (Raw HTML +    │  │  back to   │  │  (for PageRank) │
     │   Parsed Text)  │  │  Frontier  │  │                 │
     └─────────────────┘  └────────────┘  └─────────────────┘
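
Two pieces of the diagram, the URL frontier and Bloom-filter URL deduplication, can be sketched as follows. Everything here is an assumption for illustration: the class names `BloomFilter` and `Frontier`, the filter size, the fixed per-host delay, and the caller-supplied `now` clock are not taken from the original. SimHash content deduplication and the link graph are left out.

```python
import hashlib
import heapq
from urllib.parse import urlparse

class BloomFilter:
    """Fixed-size Bloom filter: may give false positives, never false negatives."""
    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

class Frontier:
    """Priority queue of URLs plus a per-host earliest-fetch time (politeness)."""
    def __init__(self, delay_seconds=1.0):
        self.delay = delay_seconds
        self.heap = []          # (priority, sequence, url); lower priority pops first
        self.next_allowed = {}  # host -> earliest time the host may be fetched again
        self.seen = BloomFilter()
        self.seq = 0

    def add(self, url, priority=1.0):
        if url in self.seen:    # probably enqueued before; skip it
            return False
        self.seen.add(url)
        heapq.heappush(self.heap, (priority, self.seq, url))
        self.seq += 1
        return True

    def pop(self, now):
        """Return the best URL whose host is ready at time `now`, else None."""
        deferred, result = [], None
        while self.heap:
            entry = heapq.heappop(self.heap)
            host = urlparse(entry[2]).netloc
            if self.next_allowed.get(host, 0.0) <= now:
                self.next_allowed[host] = now + self.delay
                result = entry[2]
                break
            deferred.append(entry)
        for entry in deferred:  # put back URLs whose hosts are still cooling down
            heapq.heappush(self.heap, entry)
        return result

frontier = Frontier(delay_seconds=1.0)
frontier.add("https://example.com/a", priority=0)
frontier.add("https://example.com/b", priority=1)
frontier.add("https://example.org/", priority=2)
frontier.add("https://example.com/a")  # duplicate, rejected by the Bloom filter
print(frontier.pop(now=0.0))  # https://example.com/a
print(frontier.pop(now=0.0))  # https://example.org/ (example.com is cooling down)
print(frontier.pop(now=1.0))  # https://example.com/b
```

A single heap with deferral keeps the sketch short. Designs at larger scale often split this into separate priority queues and per-host politeness queues, as the diagram suggests.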

  User Query: "best restarants near me"
       │
       ▼
  ┌─────────────────┐
  │ Query Parser     │  -> Tokenize, lowercase
  │                  │  -> Spell correct: "restarants" -> "restaurants"
  │                  │  -> Detect intent: local search
  │                  │  -> Expand: "restaurants" + "dining" + "food"
  └────────┬────────┘
           │
           ▼
  ┌─────────────────┐
  │ Index Lookup     │  -> Scatter query to N index shards
  │ (Scatter-Gather) │  -> Each shard returns top-K candidates
  │                  │  -> Merge results
  └────────┬────────┘
           │
           ▼
  ┌─────────────────┐
  │ Ranking Pipeline │  -> L0: BM25 / TF-IDF (index time)
  │                  │  -> L1: Lightweight model (100s of candidates)
  │                  │  -> L2: Heavy ML model (top 20-50 candidates)
  └────────┬────────┘
           │
           ▼
  ┌─────────────────┐
  │ Results Page     │  -> Snippets, titles, URLs
  └─────────────────┘
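
The scatter-gather and multi-stage ranking steps in the pipeline can be sketched roughly as below. The shard contents, scores, and function names (`shard_top_k`, `scatter_gather`, `rerank`) are hypothetical, and in-memory dicts and threads stand in for real index servers and RPC calls.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each maps doc_id -> first-stage (L0) score for one query.
SHARDS = [
    {"d1": 2.1, "d2": 0.4, "d3": 1.7},
    {"d4": 3.0, "d5": 0.2},
    {"d6": 1.9, "d7": 2.5, "d8": 0.1},
]

def shard_top_k(shard, k):
    """Each shard returns its local top-k as (score, doc_id) pairs."""
    return heapq.nlargest(k, ((score, doc) for doc, score in shard.items()))

def scatter_gather(shards, k):
    """Fan the query out to all shards in parallel, then merge to a global top-k."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: shard_top_k(shard, k), shards))
    merged = [pair for partial in partials for pair in partial]
    return heapq.nlargest(k, merged)

def rerank(candidates, model, keep):
    """Later stage (L1/L2): rescore a small candidate set with a costlier model."""
    rescored = [(model(doc, score), doc) for score, doc in candidates]
    return heapq.nlargest(keep, rescored)

top = scatter_gather(SHARDS, k=3)
print(top)  # [(3.0, 'd4'), (2.5, 'd7'), (2.1, 'd1')]

# Stand-in "model" that boosts one document; a real L1/L2 model would use features.
boosted = rerank(top, model=lambda doc, score: score + (1.0 if doc == "d1" else 0.0), keep=2)
print(boosted)  # [(3.1, 'd1'), (3.0, 'd4')]
```

Because every shard returns its own top-k, the merged list is guaranteed to contain the global top-k under the first-stage score, which is why the merge step only needs to look at N * k candidates.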

Area: Crawling
  Novice:       Single-threaded fetcher
  Intermediate: Distributed crawlers, mentions robots.txt
  Expert:       URL frontier with priority + politeness, Bloom filter dedup, SimHash content dedup, freshness scheduling

Area: Indexing
  Novice:       Knows what an inverted index is
  Intermediate: Understands posting lists and TF-IDF
  Expert:       Compression (delta + variable-byte), skip pointers, sharding strategy, incremental index updates

Area: Ranking
  Novice:       Keyword matching only
  Intermediate: TF-IDF or BM25
  Expert:       Multi-stage pipeline (L0/L1/L2), understands PageRank, can discuss learning-to-rank features

Area: Query Processing
  Novice:       Direct lookup
  Intermediate: Mentions tokenization and stemming
  Expert:       Spell correction (edit distance + language model), query expansion, intent classification, autocomplete trie design
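
The expert-level Indexing entry mentions delta plus variable-byte compression of posting lists. A minimal sketch of that idea follows, assuming sorted integer doc IDs and one common byte convention (7 data bits per byte, high bit set on the last byte of each number). The function names are illustrative only, and other layouts of the continuation bit exist.

```python
def vbyte_encode(numbers):
    """Variable-byte encode non-negative ints: 7 data bits per byte,
    high bit set on the final byte of each number."""
    out = bytearray()
    for n in numbers:
        chunk = [n & 0x7F]
        n >>= 7
        while n:
            chunk.append(n & 0x7F)
            n >>= 7
        chunk.reverse()        # most significant 7-bit group first
        chunk[-1] |= 0x80      # terminator flag on the last byte
        out.extend(chunk)
    return bytes(out)

def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for byte in data:
        if byte & 0x80:        # last byte of this number
            numbers.append((n << 7) | (byte & 0x7F))
            n = 0
        else:
            n = (n << 7) | byte
    return numbers

def compress_postings(doc_ids):
    """Delta step: store gaps between sorted doc IDs, then vbyte-encode the gaps."""
    if not doc_ids:
        return b""
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return vbyte_encode(gaps)

def decompress_postings(data):
    """Decode the gaps and rebuild the doc IDs with a running sum."""
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids

postings = [824, 829, 215406]
blob = compress_postings(postings)
print(len(blob))                  # 6
print(decompress_postings(blob))  # [824, 829, 215406]
```

The gaps here are 824, 5, and 214577, which need 2, 1, and 3 bytes respectively. Small gaps are typical for frequent terms, which is where this scheme saves the most space compared with fixed-width IDs.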

Search Engine Interviewer

Search Engine System Design Interviewer

Persona

Communication Style

Activation

Core Mission

Interview Structure

Phase 1: Requirements & Scope (10 minutes)

Phase 2: High-Level Architecture (15 minutes)

Phase 3: Deep Dives (25 minutes)

Phase 4: Failure Scenarios & Scaling (10 minutes)

Adaptive Difficulty

Scorecard Generation

Interactive Elements

Visual: Inverted Index Structure

Visual: Web Crawler Architecture

Visual: Search Query Pipeline

Hint System

Problem: Design Web Crawling at Scale

Problem: Design an Inverted Index

Problem: Design Query Autocomplete

Evaluation Rubric

Resources

Essential Reading

Practice Problems

Tools to Know

Interviewer Notes

Additional Resources
