Most interview prep platforms are glorified Google Docs.
Someone writes questions. Someone formats them. They get pasted into a database. Users read them.
That's it. That's the entire "AI" story at most platforms — a ChatGPT wrapper that answers whatever you type, with zero context about what you've already practiced, zero awareness of duplicate content, and zero intelligence about what question should exist next.
JSPrep Pro went a different direction.
We built an actual AI pipeline — one that generates questions using RAG (Retrieval-Augmented Generation), checks for semantic duplicates using embedding-based cosine similarity, runs on an automated weekly cron, and feeds into a manual QA + approval workflow before anything touches the question bank.
This article is the full technical breakdown. If you're a developer who wants to understand how these systems work — and wants to try a JavaScript interview prep platform that's actually intelligent — keep reading.
🤔 The Problem With Static Question Banks
Here's what every other JS prep platform does:
1. Someone writes 100 questions manually
2. They get seeded into a database
3. They sit there forever
4. The platform has the same 100 questions for the next 3 years
The problems:
- Questions go stale. JavaScript evolves.
`structuredClone`, `Array.at()`, `Promise.withResolvers()`: if your question bank was written in 2021, it doesn't reflect what interviewers are asking in 2026.
- Duplicate questions are everywhere. When you manually write hundreds of questions across categories, you inevitably repeat yourself. Two different phrasings of "what is a closure?" end up polluting the same bank.
- No semantic intelligence. The AI doesn't know what questions already exist when generating new ones, so it keeps regenerating concepts it has already covered.
We wanted to fix all three. Here's how.
🧠 The Architecture: Four Layers
Firestore → Source of truth (questions + embeddings)
Embeddings → Intelligence layer (what does each question mean?)
Similarity Search → Retrieval engine (what already exists nearby?)
AI (Groq/LLaMA) → Reasoning layer (generate, evaluate, explain)
Each layer has a specific job. None of them do too much. This separation is why the system is actually maintainable and extensible.
📐 Layer 1: Embeddings — Giving Every Question a Mathematical Identity
An embedding converts text into a vector — a list of numbers that represents the semantic meaning of that text in multi-dimensional space.
Two questions that mean the same thing, even if phrased differently, will have similar vectors. Two questions about completely different concepts will be far apart in that space.
// What an embedding looks like (384 numbers for Cohere embed-english-light-v3.0)
[0.023, -0.147, 0.891, 0.034, -0.562, ...] // 384 dimensions
The tricky part: you can't just embed the question title. That loses too much signal. We embed type-aware inputs — different fields combined based on what type of question it is:
function buildEmbeddingInput(question) {
  const { type, title } = question

  if (type === 'output') {
    // What matters: the title + the code + the expected output
    return `${title} ${question.code} Output: ${question.expectedOutput}`
  }

  if (type === 'debug') {
    // What matters: the title + what the bug is + the broken code
    return `${title} Bug: ${question.bugDescription} ${question.brokenCode}`
  }

  // Theory: the title + the full answer + the explanation
  return `${title} ${stripHTML(question.answer)} ${question.explanation}`
}
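The theory branch strips HTML from the stored answer before embedding, since answers are saved as rich text. The article doesn't show that helper, but a minimal regex-based version (illustrative only, not the actual implementation) could look like this:

```javascript
// Hypothetical stripHTML helper: removes tags and collapses whitespace.
// A production version might use a real HTML parser instead of a regex.
function stripHTML(html) {
  return html
    .replace(/<[^>]*>/g, ' ') // drop every tag
    .replace(/\s+/g, ' ')     // collapse runs of whitespace
    .trim()
}
```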
Why does this matter? Consider two output questions:
- "What does this print?" + code that tests `var` hoisting
- "What does this print?" + code that tests `Promise` resolution order
Both have the same title. But their embeddings are far apart because the code and output are semantically very different. If you only embedded the title, the similarity search would think they're duplicates. Embedding the full context makes it accurate.
Model choice: We use embed-english-light-v3.0 from Cohere — 384 dimensions, completely free on the trial tier, works on Vercel serverless without any model download or cold start issues.
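For reference, the embed call boils down to a small request body. The sketch below assumes the shape of Cohere's v1 embed endpoint (`model`, `texts`, `input_type`); verify against the current Cohere docs before relying on it:

```javascript
// Hypothetical request-body builder for Cohere's /v1/embed endpoint.
// input_type: 'search_document' for stored questions, 'search_query' for lookups.
function buildEmbedRequest(texts, inputType = 'search_document') {
  return {
    model: 'embed-english-light-v3.0',
    texts,
    input_type: inputType,
  }
}

// At runtime you would POST this as JSON with an API key and read
// response.embeddings: one 384-number array per input text.
```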
🔍 Layer 2: Cosine Similarity — Finding What Already Exists
Once every question has an embedding, we can find "nearby" questions mathematically. The metric is cosine similarity: it measures the angle between two vectors. The raw score ranges from -1 to 1, but for text embeddings it lands almost entirely between 0 and 1, where 1 means the vectors point in the same direction.
function cosineSimilarity(a: number[], b: number[]): number {
let dot = 0, normA = 0, normB = 0
for (let i = 0; i < a.length; i++) {
dot += a[i] * b[i]
normA += a[i] * a[i]
normB += b[i] * b[i]
}
return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
- Score > 0.85 → near-duplicate → reject
- Score 0.5–0.85 → related question → show as context
- Score < 0.5 → distinct question → safe to add
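Those three bands can be captured in one small decision helper (the function name is mine, not from the codebase):

```javascript
// Maps a cosine-similarity score to the pipeline's decision bands.
function classifySimilarity(score) {
  if (score > 0.85) return 'duplicate' // near-duplicate → reject
  if (score >= 0.5) return 'related'   // surface as context
  return 'distinct'                    // safe to add
}
```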
We don't use a vector database for this. With ~200 questions, pure in-memory cosine similarity runs in under 10ms. The entire similarity search is:
function findSimilarQuestions(targetEmbedding, questions, topK = 5) {
return questions
.filter(q => q.embedding?.length > 0)
.map(q => ({
...q,
score: cosineSimilarity(targetEmbedding, q.embedding)
}))
.sort((a, b) => b.score - a.score)
.slice(0, topK)
}
No LangChain. No Pinecone. No infrastructure overhead. Pure math that works perfectly at this scale.
🤖 Layer 3: RAG — Making the AI Context-Aware
This is where it gets interesting.
RAG (Retrieval-Augmented Generation) means: before asking the AI to generate something, retrieve relevant existing content and include it in the prompt. The AI now knows what already exists and won't repeat it.
Here's the pipeline for generating a new question:
// Step 1: Get a seed embedding for the topic
const seedEmbedding = await getEmbedding(`${topic} ${category} JavaScript interview`)

// Step 2: Find the most similar existing questions
const similar = findSimilarQuestions(seedEmbedding, allQuestions, 5)

// Step 3: Build RAG context string
const ragContext = `Related questions already in the database:
- [output] What happens when you use var inside a for loop? (Closures & Scope)
- [theory] Explain closure with a practical example (Closures & Scope)
- [debug] Fix the stale closure in this React useEffect (Closure Traps)`

// Step 4: Inject into the AI prompt
systemPrompt += `${ragContext}

IMPORTANT: Do NOT generate a question similar to any of the above. Cover a distinct angle, edge case, or sub-concept.`
The AI now generates a question that's aware of what already exists. Without RAG, you'd inevitably get question #47 about closures that's basically question #12 again with different wording. With RAG, the AI actively avoids that.
Multi-model approach:
- Groq + LLaMA 3.3 70B → Question generation, answer evaluation, AI tutoring (free, 500 tokens/sec)
- Cohere embed-english-light-v3.0 → Embeddings for similarity (free trial)
Two models, two jobs. Neither does the other's job.
⚙️ Layer 4: The Generation Pipeline
Here's the full flow when a new question gets generated, from click to Firestore:
Admin clicks Generate
↓
Fetch all existing questions + embeddings from Firestore
↓
Generate seed embedding for the topic (Cohere)
↓
Run similarity search → find top 5 related questions
↓
Build RAG context from similar questions
↓
Call Groq/LLaMA with RAG-enriched prompt
↓
Parse the generated question JSON
↓
Generate embedding for the new question (Cohere)
↓
Dedup check: cosine similarity against all existing questions
↓
If similarity > 0.85 → flag as duplicate
If similarity ≤ 0.85 → mark as safe
↓
Return candidate with similarity score + top similar questions
↓
Admin reviews in UI → Approve or Reject
↓
On Approve: save to Firestore with embedding + auto-generated slug
Every step has a purpose. The dedup check means a human never has to manually check "does this already exist?" The RAG context means the AI rarely generates something repetitive in the first place. The manual approval step means nothing bad gets into the question bank automatically.
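The dedup step in the middle of that flow is small enough to sketch in full. This version reuses the cosine function from Layer 2; the field names mirror the pending-question document but are illustrative, not the actual route code:

```javascript
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Score a candidate's embedding against every existing question
// and flag near-duplicates above the 0.85 threshold.
function dedupCheck(candidateEmbedding, existingQuestions, threshold = 0.85) {
  const scored = existingQuestions
    .map(q => ({ title: q.title, score: cosineSimilarity(candidateEmbedding, q.embedding) }))
    .sort((a, b) => b.score - a.score)
  const similarityScore = scored[0]?.score ?? 0
  return {
    topSimilar: scored.slice(0, 3), // shown to the admin during review
    similarityScore,
    isDuplicate: similarityScore > threshold,
  }
}
```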
⏰ The Weekly Cron Pipeline
The real power is when this runs automatically.
Every Monday at 9am UTC, a Vercel cron job triggers /api/cron:
// vercel.json
{
"crons": [{ "path": "/api/cron", "schedule": "0 9 * * 1" }]
}
The cron iterates through a set of generation targets — categories that need more content:
const GENERATION_TARGETS = [
{ type: 'theory', category: 'Async JS', topics: ['Promise.allSettled', 'AbortController'] },
{ type: 'output', category: 'Event Loop', topics: ['microtask queue order', 'Promise chaining'] },
{ type: 'debug', category: 'Async Bugs', topics: ['missing await', 'Promise rejection'] },
// ... 7 categories total
]
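The iteration itself is straightforward. This sketch injects the AI call and the save step as parameters so the loop stays testable in isolation; the real route presumably wires in the Groq and Firestore calls directly:

```javascript
// Hypothetical shape of the cron loop: one generated candidate per
// (target, topic) pair, each handed to the pending-queue writer.
async function runWeeklyGeneration(targets, { generateQuestion, saveCandidate }) {
  const saved = []
  for (const target of targets) {
    for (const topic of target.topics) {
      // Full RAG pipeline: seed embedding → retrieval → prompt → dedup check
      const candidate = await generateQuestion({
        type: target.type,
        category: target.category,
        topic,
      })
      await saveCandidate(candidate) // writes to questions_pending, status: 'pending'
      saved.push(candidate)
    }
  }
  return saved
}
```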
For each target, it runs the full RAG pipeline and saves the result to a questions_pending collection — not to questions directly. Nothing goes live automatically.
await addDoc(collection(db, 'questions_pending'), {
...generatedQuestion,
embedding,
topSimilar, // the 3 most similar existing questions
similarityScore, // how close to the nearest duplicate (0–1)
status: 'pending', // pending | approved | rejected
generatedAt: Timestamp.now(),
})
🎛️ The Admin QA Interface
Monday morning, the admin visits /admin/generate and clicks "Cron Queue".
They see everything generated overnight:
┌─────────────────────────────────────────────────────────────┐
│ [output] [core] 23% similar │
│ What does Promise.allSettled return when promises reject? │
│ Similar to: "What does Promise.all do?" (23%) │
│ [Preview ▾] [✓ Approve] [✗ Reject] │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ [debug] [medium] ⛔ DUPLICATE 91% similar                   │
│ Fix the missing await in this async function                │
│ Similar to: "Debug: async function not returning" (91%)     │
│ [✗ Reject]                                                  │
└─────────────────────────────────────────────────────────────┘
The similarity score tells you immediately whether something is worth reading carefully or should just be rejected. Duplicates are auto-flagged and can't be approved — the Approve button is hidden if isDuplicate === true.
For non-duplicates, clicking Preview expands:
- The full question (code, answer, explanation)
- The top 3 most similar existing questions (so you can judge fit)
- The exact bug description / expected output / full answer
One click to approve → it saves to Firestore with embedding, auto-generated slug, and publishes immediately. The whole review process takes about 5 minutes per batch.
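The gating logic behind those buttons is simple to express. This is an illustrative reconstruction of the rule described above (duplicates can only be rejected), not the actual component code:

```javascript
// Which actions the review UI offers for a pending candidate.
// For duplicates, the Approve button is never rendered.
function visibleActions(candidate) {
  if (candidate.isDuplicate) return ['reject']
  return ['preview', 'approve', 'reject']
}
```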
🔬 How This Makes the AI Tutor Smarter
The embeddings don't just serve the generation pipeline. They improve the real-time AI features too.
When a user opens a question and asks the AI Tutor a follow-up, the system now runs a similarity search first:
// Find questions related to what the user is asking about
const similar = findSimilarQuestions(questionEmbedding, allQuestions, 5)
// Inject as context into the AI tutor prompt
systemPrompt += `Related questions in the database:
${similar.map(q => `- [${q.type}] ${q.title}`).join('\n')}

Build on or contrast these. Avoid repeating what they already cover.`
The AI tutor now gives answers that connect to related concepts instead of explaining each question in isolation. Ask about closures and the AI knows you've probably also seen the closure-in-loops question and won't re-explain the same thing.
Same for Evaluate Me — when your answer is being scored, the evaluator has context about related concepts and can probe whether you understand the broader picture, not just the specific question.
📊 The Numbers
| Feature | Stack | Cost |
|---|---|---|
| Question generation | Groq LLaMA 3.3 70B | Free |
| Embeddings | Cohere embed-english-light | Free |
| Similarity search | Pure JS cosine similarity | $0 |
| Cron scheduling | Vercel Cron | Free |
| Database | Firestore | Free tier |
| Hosting | Vercel | Free tier |
Total infrastructure cost for the AI pipeline: $0/month.
This is important because it means the system scales. As the question bank grows from 200 to 500 to 2000 questions, the only thing that changes is the in-memory similarity computation — which stays under 50ms for up to 5,000 questions.
🆚 How This Compares to Other Platforms
| | Other platforms | JSPrep Pro |
|---|---|---|
| Question quality control | Manual only | RAG dedup + manual approval |
| AI answers | Generic ChatGPT wrapper | Context-aware with RAG |
| New questions | Written by humans occasionally | Weekly automated pipeline |
| Duplicate detection | None | Cosine similarity (0.85 threshold) |
| AI aware of question bank | No | Yes (via embeddings) |
| Question format | Theory only | Theory + output + debug |
| Interview simulation | None | Timed sprint, 3 question types |
The biggest difference isn't any single feature. It's that the AI actually knows what it's working with. Most platforms have an AI bolted on the side. JSPrep Pro has AI embedded in the core loop — generation, deduplication, retrieval, evaluation.
🚀 What This Means For You As a Developer Preparing for Interviews
All of this infrastructure serves one goal: you get better question quality, faster.
- Questions are semantically diverse — the similarity search prevents redundant coverage of the same concept
- Questions stay current — the weekly pipeline adds new questions automatically as JavaScript evolves
- The AI tutor is context-aware — it knows what you're practicing and what related concepts you might be missing
- The answer evaluator is more precise — it evaluates depth, not just surface-level coverage
And you get to practice with three question types that test completely different skills:
- Theory → Can you explain it clearly? (AI-evaluated)
- Output → Can you execute JavaScript mentally?
- Debug → Can you diagnose broken code?
All three in a single timed sprint that mirrors what a real JavaScript interview actually feels like.
💻 Try It Now — No Account Required
The sprint is completely free, no signup:
👉 [jsprep.pro/sprint](https://jsprep.pro/sprint)
5 questions, fully timed, all three question types. When you finish, you'll see a shareable score card with your accuracy, strengths, and weak areas.
That 10-minute sprint will tell you more about your JavaScript interview readiness than an hour of reading documentation.
🧵 The Stack Summary
If you want to replicate this architecture for your own project:
Groq API (llama-3.3-70b-versatile) → LLM inference, free
Cohere API (embed-english-light-v3.0) → Embeddings, free trial
Pure JS cosine similarity → Vector search, no infrastructure
Firestore → Stores embeddings as number[]
Vercel Cron → Weekly generation trigger
Next.js API routes → Pipeline endpoints
No LangChain. No vector database. No complex orchestration framework. The concepts (RAG, embeddings, similarity search) are powerful. The implementation doesn't need to be complicated.
JSPrep Pro is available at [jsprep.pro](https://jsprep.pro). Free sprint at [jsprep.pro/sprint](https://jsprep.pro/sprint) — no account required.
Tags: JavaScript RAG LLM Embeddings AI Engineering Web Development Interview Preparation Machine Learning Next.js Groq Cohere Vector Search Frontend Development Software Engineering