RAG Systems Fail on 'NOT' Queries—Here's 30+ Fixes

Basic vector similarity breaks on queries like 'restaurants NOT downtown' and multi-hop reasoning. RAG_Techniques aggregates 30+ production-tested implementations—from agentic RAG to iterative retrieval—solving the specific failure modes where semantic search hits the wall. Battle-tested patterns for engineers already past the tutorial phase.

Featured Repository Screenshot

Your production RAG system handles "find Italian restaurants" fine. Then a user searches for "restaurants NOT near downtown" and everything falls apart, returning downtown results anyway. Or they ask a multi-hop question requiring two retrieval steps, and your semantic search gives up halfway through.

These aren't edge cases. RAG_Techniques, a repository with 22.6k stars, exists because vector similarity hits a wall on queries that look simple to humans but require different retrieval patterns. The gap between cosine similarity and production-grade RAG is wider than most tutorials suggest.

Where Semantic Search Breaks

The failure modes are predictable once you've debugged enough production systems. Negated queries ("show me X but not Y") don't map well to vector embeddings optimized for similarity. Multi-hop reasoning that requires retrieving context A to understand how to retrieve context B breaks standard RAG approaches. Temporal constraints, boolean logic, queries requiring reasoning over graph relationships—all scenarios where embedding documents and matching via cosine distance doesn't work.
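To see the first failure concretely, here is a minimal sketch, not from the repo, of plain cosine-similarity lookup over two restaurant blurbs. The embedding model choice (sentence-transformers, all-MiniLM-L6-v2) is an assumption for illustration; the point is that a negated query tends to pull the downtown document up rather than push it down, because the embedding only encodes topical overlap.

```python
# Minimal sketch (not from the repo): plain cosine similarity has no notion of negation.
# Assumes sentence-transformers is installed; the model is illustrative.
from sentence_transformers import SentenceTransformer

docs = [
    "Trattoria Roma, an Italian restaurant in the downtown core.",
    "Luigi's, an Italian restaurant in the northern suburbs.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, normalize_embeddings=True)

query = "Italian restaurants NOT near downtown"
q_vec = model.encode([query], normalize_embeddings=True)[0]

# Cosine similarity ranks by topical overlap; the word "downtown" in the query
# typically boosts the downtown document instead of excluding it.
scores = doc_vecs @ q_vec
for doc, score in sorted(zip(docs, scores), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {doc}")
```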

The problem isn't that semantic search is broken. It solves one specific retrieval pattern well. Production systems need at least five more.

The 30+ Techniques Repository: What's Inside

RAG_Techniques aggregates runnable Jupyter notebook implementations of retrieval methods, organized by failure mode. The categories include agentic RAG (letting LLMs orchestrate multi-step retrieval), graph RAG (for relationship-heavy queries), and iterative retrieval (for queries requiring progressive context refinement).
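As a rough illustration of the iterative-retrieval category, here is a schematic loop under assumed helpers; `retrieve`, `is_sufficient`, and `refine_query` are hypothetical names, not functions from the repo. The shape is the same across implementations: retrieve, let the model judge whether the context is sufficient, refine the query, repeat.

```python
# Schematic sketch of the iterative-retrieval pattern (not the repo's code).
# `retrieve`, `is_sufficient`, and `refine_query` are assumed helpers.
def iterative_retrieve(query, retrieve, is_sufficient, refine_query, max_steps=3):
    context = []
    current = query
    for _ in range(max_steps):
        context.extend(retrieve(current))
        if is_sufficient(query, context):        # e.g. an LLM judgment call
            break
        current = refine_query(query, context)   # ask what is still missing
    return context
```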

Specific techniques target specific breaking points. HyDE (Hypothetical Document Embeddings) generates what an ideal answer would look like, then searches for documents matching that. Self-RAG adds a reflection step where the system evaluates whether retrieved context answers the query. Corrective RAG detects when retrieval fails and tries alternative strategies.
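A minimal HyDE-style sketch, assuming an OpenAI API key and a hypothetical `search_by_vector` helper over your vector store; the model names are illustrative and this is not the repo's notebook, just the shape of the technique.

```python
# HyDE sketch: generate a hypothetical answer, embed it, and search with that
# embedding instead of the raw query. `search_by_vector` is an assumed helper.
import numpy as np
from openai import OpenAI

client = OpenAI()

def hyde_retrieve(query: str, search_by_vector, k: int = 5):
    # Step 1: ask the LLM to write the passage an ideal answer would live in.
    hypothetical = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Write a short passage that would answer: {query}"}],
    ).choices[0].message.content

    # Step 2: embed the hypothetical passage rather than the query itself.
    vec = client.embeddings.create(
        model="text-embedding-3-small",
        input=[hypothetical],
    ).data[0].embedding

    # Step 3: nearest-neighbor search with the hypothetical-answer vector.
    return search_by_vector(np.array(vec), k=k)
```

Self-RAG and Corrective RAG wrap a retrieval call like this with an extra LLM judgment: does the retrieved context actually answer the query, and if not, which alternative strategy to try.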

These are patterns with working code. Microsoft Research cited the repo for its proposition chunking prompt when building claim extraction systems. Research on semantic chunking points to the implementations as reference material.

How This Differs from LangChain and LlamaIndex

LangChain and LlamaIndex are orchestration frameworks. Pinecone and Meilisearch are vector stores. RAG_Techniques is a technique library—it shows you how to implement specific patterns, then you integrate them into whatever stack you're using.

The aggregation approach matters. Instead of adopting an entire abstraction layer, you get isolated implementations of 30+ specific methods you can cherry-pick. Need graph-based retrieval? There's a notebook for that. Need iterative refinement? Different notebook, same repo.

Real Adoption: Who's Using These Patterns

Beyond the Microsoft citation, adoption is visible in concrete signals. The repo has gained 2.6k forks and appeared in LLM Daily newsletter coverage highlighting practical RAG implementations as enterprise adoption accelerates. Production teams reference it as a pattern library when basic approaches fail.

Known Issues You'll Hit

The repo isn't polished product documentation. Open GitHub issues report broken README links and missing notebook files, including iterative_retrieval.ipynb. Most implementations depend on OpenAI API keys, which means vendor lock-in for the examples even if the techniques themselves are provider-agnostic.

These are maintenance issues, not architectural flaws. The value is in having battle-tested patterns aggregated with working code, even if that code needs updating for your specific LLM provider.
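One hedged way to handle that updating is to hide the completion call behind a small interface so the technique code stays provider-agnostic; the class and method names below are illustrative, not from the repo.

```python
# Sketch: wrap LLM completions behind a minimal interface so notebook patterns
# (HyDE, Self-RAG reflection, corrective retries) don't hard-code one provider.
from typing import Protocol
from openai import OpenAI

class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...

class OpenAILLM:
    def __init__(self, model: str = "gpt-4o-mini"):
        self.client = OpenAI()
        self.model = model

    def complete(self, prompt: str) -> str:
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

# Any technique written against `LLM` can swap in an Anthropic-, Azure-, or
# locally hosted client without touching the retrieval logic.
```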

When to Use This vs. Basic Vector Search

If cosine similarity over chunk embeddings solves your retrieval problem, stop. Don't add complexity you don't need.

Use these techniques when you're debugging production failures—when users report relevant documents aren't surfacing, when complex queries return nonsense, when you need retrieval that reasons over relationships rather than just similarity. The decision framework is simple: basic vector search until it breaks, then this repo for the specific fix.

Start with the category matching your failure mode. Implement one technique. Measure whether it solves the problem. Repeat for the next breaking point.
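"Measure whether it solves the problem" can be as simple as a top-k hit rate over the queries that were failing. This sketch assumes you have a small labeled set of (query, expected document) pairs and two retrieval functions to compare; the function names in the comparison are placeholders.

```python
# Minimal evaluation sketch: fraction of labeled queries whose expected
# document shows up in the top-k results of a given retrieval function.
def hit_rate(retrieve, labeled_queries, k: int = 5) -> float:
    hits = 0
    for query, expected_doc_id in labeled_queries:
        results = retrieve(query, k=k)   # expected to return a list of doc ids
        hits += expected_doc_id in results
    return hits / len(labeled_queries)

# Compare baseline vs. candidate technique on the same failing queries, e.g.:
# baseline = hit_rate(basic_vector_search, failing_queries)
# improved = hit_rate(hyde_search, failing_queries)
```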



NirDiamant/RAG_Techniques

This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and contextually rich responses.

24.1k stars · 2.8k forks
Topics: ai, langchain, llama-index, llm, llms