Does RAG send my private company data to OpenAI or Anthropic?

Yes. Every chunk your retrieval layer surfaces is sent to the LLM provider as part of the generation request. Unless you self-host the model, your data leaves your infrastructure on every query. Most enterprise teams mitigate this with zero-data-retention contracts, regional endpoints, and BAAs for regulated data. Self-hosting open-weight models is the only way to keep retrieved content fully on-premise.

Can you use PII or PHI safely in a RAG system

Yes, with three controls in place. Mask or tokenize sensitive fields at ingestion, so the vector store never holds raw PII or PHI. Preserve document-level access control metadata so retrieval respects user permissions. And use a generation provider that supports zero-retention and, for PHI, signs a BAA. Skip any of the three and you have a compliance gap, not a safe system.

Do vector databases support row-level security or RBAC?

Most enterprise vector databases (Pinecone, Milvus, Qdrant, Weaviate) support metadata filtering at query time. That is the building block for RBAC, but it is not RBAC by itself. You have to map each document’s permission metadata to your application’s IAM roles and enforce the filter on every retrieval. The vector database gives you the mechanism. Your application owns the policy.

How do you test a RAG application for vulnerabilities?

Red team the retrieval layer, not just the model. Open-source frameworks like Promptfoo, Giskard, and PyRIT can inject poisoned documents, attempt embedding inversion, and probe for ACL stripping. Combine automated red teaming with manual review of three specific paths: how documents enter the corpus, how permissions flow from source to retrieval, and what the model sees in its context window during a session.

Do security guardrails degrade RAG performance or accuracy?

They add latency and can produce false positives. Input scanning, output validation, and runtime guardrails each cost real time per request, sometimes hundreds of milliseconds, depending on the model and pipeline. Aggressive filters can also block legitimate retrieved content, lowering answer quality. The fix is layered defense, not maximal filtering on every layer. Push high-cost scanning to ingestion (one-time) and keep retrieval and generation guardrails focused.

What Is RAG Security?

By Trent AI Team

Jun 2026 • 11 min read

RAG security is the practice of identifying and mitigating vulnerabilities introduced by retrieval-augmented generation pipelines, from document ingestion and vector storage through retrieval and generation. Unlike securing a static LLM, RAG security requires defending an active data connection that changes with every query.

RAG is now the dominant enterprise pattern for deploying LLMs, yet most pipelines ship without a clear security model. This article breaks down the RAG threat taxonomy, maps risks to OWASP categories, and shows how you can secure each phase of your pipeline.

How RAG Works: Where Security Fits In

A RAG pipeline connects multiple components into a single flow. Documents are ingested into a corpus, transformed into embeddings, stored in a vector database, retrieved based on similarity, and passed into a generation model. Each stage introduces its own security boundary. Ingestion controls what enters the system; storage protects what is retained; retrieval governs what is surfaced, and generation determines how that data is used. These are not just steps in a pipeline; they are independent attack surfaces you must secure.

This is the implicit trust paradox: user queries are treated as untrusted input, but retrieved documents enter the generation context with system-prompt-level authority. That asymmetry is the architectural flaw RAG attackers exploit. The model cannot reliably distinguish between trusted system instructions and adversarial content embedded inside retrieved documents because both appear as plain text within the same context window. Direct injection attacks the user turn. RAG injection attacks the context window via the retrieval layer, and it does so without the user sending a single malicious character.

RAG Security Risks: The Threat Taxonomy

RAG vulnerabilities span four distinct categories, each exploiting a different phase of the retrieval pipeline:

Indirect prompt injection: malicious instructions hidden in retrieved documents
Knowledge base poisoning: adversarial documents influence future responses
Embedding inversion: sensitive data reconstructed from embeddings
Access control failures: permission gaps expose restricted data

Indirect Prompt Injection

Indirect prompt injection occurs when an attacker embeds adversarial instructions inside a document that your system indexes. When that document is retrieved, the model treats its contents as trusted context and follows the instructions. The user does not send any malicious input, yet the model still executes the attack.

Because retrieval is deterministic, the same poisoned document executes every time it is retrieved. This persistence makes indirect injection especially dangerous in shared corpora.

If your RAG system retrieves from a shared document store, a single poisoned file executes against every user whose query triggers retrieval. These attacks have already been demonstrated in production systems, including session token exfiltration and user impersonation. Attackers can hide payloads in footnotes, metadata fields, or embedded markup, not just visible text.

Knowledge Base Poisoning

Knowledge base poisoning targets your corpus instead of individual queries. Attackers insert carefully crafted documents designed to manipulate responses across many users. This risk is directly mapped to OWASP LLM08:2025.

Research published at USENIX Security 2025 (PoisonedRAG) demonstrated that five adversarial documents injected into a corpus of millions achieved over 90% manipulation success on targeted queries, without triggering anomaly detection in the retrieval layer.

BadRAG uses white-box access to the retriever to craft trigger-optimized adversarial passages, poisoning a handful of documents achieves near-perfect retrieval rates for triggered queries while remaining inactive for normal queries.

HijackRAG optimizes adversarial documents to rank above legitimate results for target queries, and the attack transfers across retriever models, meaning a payload crafted against one embedding model works against others.

TrojanRAG embeds backdoor triggers optimized via contrastive learning. Queries containing the trigger phrase retrieve the poisoned document with high reliability, while the trigger remains inactive for all other queries, evading detection.

OWASP Top 10 for LLM Applications

LLM08:2025 – Vector and Embedding Weaknesses

The 2025 OWASP Top 10 for LLM Applications entry covering RAG-specific vulnerabilities. It addresses knowledge base poisoning, embedding inversion attacks, unprotected vector store access, and retrieval manipulation. If your threat model does not reference LLM08:2025, your RAG security coverage has a gap.

Embedding Inversion and Vector Database Exposure

Researchers demonstrated over 90% reconstruction of short text sequences from exposed vector embeddings under white-box conditions (Vec2Text, Morris et al., 2023). Even under black-box and transfer attack conditions, substantial partial reconstruction is achievable. Vector embeddings are not an opaque storage format; sensitive documents stored in a vector database are recoverable. Security researchers scanning the internet for exposed AI infrastructure have found dozens of vector database instances, including Qdrant, Milvus, Chroma, and Weaviate deployments, with no authentication, exposing corporate documents, PII, and proprietary data.

Vector databases face two distinct threat vectors: poisoning and direct exposure. Knowledge base poisoning (PoisonedRAG, HijackRAG) manipulates what the model retrieves by injecting adversarial documents. Direct exposure occurs when vector database instances lack authentication, allowing external actors to query or exfiltrate the embeddings themselves. These are separate risks that require separate controls.

Access Control Failures at Retrieval

ACL stripping is the most common access control failure in RAG systems. Documents ingested without permissions metadata produce a vector store where every retrieved chunk is accessible to any user. A low-privilege user’s query that semantically matches a confidential HR document will retrieve it if permissions were not preserved at ingestion.

Namespace isolation prevents cross-tenant retrieval only if the retrieval layer enforces namespace boundaries at the query level. An unauthenticated or misconfigured retrieval API can allow cross-namespace queries that bypass logical separation. Output filtering does not fix this issue. By the time the model generates a response, it has already processed the retrieved content.

RAG Security in Agentic Systems

In a text-only RAG system, poisoned retrieval leads to incorrect answers. In an agentic system, the same attack leads to real-world consequences because the model can take action. In agentic RAG systems, where the model has access to tools or can trigger actions, a Confused Deputy attack allows a low-privilege user to cause the agent to invoke high-privilege actions by manipulating retrieved context. The agent acts with its own permissions, not the user’s. A poisoned document instructing the agent to forward results to an external address causes data exfiltration even if the triggering user has no email-send permission.

This aligns with OWASP LLM06:2025 (Excessive Agency). ConfusedPilot (arXiv:2408.04870) demonstrates the pattern against production RAG-based AI assistants. For a full breakdown of agentic attack surfaces beyond RAG, see our guide to agentic AI security architecture.

How to Secure a RAG Pipeline

Securing a RAG pipeline requires defense in depth across four phases: ingestion, storage, retrieval, and generation.

Phase	Key Controls
Ingestion	Adversarial instruction scanning, provenance signing, sensitivity classification
Storage	Encrypt embeddings at rest, preserve permission metadata at chunk level
Retrieval	Permission-aware pre-filtering, similarity threshold enforcement, audit logging
Generation	Treat retrieved content as lower-trust zone, output validation, runtime scanning

Ingestion Controls

You should scan documents for adversarial instruction patterns before they enter your vector store. Imperative phrases, role-switching language, and override instructions are strong indicators of malicious content. If your pipeline ingests from web crawls, user uploads, or third-party feeds, scanning is non-negotiable.

Provenance signing lets you verify trusted sources. Sensitivity classification ensures that retrieval-layer access control has the metadata it needs. If you skip classification at ingestion, you cannot enforce permissions later.

Teams applying a STRIDE threat model will find that tampering and information disclosure risks map directly to these ingestion boundaries. Mapping these trust boundaries before you build is faster than discovering them after deployment. Trent AI’s scanning agents surface RAG-specific attack paths, i.e. ingestion gaps, retrieval-layer ACL failures, agentic escalation exposure, during the design phase, before they ship to production.

Retrieval Controls

Permission-aware retrieval applies the requesting user’s effective permissions as a pre-filter before similarity ranking, not as a post-processing check on generation output. By the time the model generates a response, it has already processed retrieved content. Output layer filtering cannot undo information exposure that happens at retrieval.

You should enforce similarity thresholds to exclude low-confidence matches. Adversarial documents are often designed to rank just high enough to be included. Your system also needs a retrieval audit trail. You must be able to answer what the model saw during a session. Without chunk-level logs, incident response becomes guesswork.

Runtime scanning of retrieved chunks catches adversarial payloads before the generation model processes them. Trent does exactly this, treating the document corpus as an untrusted input channel, not a safe internal source. For a full breakdown of output layer protections, see our guide to runtime guardrails for LLM outputs.

Generation Controls

You should treat retrieved content as a lower-trust zone rather than trusted input. Output validation helps detect injection patterns, but it is not a primary defense. Techniques like wrapping retrieved content in structured delimiters can reduce risk slightly, but they do not eliminate it. Runtime scanning of retrieved content provides a stronger safety layer in production systems.

Your AppSec tools can’t see this attack surface

Traditional scanners can’t reason about prompt-driven logic, agent behavior, or the risks created when your AI application calls APIs, retrieves data, and acts on behalf of users. Trent analyzes your code, agent definitions, and system architecture to find the threats that matter in your specific environment.

When to Worry About RAG Security

Not every RAG system carries the same level of risk. Your exposure depends on how your system is built and what data it handles.

Sensitivity of indexed content: PII, financial data, or IP increase impact
Whether the system is agentic: tool access expands attack consequences
Source diversity: external inputs raise injection risk
Multi-tenancy: shared stores require strict access control

If your system meets two or more of these conditions, you should conduct a formal threat assessment. OWASP LLM08:2025 provides a useful benchmark. If you identify gaps, that is your signal to go deeper.

The question of data sensitivity extends beyond RAG. See trusting AI systems with sensitive code for a broader perspective.

RAG Security Starts With Knowing Your Attack Surface

RAG security is a systems problem. Your attack surface spans ingestion, storage, retrieval, and generation, and expands further when agents are involved. Retrieval-augmented generation security requires you to understand how data flows through your system.

The practical starting point is a clear threat model. Map your trust boundaries, identify where untrusted data enters, and apply controls at each phase. Then layer defenses across ingestion, retrieval, and generation.

That approach gives you a structured path forward, from identifying risk to building a secure RAG pipeline.

Secure what your existing tools miss

Connect your source code repository, agent definitions, or design documents. Receive a prioritized assessment grounded in your specific application, its AI components, and business requirements. As you ship new features, update models, or connect new tools, Trent continuously re-assesses.

Join the waitlist

Reviewed by Eno Thereska, Co-founder & CEO at Trent AI

Frequently Asked Questions

Not inherently. RAG keeps source data outside the model weights, which reduces training-data leakage risk and makes deletion easier. But RAG introduces an attack surface that fine-tuning does not have: the retrieval layer. Indirect prompt injection, knowledge base poisoning, and embedding inversion all target the retrieval path. The risk moves; it does not disappear.

What Is RAG Security?

How RAG Works: Where Security Fits In