By George Damiris
Overview
The company's methodology for testing LLM and GenAI services is based on industry best practices as well as hands-on experience testing AI agents and models across multiple platforms and providers, combined with published industry frameworks: the OWASP Top 10 for Large Language Models and the OWASP LLM Security Verification Standard. Testing is performed through a combination of manual adversarial testing, semi-automated tooling, and proprietary research tooling developed internally.
The scope of assessment focuses on the customer's deployed model configuration and custom software stack โ not the underlying cloud infrastructure managed by the provider. Within the shared responsibility model, this covers the application layer, agent architecture, integrations, and AI-specific attack surfaces.
The company's methodology is focused on the following high-level areas:
- Data: Protecting sensitive data used for training and inference, ensuring it's anonymized and compliant with regulations like GDPR.
- Model: Securing the specific AI model in use, protecting it from attacks like adversarial inputs or model poisoning.
- Access: Managing access controls to ensure only authorized users and applications can access the AI model and its data. Additionally, ensuring access controls are implemented to prevent the AI model from being exploited to bypass the intended authorization.
- Applications: Securing the applications and agents built on top of the AI platform, as well as how they are configured and used.
Approach and Scope Definition
As each system has unique capabilities, integrations and risk exposure, our approach is never generic or checklist driven. Each project is tailored based on its architecture, trust boundaries and business risks of your deployment.
Rather than testing the model in isolation, we evaluate the entire AI execution chain: inputs, logic, permissions, and system impact. This ensures we discover systemic weaknesses, not just surface-level vulnerabilities.
Our objective is simple: identify how your AI could be manipulated, measure the real business impact, and provide clear, architecture-aligned remediation guidance that strengthens both security and operational resilience.
Scope is never assumed. Prior to testing, the company performs a structured threat modeling tailored to the target system. This produces the test case inventory used throughout the engagement. We map your AI ecosystem end-to-end:
- Identify business purposes and Operational Context: We define the intended business function of the system, the level of autonomy granted to the agent, user roles interacting with the system, and the critical workflows it supports.
-
Identify the Model in Use
We document the model provider, model family, and version used by the service. Model capabilities, context limits, safety mechanisms, and update cadence influence the system's security posture. -
Map Architecture and Components
We document the system architecture including orchestration layers, RAG pipelines, memory stores, vector databases, tool integrations, APIs, plugins, MCP integrations, external services, and human-in-the-loop oversight points. -
Identify MCP Integrations and Capabilities
We identify connected MCP servers and the tools or services they expose to the agent. We document what operations these tools allow and what systems they interact with. -
Identify Trust Boundaries
We determine where data or control crosses security domains, such as user input channels, external content sources, MCP servers, inter-agent communication, and interactions with internal systems. -
Classify Assets and Sensitive Data
We identify sensitive assets accessible to the system, including credentials, API tokens, system prompts, proprietary knowledge bases, personal data, and other regulated information. We evaluate the potential operational, financial, or reputational impact if vulnerabilities were exploited.
-
Analyze Agent Capabilities and Permissions
We evaluate what the agent can read, write, execute, or trigger through internal tools, MCP services, APIs, and integrated platforms. -
Enumerate AI-Specific Attack Vectors
We identify potential attack paths specific to LLM and agentic architectures, including prompt injection, jailbreak attempts, malicious MCP tool usage, RAG poisoning, memory manipulation, autonomous goal hijacking, tool misuse, and context leakage.
Test Case Development
Based on the threats we identified, we develop targeted test scenarios to evaluate whether the system can be manipulated or exploited in practice. To design these scenarios, we analyze:
-
Accepted Input Channels
We identify all input vectors accepted by the system, including user prompts, documents, web content, APIs, structured inputs, and other external data sources.
-
Interactions with other Systems
We analyze how the agent's outputs are used and what actions it can trigger, including interactions with internal systems, APIs, tools, or automated workflows. -
Analyze Tool and MCP Interactions
We identify available tools and MCP services the agent can invoke, including the parameters accepted by these tools and the systems they interact with.
-
Input and Output Guardrails
We evaluate the presence and effectiveness of guardrails such as prompt filtering, policy enforcement layers, response validation, tool invocation restrictions, and monitoring mechanisms.
-
Define Expected Secure Behavior
We define what the correct behavior should be (e.g., refusal, filtered output, blocked tool call). This allows you to objectively determine success or failure.
-
Define Attack Method / Technique
We design specific attack methods aligned with the threat model, including prompt injection, indirect prompt injection via retrieved content, RAG poisoning, manipulation of MCP tool parameters, memory manipulation, and multi-step workflow abuse. -
Define Observation and Validation Criteria
We identify what indicators confirm vulnerability: Guardrail bypass, Data leakage, Unauthorized tool invocation, Instruction override. -
Impact Assessment
We describe the potential business or system impact if the behavior were exploited in a real attack.
Rather than generic probing, each technique is applied deliberately based on what your system's architecture and trust model makes exploitable.
Testing covers techniques across the following categories:
|
|
- Fictional Framing
- Language Completion Games
- Synonym Word Usage
|
|
|
|
|
|
|
|
- Text in image
- Image Downscaling
- Obfuscation / Encoding / Steganographic
Exfiltration techniques are used to test whether user data can be exfiltrated to attacker-controlled endpoints. These techniques include, but are not limited to, the following:
- Hyperlink Unfurling
- Markdown
- Tool/Agent abuse
AI model specific Denial-of-Service testing techniques include:
- Repeated Single and Multi-Tokens
- Context-Window Overflow
- Extended Reasoning
- Time Consuming Background Tasks
- Mutating Availability
- Inhibiting Availabilities
- Disrupting Search Queries
The techniques listed above are non-exhaustive and do not cover all categories of issues AI systems can contain.
The methodology follows industry best practices and incorporates the security principles and threat categories defined by the Model Context Protocol Security Initiative:
|
|
|
|
These top risks are evaluated using our prompt-based techniques in addition to others, such as using ANSI terminal escape codes for making prompt injection text invisible in the terminal.
About the Author
George Damiris is a Security Engineer at Anvil Secure. He specializes in web application and network security assessments, with experience identifying vulnerabilities across modern and complex attack surfaces.
In recent years, his work has focused heavily on AI security, with a recent specialty in AI Red Teaming, GenAI security, and the security assessment of agentic systems. His interests include adversarial testing of LLM-powered applications, offensive security research, prompt injection testing, and evaluating the security boundaries of autonomous and AI-driven workflows.
