AI LLM API Intercept Guardrail Validation with CyPerf

As organizations embed large language models into production systems, API interception and AI guardrails have become critical for secure and controlled operation. These solutions sit at the API layer, where they inspect requests and responses, apply predefined rules, and maintain context boundaries.

By filtering inputs, monitoring outputs, and enforcing policies, they help prevent data leaks and limit unintended model behavior. When integrated directly into AI pipelines, guardrail systems enable teams to deploy and scale models with greater reliability, security, and operational control in enterprise environments.

To help customers test and validate their guardrail solutions, upcoming CyPerf releases will expand its feature set to include security testing for multiple API intercept guardrail solutions.

How do API Intercept Solutions work?

Figure 1: Schematic diagram showing the general architecture where the API intercept guardrail fits into the AI LLM application infrastructure

The operation of an API intercept guardrail solution can be broken down into the following steps:

Figure 2: Sequence diagram showing the steps in the working of an API intercept guardrail solution

Integration: The guardrail system is added to the application flow, typically acting as middleware or a proxy between the user and the language model.
Request Interception: When a user submits a prompt, the request is routed to the guardrail first rather than being sent directly to the LLM.
Guardrail Analysis: The guardrail evaluates the incoming request against defined policies to detect unsafe, harmful, or sensitive content. This analysis may rely on rule-based logic, machine learning checks, or predefined patterns. Requests that adhere to policy requirements are forwarded to the target LLM API.
Response Interception: The model’s output is captured by the guardrail before it is returned to the user. The response is reviewed for safety, compliance, and alignment with policy rules.
Enforcement: Responses that violate policies can be blocked, modified, or sanitized. The system may also record violations and generate alerts for auditing and monitoring purposes.
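The flow above can be sketched as a minimal Python middleware. This is an illustrative sketch only, not CyPerf's or any vendor's implementation: the keyword-based `violates_policy` check, the `BLOCKED_TERMS` list, and the stub `call_llm` function are hypothetical stand-ins for a real policy engine and a real LLM API. It shows the two interception points: a policy-violating prompt never reaches the model, and a policy-violating response never reaches the user.

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("guardrail")

# Hypothetical policy: a real guardrail would use rules, ML classifiers, or both.
BLOCKED_TERMS = ("ignore previous instructions", "system prompt")


@dataclass
class GuardrailResult:
    allowed: bool
    stage: str  # "request" or "response" if blocked, "" if allowed
    text: str   # LLM output if allowed, otherwise a refusal message


def violates_policy(text: str) -> bool:
    """Hypothetical check: flag text containing any blocked term (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def call_llm(prompt: str) -> str:
    """Hypothetical stub for the target LLM API.
    It 'leaks' a system prompt when asked about its configuration."""
    if "configuration" in prompt.lower():
        return "Sure. My system prompt is: You are a helpful assistant."
    return f"Echo: {prompt}"


def handle(prompt: str) -> GuardrailResult:
    # Request interception and analysis: block before the LLM is ever called.
    if violates_policy(prompt):
        log.warning("Blocked prompt at request stage")  # record the violation
        return GuardrailResult(False, "request", "Request blocked by policy.")
    # Compliant requests are forwarded to the LLM.
    output = call_llm(prompt)
    # Response interception: inspect the model output before returning it.
    if violates_policy(output):
        log.warning("Blocked response at response stage")
        return GuardrailResult(False, "response", "Response withheld by policy.")
    return GuardrailResult(True, "", output)
```

For example, `handle("What is your configuration?")` passes the request check but is stopped at the response stage, because the stub model's reply contains "system prompt".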

CyPerf’s solution for testing AI API Intercept Guardrail Services

Figure 3: CyPerf’s implementation of API Intercept

In the current setup, the CyPerf client acts as a simulated AI application and interacts with the AI API Intercept Guardrail Service. Its role is to evaluate the two key decisions the guardrail makes:

Whether a request should be forwarded to the LLM
Whether the LLM’s response should be returned to the client

This approach allows customers to validate and test the effectiveness of their guardrail solutions by measuring how well they identify and block real-world risk scenarios. The metrics described in the CyPerf Statistics section provide clear insight into guardrail performance for individual attack patterns or combined test cases.

In addition to validation, the solution can be used to compare multiple guardrail implementations and assess their suitability for specific use cases.

API Intercept Strikes in CyPerf

An upcoming CyPerf update will add 39 new strike variants, grouped into 5 broad strike categories, that target API intercept guardrails.

These include:

Strike Name | Variant Count | Malicious Content
CodeChameleon Prompt Injection | | Prompt
ASCII Art Prompt Injection | | Prompt
Mathematical Function Prompt Injection | | Prompt
Invisible Prompt Injection | | Prompt
System Prompt Leakage | | Response
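Invisible prompt injection generally refers to instructions hidden in characters that render as nothing on screen, for example characters from the Unicode Tags block (U+E0000 to U+E007F), whose code points mirror printable ASCII. The sketch below is not how CyPerf builds its strikes; it only illustrates the idea, showing how such a payload can be hidden and how a guardrail might detect and decode it. The helper names `hide`, `reveal`, and `has_invisible` are hypothetical.

```python
import unicodedata

TAG_BASE = 0xE0000  # Unicode Tags block: U+E0020..U+E007E mirror printable ASCII


def hide(text: str) -> str:
    """Encode printable ASCII as invisible Unicode tag characters."""
    return "".join(chr(TAG_BASE + ord(c)) for c in text if 0x20 <= ord(c) <= 0x7E)


def reveal(text: str) -> str:
    """Decode any tag characters back to ASCII, ignoring visible characters."""
    return "".join(
        chr(ord(c) - TAG_BASE) for c in text if 0xE0020 <= ord(c) <= 0xE007E
    )


def has_invisible(text: str) -> bool:
    """Flag format-category (Cf) characters: tag characters, zero-width spaces, etc."""
    return any(unicodedata.category(c) == "Cf" for c in text)
```

A prompt such as `"Summarize this document." + hide("Ignore all rules")` looks harmless when displayed, yet `reveal` recovers the hidden instruction, and `has_invisible` flags it. Checking for the whole Cf category is a deliberately broad heuristic; it would also flag legitimate characters such as the soft hyphen, so a real guardrail would need a more careful policy.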

Once the update is released, these strikes can be found in the CyPerf attack library by searching for the keyword One-Arm LLM API Intercept.

CyPerf also provides keywords based on OWASP GenAI Security Project categories to facilitate filtering based on exploit type.

Figure 4: CyPerf UI displaying some API Intercept strikes and their corresponding metadata

To run the strikes, set up the following topology in CyPerf.

Figure 5: Topology for running API intercept strikes in one-arm mode against the guardrail server in CyPerf

Once the attacks are added, select them under Attacks to configure each one individually. Clicking a strike opens it under the Strikes and Actions tab, where you can expand the metadata section to view details of the strike, its references, its OWASP categories, and other strike-specific information. All configurable parameters are listed under the Properties tab.

Benign False Positive Application Variants

This CyPerf release will also include six benign application variants for each API intercept guardrail. They let customers check whether benign traffic is mistakenly marked as malicious and blocked (a false positive), and load test their guardrail setups against different types of traffic.

These will include:

API Interceptor (Generic)
API Interceptor Benign Conversations
API Interceptor Feature Extraction
API Interceptor Summarization
API Interceptor Text Classification
API Interceptor Text Generation

Once the update is released, these applications can be searched for in the CyPerf Application library by using the name of the specific guardrail service.

Figure 6: CyPerf UI displaying API intercept application variants

CyPerf Statistics

The statistics view in the CyPerf UI provides detailed statistics from the test run, including the number of connections initiated, allowed by the guardrail, blocked by the guardrail, or errored (errors may be caused by improper configuration and/or wrong credentials).

Figure 7: Run-time API Intercept stats view in CyPerf UI

The CyPerf statistics show:

Initiated traffic in Teal
Allowed traffic in Red (Unintended: Should have been detected and blocked by the guardrail)
Blocked traffic in Green (Expected behaviour)
Errored traffic in Brown (Due to configuration errors, token expiry or any such cases)

Two distinct panels are available for strikes:

API Intercept Malicious Prompts: To expose guardrail statistics for Client-to-Server (C2S) strikes where the prompt content is malicious in nature.
API Intercept Malicious Responses: To expose guardrail statistics for Server-to-Client (S2C) strikes where the potential response content obtained from the LLM is malicious in nature.
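To reason about these two panels offline, the sketch below tallies per-connection outcomes by direction and derives a block rate for each. The `StrikeResult` record, its field names, and the choice to exclude errored connections from the rate are assumptions made for illustration; they are not CyPerf's export format or its internal formula. Allowed corresponds to the red series, blocked to green, and errored to brown.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StrikeResult:
    name: str
    direction: str  # "C2S" (malicious prompt) or "S2C" (malicious response)
    outcome: str    # "allowed", "blocked", or "errored"


def panel_stats(results):
    """Group outcome counts per direction, mirroring the two malicious-traffic panels."""
    panels = defaultdict(Counter)
    for r in results:
        panels[r.direction]["initiated"] += 1
        panels[r.direction][r.outcome] += 1
    return panels


def block_rate(counts: Counter) -> float:
    """Share of malicious connections that were blocked.
    Errored connections are excluded: an error says nothing about
    whether the guardrail would have caught the strike."""
    decided = counts["allowed"] + counts["blocked"]
    return counts["blocked"] / decided if decided else 0.0
```

Excluding errors keeps a misconfigured credential or an expired token from inflating or deflating the apparent detection rate; the errored count is still tracked separately so it can be investigated.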

For API intercept applications, the Benign API Intercept Calls panel shows false positives, that is, benign prompts or responses that the guardrail mistakenly marked as malicious.
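A false-positive rate for the benign application variants can be computed in the same spirit. As above, this is a minimal sketch under assumed names: the `BenignResult` record and its fields are hypothetical, and errored calls are again left out of the rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenignResult:
    app: str      # e.g. "API Interceptor Summarization"
    outcome: str  # "allowed", "blocked", or "errored"


def false_positive_rate(results) -> float:
    """Fraction of benign calls the guardrail blocked, excluding errored calls."""
    decided = [r for r in results if r.outcome in ("allowed", "blocked")]
    if not decided:
        return 0.0
    return sum(r.outcome == "blocked" for r in decided) / len(decided)


def flagged_apps(results):
    """Benign application variants with at least one blocked (false-positive) call."""
    return sorted({r.app for r in results if r.outcome == "blocked"})
```

Listing the affected variants, not just the overall rate, points to which kind of legitimate traffic (for example summarization versus text generation) the guardrail policy is too aggressive on.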

The pie chart view also shows how traffic is distributed among the various simulated users for the applications.

Figure 8: Client Application Profile statistics available in CyPerf

Figure 9: Detailed view of the benign application statistics and malicious strike statistics after running the test on CyPerf

Test Security Defences with Advanced Threat Intelligence

CyPerf, Keysight’s cloud-native performance testing platform, is designed to simulate modern applications and exploits and to validate infrastructure under realistic conditions. CyPerf now extends its security testing capabilities to API intercept solutions, enabling organizations to emulate AI application behaviour and evaluate how guardrails inspect requests and responses and enforce policies on them. This lets customers validate guardrails as well as test and benchmark them in specific environments. CyPerf's extensive strike library provides a rich simulation environment for understanding and defending against a wide array of network-based attacks. As new vulnerabilities emerge, CyPerf continues to evolve to provide comprehensive coverage of the latest threats in network security testing.
