AI LLM API Intercept Guardrail Validation with CyPerf

As organizations embed large language models into production systems, API interception and AI guardrails have become critical for secure and controlled operation. These solutions sit at the API layer, where they inspect requests and responses, apply predefined rules, and maintain context boundaries.

By filtering inputs, monitoring outputs, and enforcing policies, they help prevent data leaks and limit unintended model behavior. When integrated directly into AI pipelines, guardrail systems enable teams to deploy and scale models with greater reliability, security, and operational control in enterprise environments.

To fulfil customer needs to test and validate their guardrail solutions, CyPerf will now expand its feature set to incorporate security testing for multiple API intercept guardrail solutions as part of upcoming releases.

How do API Intercept Solutions work?

Figure 1: Schematic diagram showing the general architecture where the API intercept guardrail fits into the AI LLM application infrastructure

The overall testing process of an API Intercept guardrail solution can be broken down into the following steps:

Figure 2: Sequence diagram showing the steps in the working of an API intercept guardrail solution

CyPerf’s solution for testing AI API Intercept Guardrail Services

Figure 3: CyPerf’s implementation of API Intercept

In the current setup, the CyPerf client acts as a simulated AI application and interacts with the AI API Intercept Guardrail Service. Its role is to evaluate two key decisions:

This approach allows customers to validate and test the effectiveness of their guardrail solutions by measuring how well they identify and block real-world risk scenarios. The metrics shown in the Results section provide clear insight into guardrail performance for individual attack patterns or combined test cases.

In addition to validation, the solution can be used to compare multiple guardrail implementations and assess their suitability for specific use cases.

API Intercept Strikes in CyPerf

CyPerf will soon release an update containing 39 new versions of strikes from 5 broad strike categories targeting API Intercept Guardrails.

These include:

Strike Name
Variant Count
Malicious Content
CodeChameleon Prompt Injection
18
Prompt
ASCII Art Prompt Injection
12
Prompt
Mathematical Function Prompt Injection
1
Prompt
Invisible Prompt Injection
1
Prompt
System Prompt Leakage
7
Response

Once the update is released, these strikes can be used in a test by searching in the CyPerf attack library with by using the keyword One-Arm LLM API Intercept.

CyPerf also provides keywords based on OWASP GenAI Security Project categories to facilitate filtering based on exploit type.

Figure 4: CyPerf UI displaying some API Intercept strikes and their corresponding metadata

To run the strikes setup the following topology inside CyPerf.

Figure 5: Topology for running API intercept strikes in one-arm mode against the guardrail server in CyPerf

Once the attacks are added select them under attacks and configure them individually. On a single click you will see them getting under the Strikes and Actions tab where you collapse the metadata section to view details of the strike, references, OWASP categories and other strike-specific information. All the configurable parameters will be visible under Properties Tab.

Benign False Positive Application Variants

This new CyPerf release will also include 6 Benign Applications targeted towards each API Intercept guardrail which will enable customers to check if any benign traffic is mistakenly being marked as malicious and blocked (hence false positive) and to load test their guardrail setups against different types of traffic.

These will include:

  1. API Interceptor (Generic)
  2. API Interceptor Benign Conversations
  3. API Interceptor Feature Extraction
  4. API Interceptor Summarization
  5. API Interceptor Text Classification
  6. API Interceptor Text Generation

Once the update is released, these applications can be searched for in the CyPerf Application library by using the name of the specific guardrail service.

Figure 6: CyPerf UI displaying API intercept application variants

CyPerf Statistics

The statistic view in CyPerf UI provides detailed statistics from the test run, including the number of connections initiated, allowed by the guardrail, blocked by the guardrail or any errored connections (which maybe caused due to improper configurations and/or wrong credentials)

Figure 7: Run-time API Intercept stats view in CyPerf UI

The CyPerf statistics show:

2 Distinct panels are available for strikes:

  1. API Intercept Malicious Prompts: To expose guardrail statistics for Client-to-Server (C2S) strikes where the prompt content is malicious in nature.
  2. API Intercept Malicious Responses: To expose guardrail statistics for Server-to-Client (S2C) strikes where the potential response content obtained from the LLM is malicious in nature.

For API intercept Applications we have a Benign API Intercept calls panel where we can check for False Positives, i.e. if any of the benign prompts/responses are being mistakenly marked as malicious by the guardrail.

Moreover, we can see the traffic distribution among various simulated users for the applications in the pie chart view.

Figure 8: Client Application Profile statistics available in CyPerf

Figure 9: Detailed view of the benign application statistics and malicious strike statistics after running the test on CyPerf

Test Security Defences with Advanced Threat Intelligence

CyPerf, Keysight’s cloud-native performance testing platform is designed to simulate modern applications and exploits and validate infrastructure under realistic conditions. CyPerf extends its security testing capabilities to API Intercept solutions enabling organizations to emulate AI application behaviour and evaluate how guardrails inspect and enforce policies on requests and responses. This enables clients to validate guardrails as well as test and benchmark them in specific environments. CyPerf's extensive strike library provides a rich simulation environment for understanding and defending against a wide array of network-based attacks. As new vulnerabilities emerge, CyPerf continues to evolve, ensuring comprehensive coverage of the latest threats in network security testing.

limit
3