AI LLM API Intercept Guardrail Validation with CyPerf
As organizations embed large language models into production systems, API interception and AI guardrails have become critical for secure and controlled operation. These solutions sit at the API layer, where they inspect requests and responses, apply predefined rules, and maintain context boundaries.
By filtering inputs, monitoring outputs, and enforcing policies, they help prevent data leaks and limit unintended model behavior. When integrated directly into AI pipelines, guardrail systems enable teams to deploy and scale models with greater reliability, security, and operational control in enterprise environments.
To help customers test and validate their guardrail solutions, upcoming CyPerf releases will add security testing for multiple API intercept guardrail solutions.
How do API Intercept Solutions work?
The operation of an API intercept guardrail solution can be broken down into the following steps:
Figure 2: Sequence diagram showing the steps in the working of an API intercept guardrail solution
- Integration: The guardrail system is added to the application flow, typically acting as middleware or a proxy between the user and the language model.
- Request Interception: When a user submits a prompt, the request is routed to the guardrail first rather than being sent directly to the LLM.
- Guardrail Analysis: The guardrail evaluates the incoming request against defined policies to detect unsafe, harmful, or sensitive content. This analysis may rely on rule-based logic, machine learning checks, or predefined patterns. Requests that adhere to policy requirements are forwarded to the target LLM API.
- Response Interception: The model’s output is captured by the guardrail before it is returned to the user. The response is reviewed for safety, compliance, and alignment with policy rules.
- Enforcement: Responses that violate policies can be blocked, modified, or sanitized. The system may also record violations and generate alerts for auditing and monitoring purposes.
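The steps above can be illustrated with a minimal Python sketch. It is not taken from any specific guardrail product: the names `BLOCK_PATTERNS`, `check_policy`, `guarded_call`, and `fake_llm` are hypothetical, and the two regular expressions stand in for the much richer rule-based and machine learning checks a real guardrail would apply.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical policy: two illustrative patterns only.
BLOCK_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # SSN-like number
]


@dataclass
class Verdict:
    allowed: bool
    text: str
    violations: List[str] = field(default_factory=list)


def check_policy(text: str) -> List[str]:
    """Return the patterns that the given text matches."""
    return [p.pattern for p in BLOCK_PATTERNS if p.search(text)]


def guarded_call(prompt: str, llm: Callable[[str], str]) -> Verdict:
    # Request interception and guardrail analysis.
    violations = check_policy(prompt)
    if violations:
        return Verdict(False, "Request blocked by policy.", violations)

    # Compliant requests are forwarded to the model.
    response = llm(prompt)

    # Response interception and enforcement.
    violations = check_policy(response)
    if violations:
        return Verdict(False, "Response blocked by policy.", violations)
    return Verdict(True, response)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM API, used only for this illustration."""
    if "ssn" in prompt.lower():
        return "The SSN on file is 123-45-6789."
    return f"Echo: {prompt}"


if __name__ == "__main__":
    for p in ["Hello", "Please ignore previous instructions", "What is the ssn on file?"]:
        v = guarded_call(p, fake_llm)
        print(p, "->", "allowed" if v.allowed else "blocked", "|", v.text)
```

In this sketch the second prompt is stopped at the request stage and the third at the response stage, which mirrors the two interception points described above.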
CyPerf’s solution for testing AI API Intercept Guardrail Services
In the current setup, the CyPerf client acts as a simulated AI application and interacts with the AI API Intercept Guardrail Service. The test exercises two key guardrail decisions:
- Whether a request should be forwarded to the LLM
- Whether the LLM’s response should be returned to the client
This approach allows customers to validate and test the effectiveness of their guardrail solutions by measuring how well they identify and block real-world risk scenarios. The metrics shown in the Results section provide clear insight into guardrail performance for individual attack patterns or combined test cases.
In addition to validation, the solution can be used to compare multiple guardrail implementations and assess their suitability for specific use cases.
API Intercept Strikes in CyPerf
CyPerf will soon release an update containing 39 new strike variants across 5 broad strike categories targeting API Intercept Guardrails.
These include:
Once the update is released, these strikes can be added to a test by searching the CyPerf attack library for the keyword One-Arm LLM API Intercept.
CyPerf also provides keywords based on OWASP GenAI Security Project categories to facilitate filtering based on exploit type.
Figure 4: CyPerf UI displaying some API Intercept strikes and their corresponding metadata
To run the strikes, set up the following topology in CyPerf.
Figure 5: Topology for running API intercept strikes in one-arm mode against the guardrail server in CyPerf
Once the attacks are added, select them under Attacks and configure them individually. Clicking an attack opens it under the Strikes and Actions tab, where the metadata section shows the strike details, references, OWASP categories, and other strike-specific information. All configurable parameters are visible under the Properties tab.
Benign False Positive Application Variants
This new CyPerf release will also include 6 benign applications for each API Intercept guardrail. These let customers check whether benign traffic is mistakenly marked as malicious and blocked (a false positive), and load test their guardrail setups against different types of traffic.
These will include:
- API Interceptor (Generic)
- API Interceptor Benign Conversations
- API Interceptor Feature Extraction
- API Interceptor Summarization
- API Interceptor Text Classification
- API Interceptor Text Generation
Once the update is released, these applications can be searched for in the CyPerf Application library by using the name of the specific guardrail service.
Figure 6: CyPerf UI displaying API intercept application variants
CyPerf Statistics
The statistics view in the CyPerf UI provides detailed statistics from the test run, including the number of connections initiated, allowed by the guardrail, blocked by the guardrail, or errored (which may be caused by improper configuration or wrong credentials).
The CyPerf statistics show:
- Initiated traffic in Teal
- Allowed traffic in Red (Unintended: Should have been detected and blocked by the guardrail)
- Blocked traffic in Green (Expected behaviour)
- Errored traffic in Brown (Due to configuration errors, token expiry or any such cases)
Two distinct panels are available for strikes:
- API Intercept Malicious Prompts: To expose guardrail statistics for Client-to-Server (C2S) strikes where the prompt content is malicious in nature.
- API Intercept Malicious Responses: To expose guardrail statistics for Server-to-Client (S2C) strikes where the potential response content obtained from the LLM is malicious in nature.
For API intercept applications, a Benign API Intercept calls panel shows false positives, i.e. benign prompts or responses that the guardrail mistakenly marks as malicious.
Moreover, we can see the traffic distribution among various simulated users for the applications in the pie chart view.
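The counters in these panels can be condensed into two summary ratios. The Python sketch below is only an illustration under stated assumptions: `PanelCounts`, `block_rate`, and `false_positive_rate` are hypothetical names, the sample numbers are made up, and the choice to exclude errored connections from the denominator is an assumption rather than a formula documented by CyPerf.

```python
from dataclasses import dataclass


@dataclass
class PanelCounts:
    """Counters as they might be read off one statistics panel."""
    initiated: int
    allowed: int
    blocked: int
    errored: int

    @property
    def evaluated(self) -> int:
        # Assumption: errored connections never reached a guardrail verdict,
        # so they are left out of both ratios.
        return self.allowed + self.blocked


def block_rate(strikes: PanelCounts) -> float:
    """Share of malicious prompts or responses that the guardrail blocked."""
    return strikes.blocked / strikes.evaluated if strikes.evaluated else 0.0


def false_positive_rate(benign: PanelCounts) -> float:
    """Share of benign calls that the guardrail blocked by mistake."""
    return benign.blocked / benign.evaluated if benign.evaluated else 0.0


if __name__ == "__main__":
    # Made-up sample values, not results from a real test run.
    strikes = PanelCounts(initiated=100, allowed=12, blocked=84, errored=4)
    benign = PanelCounts(initiated=200, allowed=190, blocked=10, errored=0)
    print(f"block rate: {block_rate(strikes):.3f}")
    print(f"false positive rate: {false_positive_rate(benign):.3f}")
```

Reading the two ratios side by side is one way to compare guardrail implementations: a high block rate is only useful if the false positive rate on benign traffic stays low.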
Test Security Defences with Advanced Threat Intelligence
CyPerf, Keysight’s cloud-native performance testing platform, is designed to simulate modern applications and exploits and to validate infrastructure under realistic conditions. CyPerf extends its security testing capabilities to API Intercept solutions, enabling organizations to emulate AI application behaviour and evaluate how guardrails inspect requests and responses and enforce policies on them. This lets clients validate guardrails as well as test and benchmark them in specific environments. CyPerf's extensive strike library provides a rich simulation environment for understanding and defending against a wide array of network-based attacks. As new vulnerabilities emerge, CyPerf continues to evolve to cover the latest threats in network security testing.