How QA Leaders Can Effectively Use AI in Software Testing
Explore how ML, generative AI, and agentic AI actually show up in testing—from smarter test selection and flaky-test triage to auto-generated scripts and autonomous flows. Get clear on the differences, strengths, blind spots, and risks around data, bias, and control.
Key takeaways:
- The latest advances in artificial intelligence (AI), like agentic AI, promise end-to-end AI-powered testing and near-100% test automation.
- Older AI approaches, like generative and predictive AI, also have many uses in AI-driven testing.
- Understanding the data security, compliance, and governance implications of various AI-based approaches is essential, especially in highly regulated industries.
Software testing is essential but has always been laborious. Quality assurance (QA) teams have constantly tried to adopt new technologies to automate and streamline their testing efforts.
However, recent advances in AI promise to fully automate software testing, including autonomous test plan generation and test execution. As a result, search queries for AI in software testing have tripled over the last 24 months.
The new AI stack for software QA leverages agentic AI, generative AI, and traditional machine learning. What exactly are they, and how can you use such AI in software testing? Find out in this blog post.
What are machine learning, generative AI, and agentic AI?
Figure 1. The AI stack
The above illustration depicts the level of abstraction and intelligence at which these AI approaches operate. Let’s look at each approach in more detail.
What is machine learning?
Machine learning (ML) creates an empirical model to describe a phenomenon. ML effectively "learns" the patterns and structures in the variables influencing a phenomenon from their values or derived features.
The goal of the machine learning approach is to predict future outcomes, generalize to new unseen data, or understand data better through classification, clustering, dimensionality reduction, or anomaly detection.
At first, this definition may make ML seem better suited to structured tabular data. However, ML algorithms range from simple linear regression to complicated nonlinear neural networks for complex unstructured data (like images).
For example, deep neural networks (DNNs) can have dozens of layers and millions of parameters. They are capable of:
- text tasks like understanding a language
- computer vision tasks like locating and identifying all the objects in photos
- speech tasks like speech recognition and speaker identification
In addition to such predictive AI, ML models, particularly DNNs, are capable of generative AI.
What is generative AI?
Figure 2. Generative AI
Generative AI (GenAI) uses advanced ML algorithms to create realistic unstructured data like natural language text, images, videos, or speech. GenAI is still predictive ML, but its predictions are complex unstructured data, like illustrations in a certain style, photos with a certain look, prose or poetry written in the style of a famous author, or speech that sounds like a specific person.
Some of the advanced ML algorithms used for each type of data (termed a modality) are listed below:
- Natural language text tasks: GenAI can predict natural language in text form (in contrast to speech or animal vocalization forms). Typical tasks include creating prose or poetry, generating analysis reports, translating text from one language to another, generating code, or answering questions. These models typically use transformer networks. When the neural network is especially deep with billions of learnable parameters, it's called a large language model (LLM).
- Image and video tasks: ML algorithms like diffusion models (for example, Stable Diffusion) and generative adversarial networks are used for image style transfer, image-to-image transformations, or video-to-video transformations.
- Multimodal tasks: Multimodal tasks are some of the most complex, approaching human-level creativity. Results in one modality are produced from an entirely different modality. For example, realistic photos can be created just by describing them using text. Entire videos can be generated from text scripts.
What is agentic AI?
Figure 3. The path to agentic AI
Agentic AI takes us one step closer to human-level artificial general intelligence by integrating reasoning and autonomy with generative and predictive AI. Agentic AI can translate high-level text instructions into a comprehensive action plan and execute it autonomously through:
- step-by-step reasoning
- reviewing its choices at each step
- reading relevant data from internal or external databases
- invoking external services or application programming interfaces (APIs) whenever necessary
- backtracking its steps if it runs into dead ends and exploring alternate paths
How does agentic AI work?
Under the hood, agentic AI uses a capable LLM or LLM-based multimodal model that's been upgraded to a large reasoning model or large action model through additional fine-tuning on instruction following, reasoning, action invocation, and multi-agent collaboration.
The steering of LLM responses toward human-like thinking and actions happens in three stages.
First, the pretrained LLM's parameters are fine-tuned for general reasoning or action invocation using techniques like reinforcement learning or direct preference optimization.
Second, this fine-tuned model's input context includes a system prompt with instructions on how to act like an autonomous agent for a specific task or domain. These prompts use special techniques to elicit thinking and reasoning, like chain-of-thought prompting, reasoning and acting (ReAct), graph of thoughts, and newer prompt engineering approaches.
The input context may also contain example thought dumps or action plans. Such in-context learning dynamically teaches the model to reason or act according to the given instructions and examples.
Finally, any additional fine-grained instructions or details required for each step can be pulled on demand from external data or services through retrieval-augmented generation (RAG).
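The reason, act, and observe cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular agent framework works: scripted_model is a hypothetical stand-in for an LLM call, and the TOOLS registry, the requirement ID, and the test name are invented for the example.

```python
from typing import Callable

# Hypothetical tools the agent may invoke; a real agent would call APIs or databases.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_requirement": lambda key: {
        "REQ-1": "User can resume a saved session"
    }.get(key, "not found"),
    "run_test": lambda name: "passed" if name == "resume_session" else "unknown test",
}

# Canned decisions that the stand-in model replays, one per step.
SCRIPT = [
    {"thought": "I need the requirement text first.",
     "action": "lookup_requirement", "input": "REQ-1"},
    {"thought": "Now run the matching test.",
     "action": "run_test", "input": "resume_session"},
    {"thought": "The test passed, so I can stop.",
     "action": "finish", "input": "REQ-1 verified"},
]


def scripted_model(history: list[str]) -> dict[str, str]:
    """Stand-in for an LLM: picks the next decision from the observations so far."""
    step = sum(1 for entry in history if entry.startswith("Observation"))
    return SCRIPT[min(step, len(SCRIPT) - 1)]


def run_agent(goal: str, model=scripted_model, max_steps: int = 5) -> list[str]:
    """Alternate thought, action, and observation until the model finishes."""
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        decision = model(history)
        history.append(f"Thought: {decision['thought']}")
        if decision["action"] == "finish":
            history.append(f"Answer: {decision['input']}")
            break
        tool = TOOLS.get(decision["action"])
        observation = tool(decision["input"]) if tool else "unknown action"
        # The observation is fed back so the next decision can depend on it.
        history.append(f"Observation: {observation}")
    return history


if __name__ == "__main__":
    for line in run_agent("Verify REQ-1"):
        print(line)
```

In a real agent, the model would read the whole history, including the system prompt and any retrieved context, before choosing the next action. The max_steps cap is one simple guardrail against runaway loops.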
What are some uses of machine learning AI in software testing?
Some possible uses of ML in software testing are listed below:
- Test case prioritization: By applying classification to historical data, detected bugs, system logs, and code change metrics, ML can pinpoint high-risk components to focus your testing on.
- Anomaly detection: During automated testing, anomaly detection techniques can process large volumes of data, like logs and performance testing results, to flag unexpected deviations and outliers.
- Object detection: Object detection techniques can identify layout, alignment, and accessibility problems in user interface (UI) layouts and screenshots.
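As a rough illustration of the anomaly detection item above, the sketch below flags outliers in performance results. It uses a simple statistical rule (the modified z-score based on the median absolute deviation) rather than a trained ML model, and the response times are made-up sample values.

```python
from statistics import median


def flag_anomalies(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return indexes of values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values are identical; this sketch flags nothing then.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


if __name__ == "__main__":
    # Hypothetical response times in milliseconds from a performance test run.
    response_ms = [120, 118, 125, 122, 119, 121, 480, 123]
    print(flag_anomalies(response_ms))
```

A median-based score is used here because a single extreme value barely shifts the median, whereas it can distort a mean and standard deviation enough to hide itself.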
What are the uses of generative AI in software testing?
Generative AI's ability to create realistic text, images, videos, and other complex unstructured data makes it valuable for software testing uses like these:
- Create test cases from requirements: LLMs are capable of test creation based on requirements and vision documents, which boosts development productivity.
- Create test assets: GenAI is good at generating or processing other assets, like test scripts, test environments, test data, harnesses, stubs, reports, and documentation.
- Generate synthetic test data: GenAI excels at generating realistic synthetic data (both clean and invalid) for testing various scenarios. GenAI can even generate complex test data like realistic photos, videos, speech, and audio.
However, remember that public, cloud-based GenAI tools carry enormous risks to your data confidentiality, security, and privacy. Cyberattacks or misconfigurations can result in leakage of confidential data and intellectual property. Use AI tools that you can deploy on-prem behind your organization's firewall to avoid these risks.
Additionally, since generative AI models lack reasoning and backtracking capabilities, there are a few other points to keep in mind:
- Be cautious about hallucinations and inaccuracy. Their severity depends on the quality of fine-tuning and guardrails. For critical test paths, include human testers in the loop.
- Shadow prompting is another problem to be aware of, especially when using cloud-based LLM or image generation services. Such services sometimes silently alter your prompts to be more compatible with their models, which can affect test repeatability.
- For better test repeatability, make all your test teams explicitly set the same system prompts, random seeds, and model temperatures.
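One way to keep teams on the same system prompt, seed, and temperature is to pin them in a shared, immutable settings object and compare a fingerprint of it across test runs. The sketch below is an assumption-laden illustration: the GenerationSettings class, its fields, and the model name are hypothetical, and whether a fixed seed actually yields identical output depends on the model and service you use.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GenerationSettings:
    """Generation parameters that every test team should share (hypothetical fields)."""
    system_prompt: str
    seed: int
    temperature: float
    model_name: str  # illustrative identifier, not a real model

    def fingerprint(self) -> str:
        """Short hash of the settings, suitable for logging with each test run."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    team_a = GenerationSettings("You are a test data generator.", 42, 0.0, "example-model")
    team_b = GenerationSettings("You are a test data generator.", 42, 0.2, "example-model")
    # Different temperatures produce different fingerprints, exposing the drift.
    print(team_a.fingerprint() == team_b.fingerprint())
```

Logging the fingerprint next to each generated asset makes it easy to spot when two runs were produced under different settings.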
What are some industry uses of generative AI in software testing?
Figure 4. Ground scene generated by AI. Shows targets as they appear in a thermal camera, for drone software testing.
The examples below give you an idea of how to use generative AI in software testing:
- Aerospace and defense: Use image-to-text multimodal models to describe objects and their locations in each frame from a regular or thermal camera under test. Such text descriptions are useful to identify anomalous detections and misidentifications.
- Healthcare: Image generation models can create realistic medical imagery to test automated diagnostic software and electronic medical record software. Similarly, data and audio generation models can create synthetic sensor data and sounds for testing relevant medical equipment.
- Semiconductors: Specially trained models can create realistic images of faulty circuits, wafers, and transistors to test the electronic design automation and diagnostic machines used in semiconductor design and manufacturing.
What are the uses of agentic AI in software testing?
Agentic AI's autonomous reasoning and acting capabilities make it an extremely effective and diligent automated software tester, as outlined below.
Agentic AI for functional testing
For example, consider the high-level functional behaviors that clients expect and often describe using subjective wording. A client may want their software to allow users to continue from where they left off, perhaps driven by convenience, productivity, or safety concerns. Something subjective like that is critical for a good user experience. But there's a risk of inconsistent implementation, especially if the features are many, the development team is large, or many contractors with different skillsets are involved.
Luckily, agentic AI can directly take in such early-phase subjective requirements as inputs. Then, using techniques like chain-of-thought or ReAct, it can create, run, and optimize tests on demand to verify even such subjective requirements for every possible user journey.
While automatically navigating a user journey, it can also pull in associated information like user documentation or screenshots to test for consistency and accuracy.
Further, agentic AI can achieve continuous testing throughout your software development lifecycle. You can invoke the testing AI agents from your DevOps, continuous integration, and continuous deployment (CI/CD) workflows.
The AI agents are capable of self-healing by automatically adapting test plans to remain in sync with changes in user interfaces and functionality.
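To make the self-healing idea concrete, here is a deliberately simple sketch that re-finds a UI element after its label changes. It uses plain string similarity from Python's standard library, which is only a stand-in for the richer reasoning an AI agent would apply; the function name, labels, and the 0.6 cutoff are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from typing import Optional


def heal_locator(expected_label: str, current_labels: list[str],
                 min_score: float = 0.6) -> Optional[str]:
    """Return the current label most similar to the expected one, or None."""
    best_label, best_score = None, 0.0
    for label in current_labels:
        # Case-insensitive similarity between 0.0 (no overlap) and 1.0 (identical).
        score = SequenceMatcher(None, expected_label.lower(), label.lower()).ratio()
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= min_score else None


if __name__ == "__main__":
    # The button was renamed from "Submit order" in a hypothetical UI update.
    print(heal_locator("Submit order", ["Submit your order", "Cancel"]))
    print(heal_locator("Checkout", ["Help", "About"]))
```

Returning None below the cutoff matters: a test that silently clicks the wrong element is worse than one that fails and asks for review.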
Agentic AI for regression testing
For more effective regression testing, agentic AI can integrate with CI/CD pipelines and:
- automatically monitor code changes in the version control system
- analyze their potential impacts
- intelligently execute existing regression tests or generate new ones to ensure that the changes did not break any existing functionality
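The core of the steps above, mapping changed files to the tests that exercise them, can be sketched without any AI at all. In this illustration the coverage map, file names, and test names are hypothetical; an agent would typically derive the map from coverage data or code analysis and then decide what to do with files that no test covers.

```python
def plan_regression_run(changed_files: list[str],
                        coverage_map: dict[str, set[str]]) -> tuple[list[str], list[str]]:
    """Return (tests to run, changed files that no existing test covers)."""
    changed = set(changed_files)
    # A test is selected if it touches at least one changed file.
    tests_to_run = sorted(
        test for test, files in coverage_map.items() if changed & set(files)
    )
    covered = set().union(*coverage_map.values())
    # Changed files without coverage are candidates for newly generated tests.
    uncovered = sorted(changed - covered)
    return tests_to_run, uncovered


if __name__ == "__main__":
    coverage = {
        "test_login": {"auth.py", "session.py"},
        "test_checkout": {"cart.py", "payment.py"},
        "test_profile": {"auth.py", "profile.py"},
    }
    print(plan_regression_run(["auth.py", "reports.py"], coverage))
```

The second list is where generation comes in: it tells the agent which changes currently have no regression safety net.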
Agentic AI vs. traditional automated testing
Agentic AI directly addresses many challenges of traditional automated testing. It:
- reduces the test maintenance effort imposed by frequently changing user interfaces (UIs) and document object models of web applications
- integrates the often fragmented tools for UI testing, API testing, and database testing into cohesive end-to-end testing workflows
- enables comprehensive test coverage within the short durations of agile cycles
What are some industry applications of agentic AI in software testing?
Some potential uses of agentic AI in software testing for various critical industries are outlined below.
Agentic AI in software testing for the aerospace sector
For spacecraft cockpit displays, agentic AI can comprehensively test the eProc system's ability to correctly link fault messages to appropriate displays and procedures, and the crew interface's response to these events.
Manually testing this system is very time-consuming and prone to human error due to thousands of combinations of faults and eProc responses.
However, agentic AI can look up the faults and standard operating procedures from a knowledge base or checklist, generate the faults, initialize the simulator to suitable states, inject the faults, observe the cockpit displays, and verify the UIs using computer vision.
Agentic AI in software testing for the defense sector
Agentic AI is actively being considered for air combat operations and other battlefield domains as well as their testing.
For example, manual testing of command and control software for commander decision-making is complicated by multi-system visualization, diverse interfaces, vast scenario combinations, and battlefield urgency.
Agentic AI alleviates this by building a system model, autonomously generating test scenarios, simulating user interactions and data feeds using computer vision, verifying system responses, and detecting and reporting anomalies.
Agentic AI in software testing for the automotive sector
For advanced driver assistance systems, sensor fusion robustness under dynamic, challenging conditions is essential.
Manual testing is complicated by the vast number of combinations of environmental factors and road conditions.
In contrast, agentic AI can easily orchestrate all scenarios in a radar scene emulator and use image understanding, reasoning, and action planning to test even edge cases.
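A quick sketch shows why the number of combinations grows so fast. The factors and values below are hypothetical placeholders, not parameters of any real emulator; the point is only that every added factor multiplies the scenario count.

```python
from itertools import product

# Hypothetical environmental and road factors for illustration only.
FACTORS = {
    "weather": ["clear", "rain", "fog"],
    "lighting": ["day", "night"],
    "road": ["dry", "wet", "icy"],
}


def enumerate_scenarios(factors: dict[str, list[str]]) -> list[dict[str, str]]:
    """Build one scenario dict for every combination of factor values."""
    names = list(factors)
    return [
        dict(zip(names, combo))
        for combo in product(*(factors[name] for name in names))
    ]


if __name__ == "__main__":
    scenarios = enumerate_scenarios(FACTORS)
    print(len(scenarios))
    print(scenarios[0])
```

Three small factors already yield 18 scenarios. Adding traffic density, speed, and sensor faults pushes the total into the thousands, which is where automated orchestration becomes necessary.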
What are some essential considerations for using AI in software testing?
Keep the following aspects in mind when you're planning on using AI in software testing:
- Data security: For critical sectors like aerospace and defense, using cloud-based third-party AI services is not an option. Storing sensitive data or intellectual property outside your secure firewall can lead to legal consequences and hefty fines. Use AI tools that you can deploy on-prem behind your organization's firewall to avoid data confidentiality, security, and privacy issues.
- Privacy: Privacy is a prime concern in the healthcare sector. You need to consider on-prem solutions or private cloud-based solutions that comply with regulations.
- Regulatory compliance: Regulatory frameworks like the European Union's AI Act and the Department of Defense's responsible AI strategy impose various data and usage restrictions on AI.
- Guardrails: Agentic AI must be operated with sufficient guardrails to prevent it from taking adverse actions. Policies, periodic reviews, and humans in the loop are essential.
- Own or third-party AI: The above considerations discourage cloud-based third-party AI and encourage training your own LLMs and agentic models. However, that isn't easy because you need lots of good training data, fine-tuning experience, and time. A good trade-off is to license capable third-party AI that you can deploy and fine-tune on premises.
How Keysight strengthens your use of AI in software testing
Keysight offers several products that bolster your use of AI in software testing, particularly in critical industries.
Eggplant Test
Eggplant Test integrates predictive, generative, and agentic AI-like capabilities into a single platform for software testing. It enables domain experts to conduct comprehensive UI-level visual testing without knowing various programming languages or scripting.
Eggplant Test automatically maps out all available user interfaces and user journeys on every target platform — web applications, desktop applications, mobile apps, or measurement instruments.
The use of fuzzy logic and intelligent computer vision for matching UI elements makes the tests future-proof and easy to maintain.
Moreover, Eggplant Test is a 100% on-prem system. You achieve security, transparency, and governance out of the box.
Eggplant Digital Automation Intelligence (DAI)
Eggplant DAI is a broader platform that uses model-based testing, AI, and analytics to manage and optimize the entire testing process, including test generation and results analysis. Eggplant Test is integrated with DAI to execute the test snippets defined in the DAI models.
Keysight Generator
Figure 5. Keysight Generator architecture
Keysight Generator uses secure, on-prem generative AI to turn your real-world requirements into accurate, context-aware, and domain-specific test assets. You can download these assets as Gherkin scenarios or actionable test cases.
Streamline your use of AI in software testing
AI in testing isn’t about the latest buzzwords. It’s about making smart, future-proof choices that balance innovation with productivity, security, scalability, and compliance.
Contact us for expert insights on using AI for your software testing!