Test Automation Forum

Welcome to TAF - Your favourite Knowledge Base for the latest Quality Engineering updates.

(Focused on Functional, Performance, Security and AI/ML Testing)

Brought to you by MOHS10 Technologies



Leveraging OpenAI GPT-3 for Next-Gen Test Automation


The start of a new decade has brought with it the wonder and awe of Artificial Intelligence (AI). One of the biggest breakthroughs has come from the research company OpenAI. Founded in 2015 as a non-profit counterpart to Google DeepMind, with a mission to collaborate freely with the research community and spearhead the ethical development of AI, it has launched several revolutionary products, including DALL-E, MuseNet, Whisper, Dactyl, Codex, and, most famously, the GPT family of language models.


GPT-3 (Generative Pre-trained Transformer 3) is one of the most advanced natural language processing (NLP) models and can generate responses to a virtually unlimited range of human-language queries with little to no human input. GPT-3 works by looking for patterns in text. The model was trained on a massive dataset of over 45 TB of curated text sourced from across the web and has a whopping 175 billion parameters. It can be used for a variety of NLP tasks, including question answering, summarization, conversation modelling and text generation. With advanced capabilities in language understanding, text generation and conversational AI, OpenAI GPT-3 is regarded as one of the most powerful language models to date.

How does it work?

The model is based on a multi-layer transformer architecture: a neural network that learns context, and thus meaning, by tracking relationships in sequential data, such as the words in this sentence. Transformer models apply an evolving set of mathematical techniques, called attention or self-attention, to detect subtle ways in which even distant data elements in a series influence and depend on each other.
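As a rough illustration of the self-attention described above, here is a minimal NumPy sketch of scaled dot-product attention. The random weight matrices stand in for learned parameters; real transformers stack many such layers with multiple attention heads.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating, for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a sequence of token vectors."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # pairwise influence of every token on every other
    weights = softmax(scores, axis=-1)       # each row sums to 1: how much a token attends to the rest
    return weights @ v

rng = np.random.default_rng(0)
seq = rng.normal(size=(4, 8))      # 4 tokens, each an 8-dimensional embedding
wq = wk = wv = rng.normal(size=(8, 8))
out = self_attention(seq, wq, wk, wv)
print(out.shape)  # (4, 8): one context-mixed vector per token
```

Each output vector is a weighted mix of all the input vectors, which is exactly how distant tokens come to influence one another.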

  • GPT-3 uses autoregressive language modelling, meaning it is trained to predict the next word in a sentence given the previous words. This allows the model to generate text that is natural and human-like.
  • It utilizes zero-shot learning, wherein a pre-trained deep-learning model is made to generalize to a novel category of samples, i.e., the training and testing classes are disjoint. Zero-shot methods generally work by associating observed and non-observed classes through some form of auxiliary information that encodes observable distinguishing properties of objects. This allows the model to produce highly accurate, natural-sounding results from just a few input words.
  • It employs transfer learning to exploit the knowledge gained from a previous task to improve generalization on another. This allows the model to adapt quickly to new tasks and generate more accurate results, which is significant in deep learning since most real-world problems do not come with millions of labelled data points for training such complex models.
  • Lastly, it makes use of reinforcement learning (RL), retraining on user feedback for continuous learning so that it improves over time.
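The next-word-prediction idea in the first bullet can be illustrated, in heavily simplified form, with a toy bigram predictor. GPT-3 of course uses a deep transformer with billions of parameters rather than raw word counts; this sketch only mirrors the training objective.

```python
from collections import Counter, defaultdict

# Train a toy next-word predictor on a tiny corpus: count which word
# follows each word, then predict the most frequent successor.
corpus = "the cat sat on the mat the cat ate the fish".split()
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    # Return the most frequently observed next word, or None if unseen.
    counts = follows.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # cat ("cat" follows "the" twice; "mat" and "fish" once each)
```

Sampling repeatedly from such a predictor generates text one word at a time, which is the same loop a GPT-style model runs at a vastly larger scale.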

Testing with GPT-3

OpenAI GPT-3 might be a great choice for automating parts of the software development life cycle (SDLC). With GPT-3, test engineers can leverage the following use cases:

  • The most common use case for a text generation model within testing is definitely test code generation, i.e., automatically generating test scripts from data, with minimal to zero manual intervention and no time spent looking up IDs, selectors, or XPaths.
  • With GPT-3, test script and test case generation can be made smooth, as GPT-3 uses a Prompt, Example, and Output model. To generate test scripts, the test engineer simply provides a Prompt that includes the context of what they are trying to do: for example, simple text (“Open www.xyz.com, and login”), test cases, or analytics data. The engineer then provides an example of what they expect back from GPT-3, which in this case would be sample code in the language they wish to convert the data to. Supplying approximately 4-6 examples will yield the best results. Once those two things are supplied, GPT-3 returns the code for the given prompt, which can be saved to a file, either permanently or temporarily, and then executed.
  • We can also apply similar principles to generating entire test frameworks based on input loaded into GPT-3 and converting it into customized test frameworks for the application under test (Web, Mobile, API). The engineer can simply specify the application under test, what language, and the type of automation framework they would like to begin with, and then the framework can be automatically generated within a very short period.
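A minimal sketch of the Prompt, Example, and Output flow described above. The login step and Selenium snippet are hypothetical placeholders; the commented-out call reflects, roughly, the OpenAI v0.x Python client of the GPT-3 era, so treat it as an assumption to verify against the current API.

```python
def build_prompt(examples, new_step):
    """Assemble a few-shot prompt: instruction, worked examples, then the new step."""
    parts = ["Convert the test step into Python Selenium code.", ""]
    for step, code in examples:
        parts += [f"Step: {step}", "Code:", code, ""]
    parts += [f"Step: {new_step}", "Code:"]
    return "\n".join(parts)

# One worked example (the article suggests supplying 4-6 for best results).
examples = [
    ("Open www.xyz.com and log in",
     'driver.get("https://www.xyz.com")\ndriver.find_element(By.ID, "login").click()'),
]
prompt = build_prompt(examples, 'Search for "laptop" and open the first result')

# With the v0.x `openai` client of the GPT-3 era, the call looked roughly like:
#   response = openai.Completion.create(model="text-davinci-003",
#                                       prompt=prompt, max_tokens=150, temperature=0)
#   generated_code = response.choices[0].text
print(prompt.splitlines()[0])  # Convert the test step into Python Selenium code.
```

The returned completion is the generated test code, which can then be written to a file and executed as described above.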

How can we leverage GPT-3 in test automation development?

OpenAI GPT-3 is powerful, and its quick setup and easy-to-use integration are a strong indication of where AI is headed in terms of integrating AI systems into test automation. There are some learning-curve areas that should be taken into account when evaluating whether this tool is right for your team, but in comparison to building your own custom model, the learning curve is entirely manageable.

The test automation development process can be broadly classified into the following three major phases. Let us see how we can use GPT-3 to accelerate each one of these phases and make the entire process faster and more efficient.

1. Identifying the application under test.

The first phase of the test automation development process is identifying the application under test (AUT). This involves identifying the business logic, functional requirements and non-functional requirements that need to be tested. GPT-3 can identify the application under test by using natural language processing (NLP) to analyse the code base, the text in the user interface and the associated documentation to determine the application type. For example, if the application contains English words such as “Shipping” or “Add to cart”, GPT-3 can infer that the application is an e-commerce platform. From this step, we can create reusable test cases/scenarios based on the identified objects, functions and dependencies, which can be used later during testing activities such as manual regression or exploratory testing.
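As a hypothetical illustration of this kind of inference, here is a toy keyword-based classifier. GPT-3 draws on learned language patterns rather than fixed keyword lists; the domains and hint phrases below are made up for the example.

```python
# Map each candidate domain to phrases that tend to appear in its UI text.
DOMAIN_HINTS = {
    "e-commerce": {"shipping", "add to cart", "checkout", "wishlist"},
    "banking":    {"account balance", "transfer", "statement", "iban"},
    "travel":     {"itinerary", "boarding pass", "check-in", "fare"},
}

def guess_domain(ui_text):
    # Score each domain by how many of its hint phrases occur in the UI text.
    text = ui_text.lower()
    scores = {d: sum(h in text for h in hints) for d, hints in DOMAIN_HINTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] else "unknown"

print(guess_domain("Shipping options | Add to cart | Secure checkout"))  # e-commerce
```

A language model performs the same classification implicitly, without any hand-written hint table, which is what makes it attractive for unfamiliar applications.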

2. Creating test cases/scenarios.

Once the application and its data structures have been identified, GPT-3 can then generate test scenarios for each case. You can also leverage GPT-3 to generate any required test data. For example, if testing an e-commerce store, generate scenarios that cover different types of shoppers, such as first-time buyers, returning customers and multiple-item shoppers.

After generating scenarios, GPT-3 can be further used to create test scripts that exercise those data structures. The scenarios can then be validated against the expected behaviour and existing system requirements, and the results evaluated further. Once these reusable tests are created for each object or function in your system, it becomes easy to reuse them across multiple projects without conflicts, since each test carries its own scope and context (e.g., for different QA teams).
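The scenario and test-data generation step can be sketched as follows; the shopper types and payment methods are the made-up example values from above, crossed into reusable data records.

```python
import itertools
import json

# Cross shopper types with payment methods so every combination is covered once.
shopper_types = ["first_time", "returning", "multi_item"]
payment_methods = ["card", "paypal"]

def build_scenarios():
    # Yield one concrete, reusable test-data record per combination.
    for i, (shopper, payment) in enumerate(
            itertools.product(shopper_types, payment_methods), start=1):
        yield {
            "id": f"TC-{i:03d}",
            "shopper": shopper,
            "payment": payment,
            "items": 3 if shopper == "multi_item" else 1,  # multi-item shoppers buy more
        }

scenarios = list(build_scenarios())
print(len(scenarios))            # 6 combinations
print(json.dumps(scenarios[0]))
```

In practice GPT-3 would be prompted to produce records like these directly; the point is that each record is self-contained and can be fed to any test script.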

3. Designing and developing a framework for automated testing (including code generation)

The third phase of creating automated tests involves designing framework(s) through which we can express our requirements in high-level language, so that test engineers focus on application functionality instead of syntax, which contributes significantly to overall efficiency.
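One way to picture such a high-level framework is a minimal keyword-driven runner: engineers write plain steps, and the framework dispatches each keyword to an implementation. The keywords and the logging stub below are illustrative, not any real tool's API.

```python
# Registry mapping high-level keywords to their implementations.
actions = {}

def keyword(name):
    def register(fn):
        actions[name] = fn
        return fn
    return register

log = []  # stand-in for real browser actions, so the sketch is self-contained

@keyword("open")
def open_url(url):
    log.append(f"GET {url}")

@keyword("click")
def click(selector):
    log.append(f"CLICK {selector}")

def run(steps):
    # Each step is "keyword argument"; dispatch to the registered implementation.
    for step in steps:
        name, _, arg = step.partition(" ")
        actions[name](arg)

run(["open https://www.xyz.com", "click #login"])
print(log)  # ['GET https://www.xyz.com', 'CLICK #login']
```

A generated framework would bind these keywords to real drivers (Selenium, Appium, an HTTP client) while the test steps themselves stay readable.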


1. One of the things that testers miss out on, or consider overburdening, is adding resiliency to the automation code. It is usually deprioritized as a priority-2 or priority-3 item. However, with GPT-3, we can conveniently generate code that adds resiliency to the automation script.

2. Adding security-oriented best practices should be prioritized from day 1. However, if you have unsecured legacy code or a shortage of time to find security vulnerabilities in the code, GPT-3 can help out. In addition, most of the code pieces that GPT-3 creates are secure by default. But sometimes, you may have to prompt the platform to create secure code.

3. GPT-3 is an excellent tool for overcoming limitations around knowledge of a particular technology. For example, if you are a Selenium expert but are not well-versed in GitHub pipelines, you can use GPT-3 to at least get you started and provide starter codes to help you create GitHub workflows. However, a word of caution is that GPT-3 is not perfect or foolproof.

4. When it comes to debugging, GPT-3 is a useful addition to any software developer’s toolkit. There are examples on the Internet where people have copy-pasted their code and received the exact reason for the failure as the response. Again, this is not 100% foolproof, and GPT-3 may miss obvious issues, but it can still help you get started or give you a new perspective while debugging the code.
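The resiliency idea from point 1 above can be sketched as a simple retry wrapper around a flaky UI action. The flaky_click stub is hypothetical; it simulates an element that only becomes interactable on the third attempt.

```python
import time

def resilient(retries=3, delay=0.1, exceptions=(Exception,)):
    """Retry decorator: re-run a flaky action a few times before failing."""
    def wrap(fn):
        def inner(*args, **kwargs):
            for attempt in range(1, retries + 1):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == retries:
                        raise           # give up after the final attempt
                    time.sleep(delay)   # back off briefly before retrying
        return inner
    return wrap

calls = {"n": 0}

@resilient(retries=3, delay=0)
def flaky_click():
    # Simulate an element that is only ready on the third attempt.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("element not interactable yet")
    return "clicked"

print(flaky_click())  # clicked
```

Wrapping brittle steps this way, or asking GPT-3 to generate such wrappers, keeps transient timing failures from failing the whole suite.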


Limitations of GPT-3

1. GPT-3 is based on statistical patterns and does not understand the underlying meaning of words. The model predicts the next words based on the words that came before, but it has no underlying understanding of what those words mean. This means it cannot be used as effectively in situations where the user’s questions or statements require an understanding of a context that has not been explained beforehand.

While these may seem like minor limitations, they matter a great deal if you depend on GPT-3 for testing. For example, the accuracy of GPT-3 will decrease drastically if you have to create test cases that require deep prior understanding of the system under test.

2. The underlying technology in GPT-3 is a deep-learning language model trained on large datasets of human-generated content. We are assuming that it has also learned code as text, which is why it has been able to generate such accurate code. That means it cannot respond accurately to things it has not learned before, and it may give wrong information if its learning has not been updated.

For example, if its last learning phase covered a framework that has since deprecated half of its methods, the code it creates will still use those deprecated methods. The user therefore has to ensure that the final code they use is up to date.

3. Another challenge of creating code through GPT-3 is that you have to deal with partially written code. If you depend on GPT-3-based code, you first have to understand the incomplete code, then finish or modify it to suit your needs. As you can imagine, this is often challenging, as there are many things that can go wrong. Even if you manage to get what you want, the final product will likely not be as good as code written from scratch. On the flip side, extending or debugging generated code is sometimes easier than writing repetitive code from scratch.

4. GPT-3 is dependent on assumptions. Software testers are trained to identify these hidden factors that could potentially cause an app to fail, and when they do so, they can build into their test cases ways to check for these issues. But what happens when your testers aren’t given enough time to test all their assumptions? What if the information needed to validate an assumption isn’t available? When other teams are involved in building the product, like QA or Development, this can be difficult to control.

There is a similar problem with GPT-3. The platform starts with many assumptions about the use case you provide. Most of the time these assumptions are evident and easy to work around, but they can also lead to very inaccurate code.

Will AI replace testing teams?

GPT-3 and similar AI technologies certainly have tremendous potential in the field of testing and test automation. However, it is still not possible to say that they will replace testing teams, because more often than not the generated code is not perfect. It is pretty close to runnable in most cases, but there are still issues, ranging from syntax errors to missing crucial steps because of context gaps. Experienced developers, however, can get enough of the boilerplate generated that it becomes easy to tweak, debug and run independently.

If used correctly, this tool will enable teams to get started with testing tasks much earlier and faster. Proper tooling built with this technology in the background will empower testers to worry less about automation and focus on the test cases themselves.


To conclude, leveraging AI in our test automation development process can bring significant efficiency gains. GPT-3 is an exciting and powerful tool for test automation development. With that said, it is important to highlight that it is still in its early stages, and constant updates are being made to add features and fix bugs. In addition, it employs reinforcement learning to keep learning from user feedback, so its accuracy will increase with time. Users must therefore stay on top of these changes to continue using GPT-3 efficiently.

GPT-3 can serve as a great foundation for your next test automation project and can be used to accelerate each phase of the entire process. We can thus minimize the efforts required to perform the same task several times, reducing human errors and making our test cases more robust.

