Test Automation Forum

Welcome to TAF - Your favourite Knowledge Base for the latest Quality Engineering updates.



(Focused on Functional, Performance, Security and AI/ML Testing)

Brought to you by MOHS10 Technologies



Why current testing processes in AI/ML are not enough

The current notion of quality assurance and testing in AI/ML pipelines is based on validation against a randomly held-out set of data, on which the model is evaluated and metrics are computed. Metrics such as accuracy on this held-out set, somewhat ambiguously termed "test data", are the usual rubric for evaluating the effectiveness of ML models. But this gives only a partial picture of the model's quality, and it is not sufficient to guarantee good performance after deployment.

It is probably because of the terminology of "test data" used in the process that the bigger picture of testing is missed in the ML life cycle. There are, however, some additional validation mechanisms suggested to further strengthen the evaluation process:

  1. K-fold cross-validation
  2. Bootstrap resampling
  3. Leave-one-out cross-validation (LOOCV)
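As a minimal sketch of the first of these mechanisms, k-fold cross-validation can be written in a few lines of plain Python. The data, labels, and the "majority-label" model below are toy stand-ins, not a real pipeline:

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# Toy data: 10 points, label = 1 if x >= 5 else 0
X = list(range(10))
y = [1 if x >= 5 else 0 for x in X]

scores = []
for train_idx, test_idx in k_fold_splits(len(X), k=5):
    # Toy "model": predict the majority label seen in training (a stand-in
    # for fitting and scoring a real estimator on each fold)
    majority = round(sum(y[i] for i in train_idx) / len(train_idx))
    correct = sum(1 for i in test_idx if y[i] == majority)
    scores.append(correct / len(test_idx))

print(f"mean accuracy over 5 folds: {sum(scores) / len(scores):.2f}")
```

Note that every sample appears in exactly one test fold, so the whole data set is "unseen" exactly once, which is precisely why this style of validation feels complete while still sampling inputs at random.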

However, all of the above validation approaches, including randomized train/test splits, rest on the notion that testing the model on randomized unseen data is sufficient validation. We feel this is an incomplete picture, one that cannot guarantee the overall performance of the model in the field.

Here is a set of reasons why we need to think beyond current model evaluation and validation approaches to guarantee AI/ML model quality.

  1. The random selection of the test set, including in cross-validation-based approaches, does not guarantee comprehensive coverage of input scenarios, especially corner cases, which are rare by nature.
  2. Even though cross-validation tries to cover the overall spectrum via the k-fold approach, it offers no systematic way to understand and debug the model's performance across different input scenarios. Hence, detecting which types of input variations are insufficiently represented in the model is impossible with current approaches.
  3. Testing for security, an important non-functional IT requirement, is entirely absent from current model evaluation approaches. Beyond application security, AI models themselves now need to be audited for AI-specific attacks; hence there is a need for comprehensive security testing of AI/ML models.
  4. In compliance-oriented sectors there is an increasing push to generate explanations or rationales for AI/ML model decisions, so testing for explainability is a must for today's AI/ML models.
  5. The performance of AI/ML models should be tested independently of the system in which they are deployed, because specific deployment formats such as TinyML require comprehensive validation of performance at the model level.
  6. Privacy regulations such as the GDPR impose constraints on data and the AI models derived from it, a large set of desiderata for AI/ML applications. Testing AI/ML models for privacy breaches, attacks, and leaks therefore forms an important component of the overall requirements to certify and audit AI models.
  7. Testing and assuring fairness and the absence of bias in AI models is an important requirement to ensure that they do not get recalled or rescinded.
  8. Testing data quality at the input level before it is fed to the ML process is vital, as many quality issues at the model level arise from poor input data.
  9. In several scenarios there is not sufficient data to test the AI models. In those scenarios, the data adequacy of the models needs to be tested and, if necessary, mechanisms to augment the test data should be made available.
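To make the fairness requirement in point 7 concrete, here is a minimal sketch of one common check, the demographic parity difference: the gap in positive-prediction rates between groups. The predictions, group labels, and the 0.8 threshold below are all hypothetical, chosen only for illustration:

```python
def demographic_parity_difference(preds, groups):
    """Absolute gap in positive-prediction rate between the groups."""
    rates = {}
    for g in set(groups):
        members = [p for p, gg in zip(preds, groups) if gg == g]
        rates[g] = sum(members) / len(members)
    lo, hi = min(rates.values()), max(rates.values())
    return hi - lo

# Hypothetical binary predictions and group membership
preds  = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

gap = demographic_parity_difference(preds, groups)
print(f"demographic parity difference: {gap:.2f}")  # 0.75 for A vs 0.25 for B -> 0.50
```

A fairness test suite would fail the model when this gap exceeds an agreed threshold, turning fairness from a vague aspiration into a pass/fail criterion alongside accuracy.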

Overall, these desiderata point to the need for standalone frameworks, processes, and products for AI testing that can handle all the above-mentioned tests for ML models of all types. To ensure trustworthy and responsible AI, a comprehensive set of tests covering all the points above is mandatory.

— Dr. Srinivas Padmanabhuni
Note: The article has been republished here with prior approval from the author.

About the Author

Dr. Srinivas Padmanabhuni is the CTO of TestAIng. He is a well-known personality in the field of Artificial Intelligence (AI) and is recognised for his significant contributions to it. He holds a Ph.D. in Artificial Intelligence, speaks at several premier institutes and forums, and has authored several technical articles and books in AI/Data Science.

About TestAIng (testAIng.com)

testAIng.com (pronounced "tAI") is a leader in testing AI systems using state-of-the-art techniques, tools, and technologies. They have combined their deep experience in testing with AI to create a unique, one-of-its-kind proposition for testers who want either to use AI in their testing process or to get their AI systems tested.


