Test
Automation
Forum

Focused on Functional, Performance, Security and AI/ML Testing

In-depth testing of AI applications that use images

Generally, MLOps (a methodology for developing ML-based applications) has design, development and operations phases. Wait, something important is missing. I hope you have spotted it by now: MLOps has no dedicated testing phase (for security, bias, performance and so on). So here is the question: how are ML applications tested in order to make them Responsible AI (RAI)?

Have you ever wondered how AI/ML-based applications are tested? If you are curious, this article is for you. In it, I’m going to discuss how we tested an AI-based plant diagnostic application in order to make it reliable, robust and accurate.

Business case

The challenge was to test a plant diagnosis application that supports various crop types. It was developed for farmers and gardeners to diagnose infected crops, suggest treatments for diseases and nutrient deficiencies, and collaborate with other farmers. Plant disease recognition is done using AI image recognition technology (a neural-network-based algorithm).

How AI application testing is different:

Developing AI-based applications differs from developing regular software applications. With AI-based applications, we work with both data and code. The AI application development process goes through steps such as data collection, data cleaning, feature engineering, model selection, training and testing. This is what sets AI application development apart from the traditional software development process. With most AI models, the data is split into two sets, one to train the model and the other to test it. Once the chosen metrics have been used to gauge the model’s performance on the test data, the model is either validated or sent back to the previous stage for revision. Do you think this level of testing is sufficient for an application that will make decisions, solve problems, and become part of people’s daily lives? Probably not! Let’s continue reading.
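The split-then-gate step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data, the `toy_model` function and the 0.9 accuracy threshold are hypothetical and are not taken from the plant diagnosis project.

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and split them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(model, labelled_samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for x, y in labelled_samples if model(x) == y)
    return correct / len(labelled_samples)

def validate(model, test_set, threshold=0.9):
    """True if the model clears the threshold, False if it should go back for revision."""
    return accuracy(model, test_set) >= threshold

# Toy data (assumption): one numeric feature, "diseased" when it exceeds 0.5.
data = [(i / 100, "diseased" if i / 100 > 0.5 else "healthy") for i in range(100)]
train, test = train_test_split(data)

# Hypothetical stand-in for a trained model.
def toy_model(x):
    return "diseased" if x > 0.5 else "healthy"

print(len(train), len(test), validate(toy_model, test))
```

A single hold-out check like this is exactly the level of testing the paragraph above questions: it says nothing about robustness, bias, drift or security.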

How to test an AI app to ensure its reliability:

There are several things that we can do to make an AI model more reliable, such as making it more robust. To achieve this, we need to test the AI models in different ways:

Randomized testing: Test the AI system to evaluate how the model performs on unseen data.
Cross-validation techniques: Evaluate the effectiveness of the model by repeating the metrics evaluation across several different splits of the data. Examples: K-Fold cross-validation, Bootstrap and LOOCV.
Test coverage: Pseudo-oracle-based metamorphic testing, white-box coverage-based testing, layer-level coverage and neuron-coverage-based testing.
Test for bias: Test the fairness of the ML model for any discriminatory behavior based on specific attributes such as gender or race.
Test for agency: Test for closeness to human behavior. This is used to compare two different models and to evaluate dimensions of AI quality such as natural interaction and personality.
Test for concept drift: Continuously check for data drift, and hence model drift, which causes the deployed model to perform badly on newer data.
Test for explainability: To enable testing for the “transparency of choices” element, we need a comprehensive approach to testing the models for explainability.
Security testing: Security testing for adversarial attacks is a primary component of any AI/ML test. We should test for potential attacks on the current training data. Examples: white-box and black-box attacks.
Test for privacy: Test at the model level for privacy attacks that make it possible to infer data, and then check whether the inferred data has PII embedded in it.
Test for performance: Check whether the system is able to handle different patterns of input load, including spike patterns such as an e-commerce site on Boxing Day.
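As an illustration of one technique from the list above, here is a minimal sketch of K-Fold cross-validation in plain Python. The learner (`fit_threshold`), the scoring function and the toy data are hypothetical placeholders chosen to keep the example self-contained. In practice the data would usually be shuffled before it is split into folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cross_validate(fit, score, data, k=5):
    """Train on k-1 folds, score on the held-out fold, return per-fold scores."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(data), k):
        model = fit([data[i] for i in train_idx])
        scores.append(score(model, [data[i] for i in test_idx]))
    return scores

def fit_threshold(train):
    """Toy learner (assumption): threshold halfway between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def score_accuracy(threshold, test):
    """Accuracy of the rule 'predict 1 when x >= threshold' on the test fold."""
    return sum(1 for x, y in test if (1 if x >= threshold else 0) == y) / len(test)

# Toy data (assumption): label is 1 when the feature is at least 0.5.
data = [(i / 20, 1 if i >= 10 else 0) for i in range(20)]
scores = cross_validate(fit_threshold, score_accuracy, data, k=5)
print(scores, sum(scores) / len(scores))
```

Looking at the spread of the per-fold scores, rather than one hold-out number, gives a better sense of how stable the model is across different slices of the data.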

How we tested the plant diagnosis application at our AI lab:

To test the plant diagnosis application, we collected the data and the model from our client in the required format. We then tested the model using AIensured, a commercial state-of-the-art testing product from our strategic partner. The results, with insights into both the data and the model’s performance, were shared with the application owner. The following are the key benefits we provided to our client:

We generated corner cases (cases where the model fails to give the correct result) and retrained the model on them to increase its robustness.
We used 11 attack techniques, such as DeepFool, Universal Perturbation, Pixel Attack and Spatial Transformation, to find out how robust the model is against security attacks.
Model explainability, which includes both white-box and black-box explanations, helped them understand which portion of the image their model focuses on, and therefore what caused a misclassification.
To overcome the oracle problem (not having a defined expected output), we did metamorphic testing using transformations such as rotation, shear and brightness changes, which helped them see how the model performs on transformed inputs.
Model quantization allowed them to reduce the model size without losing accuracy. This helped them deploy their model on low-end electronic devices as well.
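The idea behind the metamorphic testing mentioned above can be sketched as follows: apply a transformation that should not change the diagnosis, and flag the input if the prediction changes. This is a minimal illustration, not the AIensured implementation. The 4x4 grayscale image, the `toy_model` rule and the transformation names are all hypothetical.

```python
def rotate90(img):
    """Rotate a 2D list image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def flip_horizontal(img):
    """Mirror each row of the image."""
    return [row[::-1] for row in img]

def brighten(img, delta):
    """Add delta to every pixel, clamped to the 0..255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]

def metamorphic_violations(model, img, transforms):
    """Return the names of transforms under which the prediction changes."""
    baseline = model(img)
    return [name for name, t in transforms.items() if model(t(img)) != baseline]

# Hypothetical toy model: "diseased" if more than 25% of pixels are darker than 100.
def toy_model(img):
    pixels = [p for row in img for p in row]
    dark_share = sum(1 for p in pixels if p < 100) / len(pixels)
    return "diseased" if dark_share > 0.25 else "healthy"

# Toy 4x4 grayscale "leaf" (assumption): 5 dark pixels out of 16.
leaf = [
    [90, 90, 200, 200],
    [90, 200, 200, 200],
    [90, 200, 200, 200],
    [90, 200, 200, 200],
]

transforms = {
    "rotate_90": rotate90,
    "flip_horizontal": flip_horizontal,
    "brighten_20": lambda img: brighten(img, 20),
}

print(toy_model(leaf), metamorphic_violations(toy_model, leaf, transforms))
```

No ground-truth label is needed here, which is why this approach helps with the oracle problem. In this toy case the brightness change is reported as a violation, and such an input is a candidate corner case for retraining.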

The tests that were performed on their model are depicted in the graphic below:

Results:

The bottom line is that, after retraining the model with the generated corner cases, its performance improved by around 12%. The report we shared helped them make their model explainable and comply with the required privacy governance. Above all, we made their model responsible and robust to security attacks, and improved its overall performance.

I hope this article was insightful! Please don’t hesitate to contact me if you have any questions or suggestions.

Happy learning!

Welcome to TAF - Your favourite Knowledge Base for the latest Quality Engineering updates.

Celebrating our 2nd Anniversary!

Brought to you by MOHS10 Technologies

( A Tri Rath initiative )

Harikrishna Para
