How to apply Quality Assurance Engineering principles to Artificial Intelligence systems?

Question

In deterministic (software) systems we have a set of business requirements and, ideally, given enough resources, the expected outputs for each input or set of actions within a context can be fully defined. Functional QA then merely assesses whether the system follows the rules as described. Even usability, endurance, stress and other kinds of testing can be fully defined and thus become part of the requirements.

However, how does one test effectively and detect differences between the required and actual behavior of Artificial Intelligence systems?

George Pligoropoulos · Accepted Answer · 2022-06-27 18:48:06Z

I am not sure whether this approach makes sense, but one could walk through the steps of the lifecycle of an Artificial Intelligence system and see how a Quality Assurance Engineer can ensure that quality is high at each and every step:

Context
- Ensure that there are clear specifications and defined requirements before proceeding with any other testing
Collecting training data
- Ensure the data comes from a variety of sources and is diverse enough to avoid biases
- Ensure that a large enough dataset remains after cleaning
- Ensure features are sane and within the expected range after cleaning
- Sample the training data and inspect the samples by eye to see whether they make sense
- Write rule-based scripts to check that what is generally expected is found within the training data
- Ensure that the training data represent the targets/outputs in proportions that are as equal as possible
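The rule-based checks on training data described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function names, the `expected_ranges` mapping and the `max_ratio` threshold are all assumptions made for the example, and real rules would come from the project's own requirements.

```python
from collections import Counter

def check_feature_ranges(rows, expected_ranges):
    """Return (row_index, feature, value) for every value outside its expected range.

    `expected_ranges` is a hypothetical mapping: feature name -> (low, high).
    """
    violations = []
    for i, row in enumerate(rows):
        for feature, (low, high) in expected_ranges.items():
            value = row[feature]
            if not (low <= value <= high):
                violations.append((i, feature, value))
    return violations

def class_proportions(labels):
    """Return the share of each target label in the dataset."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

def is_balanced(labels, max_ratio=3.0):
    """True if the most frequent label is at most `max_ratio` times the rarest one."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values()) <= max_ratio

# Example usage with made-up data
rows = [{"age": 30}, {"age": -5}, {"age": 41}]
print(check_feature_ranges(rows, {"age": (0, 120)}))
print(class_proportions(["cat", "dog", "dog", "dog"]))
```

The acceptable imbalance ratio is a business decision; the value 3.0 here is only a placeholder.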
Testing data
- Ensure that the test data are not merely a sample of the training data, but that at least some of them reflect the business goals (defining expected outcomes as test oracles) and are characteristic examples
- Ensure that testing data are used only once and then thrown away; otherwise they will influence the next model
- Ensure that the testing data, although smaller in size, are still representative of the training data
- Ensure that the testing data include the very latest samples that we expect and reflect at least the near future
- Ensure that the system is tested against totally random inputs (noise) and that it returns outputs of low certainty
- Ensure that GAN-based metamorphic approaches ([18] PDF - arxiv.org) are used to test the AI system with inputs from the same space as the original data
- Ensure that QAs have created a few new test cases by hand and have manually set (using their own judgment) the expected output
- Ensure that past scenarios executed in production by real users can be fully replicated and used as test input
- Robustness: refers to the resilience of an AI component to perturbations
  - Ensure that small variations (perturbations) of a test sample yield output similar to the original and not highly different results (i.e. the variance is not high)
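The robustness check above could look something like the sketch below. It assumes the model under test can be called as a function from a list of numeric features to a single probability; `toy_model` is a hypothetical stand-in, and `epsilon`, `trials` and the acceptance threshold are illustrative values only.

```python
import math
import random

def toy_model(features):
    """Hypothetical stand-in for the model under test: returns P(positive class)."""
    score = 0.8 * features[0] - 0.5 * features[1]
    return 1.0 / (1.0 + math.exp(-score))

def max_output_shift(model, sample, epsilon=0.01, trials=100, seed=0):
    """Largest absolute change in output when each feature moves by at most `epsilon`."""
    rng = random.Random(seed)  # fixed seed keeps the test reproducible
    baseline = model(sample)
    worst = 0.0
    for _ in range(trials):
        perturbed = [x + rng.uniform(-epsilon, epsilon) for x in sample]
        worst = max(worst, abs(model(perturbed) - baseline))
    return worst

# A robustness test then asserts that small input changes cause small output changes
shift = max_output_shift(toy_model, [1.0, 2.0])
assert shift < 0.01, f"model output moved by {shift} under tiny perturbations"
```

Random perturbation only samples the neighbourhood of a test point, so a passing check is evidence of robustness rather than proof.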
Model wise
- Ensure that a baseline model is always there to compare against
- Ensure that the proposed model performs better than the baseline model
- Ensure that the newly proposed model performs better than the previously proposed model
- Ensure that easy-to-create dummy models, such as Naive Bayes for classification or Linear Regression for regression, do not perform better than the proposed model
- Ensure that a low-cost, rule-based, non-AI model does not work better than the proposed model
- Ensure that the model also reports how certain it is, as a probability, so that an output can be judged a good/average/bad prediction
- Ensure that the model is not polarized towards a few parameters and is therefore not prone to AI attacks (where some inputs are changed in order to steer the entire output at the attacker's will)
- Ensure that an ensemble model is not overfitting and works as well as or better than any of the individual underlying models
- Ensure that, when a Teacher-Student model is used, the Teacher is the slower yet more accurate model and the Student is less accurate but more efficient
- Ensure that self-adaptive and self-learning systems (e.g. Reinforcement Learning) will be able to self-assess themselves to make sure that they are not making
- Interpretability
  - Ensure that, when the training data are used to build an interpretable model that fits the predictions of our large model, the interpretation of its parameters makes sense
  - Ensure that the model makes predictions based on parameters that current theory supports and shows no weird pattern that might indicate a wrong model
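The baseline comparisons listed in this section can be turned into an automated gate. The sketch below uses a majority-class predictor as the simplest possible baseline, which is a swapped-in simplification (the list above mentions Naive Bayes and Linear Regression); all function names and the `min_margin` parameter are assumptions for illustration.

```python
from collections import Counter

def accuracy(predictions, targets):
    """Fraction of predictions that match the expected targets."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)

def majority_class_baseline(train_targets, n_predictions):
    """Trivial baseline: always predict the most frequent training label."""
    most_common_label = Counter(train_targets).most_common(1)[0][0]
    return [most_common_label] * n_predictions

def beats_baseline(model_predictions, baseline_predictions, targets, min_margin=0.0):
    """True if the model's accuracy exceeds the baseline's by more than `min_margin`."""
    model_acc = accuracy(model_predictions, targets)
    baseline_acc = accuracy(baseline_predictions, targets)
    return model_acc > baseline_acc + min_margin

# Example usage with made-up labels
targets = ["ham", "spam", "ham", "spam"]
baseline = majority_class_baseline(["spam", "ham", "ham"], len(targets))
model_output = ["ham", "spam", "ham", "ham"]
print(beats_baseline(model_output, baseline, targets))
```

The same `beats_baseline` gate can compare the proposed model against the previous model's predictions; accuracy is used here only as an example metric.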
Checking output qualitatively
- Ensure that outputs the model gives with a very high certainty probability truly deliver a good answer
- Ensure that bad answers of the model are handled in such a way that the user retains trust in the overall system instead of being misled
- Ensure that the model generates output that is aligned with the business goals and that these answers are useful to the user
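One way to check the first point above is to measure accuracy only on the predictions the model was very sure about. The sketch assumes each reviewed prediction is recorded as a `(confidence, was_correct)` pair; that record format and the 0.9 threshold are hypothetical.

```python
def high_confidence_accuracy(predictions, threshold=0.9):
    """Accuracy restricted to predictions with confidence >= threshold.

    `predictions` is a list of (confidence, was_correct) pairs.
    Returns None when no prediction reaches the threshold.
    """
    confident = [was_correct for confidence, was_correct in predictions
                 if confidence >= threshold]
    if not confident:
        return None
    return sum(confident) / len(confident)

# Example usage with made-up review results
reviewed = [(0.95, True), (0.92, True), (0.91, False), (0.50, False)]
print(high_confidence_accuracy(reviewed))
```

If this number is much lower than the stated confidence level, the certainty scores shown to the user are probably overstated.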
Performance / Efficiency
- Ensure that the model generates answers fast enough that the user experience is not severely impacted
- Ensure that training the new model does not take so long that deadlines are missed
- Ensure that minimal resources are given to AI models under development compared with the AI model in production, and that the two are separated so that one (test/staging environment) does not consume resources from the other (production)
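The response-time requirement can be checked with a small latency harness. This is only a sketch: `predict` stands for whatever callable wraps the model, the percentile uses a simple index-based rule, and the latency budget itself would have to come from the product requirements.

```python
import time

def measure_latencies(predict, inputs):
    """Return the wall-clock time in seconds of one `predict` call per input."""
    latencies = []
    for item in inputs:
        start = time.perf_counter()
        predict(item)
        latencies.append(time.perf_counter() - start)
    return latencies

def percentile(values, fraction):
    """Simple index-based percentile: the value at position fraction * n in sorted order."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(fraction * len(ordered)))
    return ordered[index]

# Example usage with a trivial stand-in for the model
latencies = measure_latencies(lambda x: x * 2, [1, 2, 3])
print(percentile(latencies, 0.95))
```

Checking a high percentile rather than the mean keeps occasional slow answers from hiding behind many fast ones.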
Production monitoring
- Ensure that a feedback system has been set in place so that users are able to report unwanted or misleading output of the AI
- Ensure that the feedback reported by the users is significantly high
- Ensure that the measured error of the system in production is within acceptable levels, similar to those measured when the model was run on the testing data
- Ensure that the measured error of the system remains steady as new inputs are received and does not show a deteriorating trend
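The two error checks above could be monitored with something like the following sketch. It assumes the production error is sampled periodically into a list of at least two values; the function names and the `tolerance` value are illustrative assumptions.

```python
from statistics import mean

def error_within_tolerance(production_errors, test_error, tolerance=0.05):
    """True if the mean production error stays within `tolerance` of the test-set error."""
    return mean(production_errors) <= test_error + tolerance

def error_trend(errors):
    """Least-squares slope of the error against the observation index.

    A clearly positive slope means the error is growing over time.
    Requires at least two observations.
    """
    xs = range(len(errors))
    x_mean = mean(xs)
    y_mean = mean(errors)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, errors))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator

# Example usage with made-up weekly error measurements
weekly_errors = [0.10, 0.12, 0.14, 0.16]
print(error_within_tolerance(weekly_errors, test_error=0.10))
print(error_trend(weekly_errors))
```

An alert could fire when the slope stays positive over several windows, which is one possible reading of "remains steady".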
User output
- Ensure that the output of the model and its certainty probability are reflected correctly in the app
- Ensure that an input instance which is very far away from the current distribution of the model does not allow the user to proceed with using the AI system
- Ensure that, when a prediction has low certainty, the user is offered manual or rule-based alternatives to accomplish their tasks
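The last two points describe gating logic that QA can test directly. A minimal sketch, assuming a single numeric feature, a z-score rule for "very far away" and a fixed confidence threshold; the function names, `max_z` and `min_confidence` are all hypothetical choices.

```python
from statistics import mean, stdev

def is_out_of_distribution(value, training_values, max_z=3.0):
    """True if `value` lies more than `max_z` standard deviations from the training mean."""
    mu = mean(training_values)
    sigma = stdev(training_values)
    return abs(value - mu) > max_z * sigma

def route_prediction(confidence, min_confidence=0.7):
    """Decide what the app shows: the prediction itself or a manual/rule-based fallback."""
    if confidence >= min_confidence:
        return "show_prediction"
    return "offer_manual_fallback"

# Example usage with made-up values
training_values = [10, 11, 9, 10, 12, 8]
print(is_out_of_distribution(30, training_values))
print(route_prediction(0.4))
```

Real systems usually need a multivariate distance or a dedicated outlier detector; the one-dimensional z-score is only the simplest case to test against.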
Data privacy: refers to the ability of an AI component to preserve private data information
- Example: If a chatbot accumulates knowledge about a certain user, asking its language model for information regarding some other user should not deliver that information. Each language model should be agnostic of other language models
Security: measures the resilience against potential harm, danger or loss made via manipulating or illegally accessing AI components
- Ensure that the process behind the AI model is transparent and that there is a history of the changes that have been made to the deployed AI model
Fairness: avoid problems with human rights, discrimination law and other ethical issues
- Ensure that the model output complies with some "values" which are coded in rule-based scripts
- Example: a sentiment analysis check ensuring that the output of a language model is never very negative
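The sentiment example above can be expressed as a rule-based guard on candidate outputs. The sketch assumes a sentiment score in the range -1 to 1 has already been computed by some analyzer, which is not shown; the function names and the `min_allowed` threshold are hypothetical.

```python
def violates_values(sentiment_score, min_allowed=-0.8):
    """True if the output is more negative than the coded "value" allows."""
    return sentiment_score < min_allowed

def filter_outputs(candidates, min_allowed=-0.8):
    """Keep only the (text, sentiment_score) candidates that comply with the rule."""
    return [(text, score) for text, score in candidates
            if not violates_values(score, min_allowed)]

# Example usage with made-up candidate outputs and scores
candidates = [("Happy to help!", 0.9), ("That is a terrible idea.", -0.95)]
print(filter_outputs(candidates))
```

A QA test can then assert that no output reaching the user has a score below the threshold.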

I suggest one additional point, but it depends on the field: profitability. It is linked to performance and the other points, but I think it is crucial, as it determines the whole viability of a project. Many businesses applying AIaaS must have profitable AIs.
– Nicolas Martin, Commented Jul 2, 2022 at 16:30
Thanks @NicolasMartin. I guess profitability is just one specific metric or business goal. Alternatively, another company could focus a lot on user engagement and think of profitability as a side effect of the user engagement with the digital product. Doesn't it thus always depend on the outlined business goals?
– George Pligoropoulos, Commented Jul 4, 2022 at 22:56
I don't know. I proposed this topic because it is connected to value, and high-value products have high quality. It's like continuous improvement: a high-quality product should be easy to modify thanks to clear comments and an agile structure, because a model often evolves in long-term projects. The quest for improvement and profit is highly connected to quality, but I'm not sure whether they are quality criteria.
– Nicolas Martin, Commented Jul 5, 2022 at 7:07
