Imagine the following scenario:
Our team is working on a mobile biometrics project and delivers a client-facing SDK. Our work relies on another internal team that delivers the algorithms in the form of a black-box library. The SDK has access to the Internet, but all biometric processing happens on the phone.
That makes our project a wrapper around the library: we provide a platform-specific, user-friendly API for the clients. We are also responsible for the quality of the product, i.e. making sure it works as expected in the client's environment.
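To make the setup concrete, the wrapper layer might look roughly like the sketch below. All names here (`FaceCaptureSdk`, `BlackBoxAlgorithm`, `CaptureResult`) are hypothetical, since the real API is not part of this post; the point is only that our SDK translates platform camera frames into calls to the black-box library and exposes the result to the client.

```kotlin
// Hypothetical sketch of the wrapper layer; real class and method names will differ.

// The black-box library delivered by the internal algorithms team.
interface BlackBoxAlgorithm {
    fun processFrame(frame: ByteArray, width: Int, height: Int): CaptureResult
}

// Outcome reported back to the client application.
sealed class CaptureResult {
    data class Success(val template: ByteArray) : CaptureResult()
    data class Failure(val reason: String) : CaptureResult()
    object NeedMoreFrames : CaptureResult()
}

// Our client-facing SDK: a thin, platform-specific API around the library.
class FaceCaptureSdk(private val algorithm: BlackBoxAlgorithm) {
    fun onCameraFrame(frame: ByteArray, width: Int, height: Int): CaptureResult =
        algorithm.processFrame(frame, width, height)
}
```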
Testing of the product can be partially automated by feeding the library a video file that imitates the phone camera feed. Tests like this are very flaky, because in biometrics many variables decide whether a capture succeeds: lighting conditions, the device camera, the background scene, and the size and shape of a face or fingers all have an impact.
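A minimal sketch of such an automated test is shown below, building on the hypothetical types from the previous sketch. `VideoFrameReader` and `loadBlackBoxAlgorithm()` stand in for whatever video decoding utility and library entry point the project actually uses; both are assumptions, not real APIs.

```kotlin
import org.junit.Assert.assertTrue
import org.junit.Test

// Hypothetical helper that decodes a recorded video into raw camera-like frames.
class VideoFrameReader(private val path: String) {
    fun frames(): Sequence<ByteArray> = TODO("decode $path into frames")
    val width: Int = 1280
    val height: Int = 720
}

// Hypothetical factory for the vendor library; the real entry point is not shown in the post.
fun loadBlackBoxAlgorithm(): BlackBoxAlgorithm = TODO("obtain the library's implementation")

class FaceCaptureIntegrationTest {

    // Feed a pre-recorded video through the SDK as if it were the live camera feed
    // and assert that the black-box algorithm eventually reports a successful capture.
    @Test
    fun `capture succeeds on reference video`() {
        val sdk = FaceCaptureSdk(loadBlackBoxAlgorithm())
        val reader = VideoFrameReader("src/test/resources/face_good_lighting.mp4")

        val result = reader.frames()
            .map { sdk.onCameraFrame(it, reader.width, reader.height) }
            .firstOrNull { it is CaptureResult.Success }

        assertTrue("expected a successful capture from the reference video", result != null)
    }
}
```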
As the algorithms are refined, new versions constantly break our test suite, making it impossible to tell whether the algorithm is broken or the test video was simply not good enough.
Manual tests can help validate false negatives, but they are time-consuming and don't cover much of the surface, especially if they are always performed by the same engineers.
Constraints
we cannot influence how the other team operates, so we can't require more testing on their side or simply blame them for production issues unless we have caught and recorded a problematic scenario
SDK size matters a lot to our clients, so we can't ship both the old and the new algorithm in a single SDK release (the algorithm code and data make up the majority of the SDK's size)
there is no mechanism for canary releases; once the SDK is released, it goes to all clients
What would be a good integration testing strategy for this project to validate the system's behavior and minimize the risk of a bad version of the algorithms hitting production?