I recommend going with the "Set up a two-hop architecture" option you described. This approach adds an extra layer of security and control, because the costly machine learning API is never directly exposed to the client. You can implement rate limiting, logging, or other security measures at the cloud function level.
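To make the gatekeeping idea concrete, here is a minimal sketch of rate limiting in front of the ML API. The names `SlidingWindowLimiter`, `handle_request`, and `call_ml_api` are hypothetical, and the in-memory counter is an assumption made for illustration: cloud function instances do not share memory, so a real deployment would likely keep these counters in a shared store such as your database.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_calls per client within window_seconds (in-memory sketch)."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._calls = defaultdict(deque)  # client_id -> timestamps of recent calls

    def allow(self, client_id):
        now = self.clock()
        calls = self._calls[client_id]
        # Drop timestamps that have fallen out of the window.
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True


def handle_request(client_id, payload, limiter, call_ml_api):
    """Hypothetical cloud function body: check the limit, then forward to the ML API."""
    if not limiter.allow(client_id):
        # Rejected requests never reach the costly API.
        return 429, {"error": "rate limit exceeded"}
    return 200, call_ml_api(payload)
```

The point of the design is that `call_ml_api` is only invoked after the check passes, so a rejected request costs you nothing on the ML side. Logging or authentication checks would slot into `handle_request` in the same place.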
This architecture keeps your database and machine learning API separate, so you can scale each component independently based on its own requirements. It also avoids having to manage two separate authentication systems, the concern you raised about the "JWT and Fast API" option.
To address your concerns about pricing and provider lock-in, you can design your architecture in a way that allows you to easily switch providers. Use environment variables for configuration and keep provider-specific code isolated. Most cloud providers offer cost calculators to help you estimate expenses, and you can set up monitoring and alerts to keep track of usage and any unauthorized access attempts.
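As a rough sketch of what "environment variables plus isolated provider code" could look like: the variable names (`ML_PROVIDER`, `ML_API_URL`, `ML_API_KEY`), the provider labels, and the header formats below are all placeholders I made up for illustration, not any real provider's API.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, Mapping


@dataclass(frozen=True)
class Settings:
    provider: str
    ml_api_url: str
    ml_api_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read all deployment-specific values from the environment (names are assumptions)."""
    return Settings(
        provider=env.get("ML_PROVIDER", "provider_a"),
        ml_api_url=env["ML_API_URL"],
        ml_api_key=env["ML_API_KEY"],
    )


# Provider-specific details live in one place; the header formats are hypothetical.
def _headers_provider_a(s: Settings) -> Dict[str, str]:
    return {"Authorization": f"Bearer {s.ml_api_key}"}


def _headers_provider_b(s: Settings) -> Dict[str, str]:
    return {"x-api-key": s.ml_api_key}


HEADER_BUILDERS: Dict[str, Callable[[Settings], Dict[str, str]]] = {
    "provider_a": _headers_provider_a,
    "provider_b": _headers_provider_b,
}


def build_request(settings: Settings, payload: dict) -> dict:
    """Provider-agnostic entry point used by the rest of the code."""
    try:
        headers = HEADER_BUILDERS[settings.provider](settings)
    except KeyError:
        raise ValueError(f"unknown provider: {settings.provider}") from None
    return {"url": settings.ml_api_url, "headers": headers, "json": payload}
```

With this layout, switching providers means changing environment variables and, at most, adding one small function to `HEADER_BUILDERS`; the code that calls `build_request` stays the same.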
I think your approach offers a more secure and scalable solution that aligns well with best practices for API gatekeeping.