Currently, I'm using SQL Server on an Azure VM (DB B) as a read source for Power BI. DB B is updated via log shipping from a primary SQL Server (DB A), and the restores lock me out of reading DB B twice an hour. I've considered caching strategies with Power BI to help, but I'm not certain that will solve the problem in the long term, and the transition between DirectQuery and Import mode can be a pain.
Requirements/Notes for Suggested Solution(s):
- Not an Enterprise user.
- Changing the log shipping method to another method for updates from DB A is not an option.
- Migrating DB B to Azure SQL Database/managed DB and eliminating SQL Server on Azure VM is not an option.
- The log shipping updates happen at the same times each hour.
- Near(ish) real-time replication would be ideal.
- Transactional replication may not be a solution because the tables do not have primary keys.
- Minimizing cost would be ideal.
- Reading from DB B should always be available regardless of data consistency.
- Standing up a third DB, DB C, is an option.
- I'm hesitant about enabling CDC on DB B as the basis for an ETL solution, both because of memory constraints on the DB B VM and so as not to interfere with the log-shipping process.
- Five people may send requests to DB B from time to time, but never all at once.
I'm thinking about just standing up DuckDB on a Linux VM and writing some Python scripts to refresh the data the few times an hour that DB B is updated via log shipping; a rough sketch of what I have in mind is below.
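Roughly this kind of script: a minimal sketch that does a full reload of a few tables from DB B into a local DuckDB file shortly after each restore window. The connection string, table names, and DuckDB file path are placeholders, and I'm assuming pyodbc and pandas for the pull.

```python
# Minimal sketch: snapshot selected tables from DB B (SQL Server) into DuckDB.
# Connection string, table list, and file path are placeholders; assumes
# pyodbc, pandas, and duckdb are installed. Scheduling (e.g. cron a few
# minutes after each restore window) is handled outside the script.
import duckdb
import pandas as pd
import pyodbc

SQLSERVER_CONN = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=db-b.example.internal;DATABASE=ReportingDB;"
    "UID=readonly_user;PWD=...;TrustServerCertificate=yes"
)
TABLES = ["dbo.Sales", "dbo.Customers"]  # placeholder table names


def refresh_snapshot(duckdb_path: str = "reporting.duckdb") -> None:
    duck = duckdb.connect(duckdb_path)
    try:
        with pyodbc.connect(SQLSERVER_CONN) as src:
            for table in TABLES:
                # Full reload per table; fine for modest sizes. Larger tables
                # could switch to an incremental load keyed on a modified-date
                # column instead.
                df = pd.read_sql(f"SELECT * FROM {table}", src)
                duck.register("staged_df", df)
                target = table.split(".")[-1]
                duck.execute(
                    f"CREATE OR REPLACE TABLE {target} AS SELECT * FROM staged_df"
                )
                duck.unregister("staged_df")
    finally:
        duck.close()


if __name__ == "__main__":
    refresh_snapshot()
```

Since the restore times are fixed each hour, cron (or a systemd timer) on the Linux VM could fire this a few minutes after each restore completes.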
What are some of your recommended solutions?
From DB A to DB B, or from DB A to DB C, or from DB B to DB C? It's unclear which you're looking to use for your source and destination. Transactional replication is possible to use even without primary keys, if the data is unique or if there's a way to uniquify the data via an indexed view. But that requires access and changes to the database. Are you able to add objects to DB A? How big are the tables that don't have primary keys? How many tables are you trying to replicate?
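If it helps with sizing, here's a rough Python sketch (server and database names are placeholders) that lists the tables without a primary key and their approximate row counts, which would answer those last two questions:

```python
# Rough sketch: list tables with no primary key and their approximate row
# counts, to gauge how many tables would need uniquifying for replication.
# Server/database names are placeholders; assumes pyodbc is installed.
import pyodbc

CONN = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=db-a.example.internal;DATABASE=SourceDB;"
    "Trusted_Connection=yes;TrustServerCertificate=yes"
)

QUERY = """
SELECT s.name AS schema_name, t.name AS table_name, SUM(p.rows) AS approx_rows
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
JOIN sys.partitions AS p ON p.object_id = t.object_id AND p.index_id IN (0, 1)
WHERE NOT EXISTS (
    SELECT 1 FROM sys.key_constraints AS kc
    WHERE kc.parent_object_id = t.object_id AND kc.type = 'PK'
)
GROUP BY s.name, t.name
ORDER BY approx_rows DESC;
"""

with pyodbc.connect(CONN) as conn:
    for schema_name, table_name, approx_rows in conn.execute(QUERY):
        print(f"{schema_name}.{table_name}: ~{approx_rows} rows")
```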