I am running a website for a customer who has terabytes' worth of images. Each JPG is high-res (~20 MB) and belongs to a hierarchy like this:
- Group A
  - SubGroup 1
    - subsubgroup a
      - Image A1a.1
      - Image A1a.2
      - ...

Currently the customer can only view images for one subsubgroup at a time, meaning only ~30 images get loaded on the page. Not a problem.
The images are stored in a cloud bucket (GCP). Currently, my server just sends the pre-signed URLs and the client loads them.
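For reference, the URL-serving code is roughly like the sketch below. It assumes the google-cloud-storage client library; the route, bucket name, and `get_image_paths()` helper are illustrative stand-ins, not my real code:

```python
from datetime import timedelta

from flask import Flask, jsonify
from google.cloud import storage

app = Flask(__name__)
storage_client = storage.Client()
bucket = storage_client.bucket("my-image-bucket")  # illustrative bucket name


def get_image_paths(ssg_id):
    # Placeholder for the MySQL lookup that maps a subsubgroup to its
    # object keys in the bucket.
    return [f"group-a/subgroup-1/subsubgroup-{ssg_id}/{n}.jpg" for n in range(30)]


@app.route("/api/subsubgroups/<int:ssg_id>/images")
def list_image_urls(ssg_id):
    # One V4 signed URL per image; the browser fetches the objects directly
    # from the bucket, so the Flask app never proxies image bytes.
    urls = [
        bucket.blob(path).generate_signed_url(
            version="v4",
            expiration=timedelta(minutes=30),
            method="GET",
        )
        for path in get_image_paths(ssg_id)
    ]
    return jsonify(urls)
```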
The customer has requested the ability to mass download all images in a Group from their browser.
Some Groups can contain hundreds of SubGroups, and each of these can contain multiple subsubgroups. As an illustrative example, for a Group of 20 SubGroups we're talking about roughly 20 GB of data.
How can I achieve this reliably?
Possible solutions:
- Download all images on the server, create a zip (or a number of zips), upload it back to the bucket, and share the link with the customer. This would probably need to run as a background job (rough sketch after this list). It feels very flaky and I'm doubtful it'll even work given the size of the data.
- Create a new page that renders all the images. My server only sends the pre-signed URLs from the GCP bucket, and some basic JS downloads every image on the page.
- Some sort of SFTP server?
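For the first option, what I have in mind is something like the sketch below: a background job that streams each object into an uncompressed zip written straight back to the bucket, so nothing has to fit in memory or on the VM's disk. This is a sketch under assumptions, not tested code; it assumes the google-cloud-storage client, and the bucket name and `list_group_blob_paths()` helper are hypothetical:

```python
import shutil
import zipfile
from datetime import timedelta

from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.bucket("my-image-bucket")  # illustrative bucket name


def list_group_blob_paths(group_id):
    # Placeholder for the MySQL query that walks
    # Group -> SubGroups -> subsubgroups and returns every object key.
    return [f"group-{group_id}/subgroup-1/subsubgroup-a/{n}.jpg" for n in range(30)]


def build_group_zip(group_id):
    """Stream every image in a Group into a single zip object in the bucket."""
    zip_blob = bucket.blob(f"exports/group-{group_id}.zip")

    # blob.open("wb") yields a file-like object that uploads in chunks as we
    # write, so the archive never sits in memory or on local disk in full.
    with zip_blob.open("wb") as out:
        # ZIP_STORED: the JPEGs are already compressed, so skip deflate.
        with zipfile.ZipFile(out, mode="w", compression=zipfile.ZIP_STORED) as archive:
            for path in list_group_blob_paths(group_id):
                # Stream each source object directly into its archive entry.
                with bucket.blob(path).open("rb") as src:
                    with archive.open(path, mode="w") as dst:
                        shutil.copyfileobj(src, dst)

    # Hand back a time-limited download link for the finished archive.
    return zip_blob.generate_signed_url(
        version="v4", expiration=timedelta(hours=24), method="GET"
    )
```

If I went this way I'd trigger it from a task queue or a cron-style worker rather than inside a request, store the resulting signed URL in MySQL, and have the browser poll until the archive is ready. But I'm not sure this is the right direction at all, hence the question.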
My current stack is relatively minimal:
- 1 cloud VM
- Flask app served by uWSGI, behind an NGINX reverse proxy
- Cloud storage (GCP) for images
- MySQL database for image hierarchies