I have multiple .tif files ranging from 500 MB to 5 GB, and I need to convert them to zarr arrays, preferably written to disk. I am working on an AWS EC2 Linux instance with 32 GB of RAM. I have searched a lot online but haven't found a way to do this. I looked into a library called pyvips, but I was not able to use it to convert an image to a zarr array. I also thought `file = tifffile.imread(path_to_tif, aszarr=True)` would do the trick, but that didn't work either. Any help is appreciated!
- Maybe you can split the processing up into reading a fixed-size buffer from the input file (into a NumPy array) and then do the second step of writing to the zarr arrays (see the sketch after these comments). zarr.readthedocs.io/en/stable/… – MPIchael, Apr 15, 2024
- I'm sorry, I don't quite understand what you're trying to say. If I read the image in "buffers", say using pyvips, converting to NumPy arrays is what takes up all the RAM. Even if I do it in stages, it ultimately adds up. – PotterHead, Apr 15, 2024
- How did these files come about to begin with? – Wolfgang Bangerth, Apr 16, 2024
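For illustration, here is a minimal sketch of the buffered approach suggested in the first comment. It assumes a single-image 2D TIFF (a pyramidal/multi-page file would open as a zarr group instead), that `tifffile` and `zarr` are installed, and an arbitrary block size of 2048 rows; the output path `output.zarr` is a placeholder:

```python
import tifffile
import zarr

# imread(..., aszarr=True) returns a zarr *store* backed by the TIFF file,
# not an array; wrap it with zarr.open to get an array view without
# loading any pixel data into RAM.
store = tifffile.imread("path_to_tif", aszarr=True)
src = zarr.open(store, mode="r")

# On-disk destination array with the same shape and dtype as the source.
dst = zarr.open(
    "output.zarr",
    mode="w",
    shape=src.shape,
    dtype=src.dtype,
    chunks=(2048, 2048),
)

# Copy a fixed-size block of rows at a time, so peak memory stays at
# one block rather than the whole image.
step = 2048
for i in range(0, src.shape[0], step):
    dst[i : i + step] = src[i : i + step]

store.close()
```

With a 2048-row block, peak memory is roughly `2048 * width * itemsize` bytes per iteration, which stays comfortably under 32 GB even for a 5 GB image.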
1 Answer
I found a solution, which I'm linking here in case it helps others: https://gist.github.com/GenevieveBuckley/d94351adcc61cb5237a6c0a540c14cf6
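For reference, one common chunked pattern uses dask to stream the copy (not necessarily identical to what the linked gist does). This assumes `tifffile`, `zarr`, and `dask` are installed, and that the TIFF opens as a single zarr array rather than a multiscale group; `output.zarr` is a placeholder path:

```python
import dask.array as da
import tifffile
import zarr

store = tifffile.imread("path_to_tif", aszarr=True)  # store view; no pixels read yet
src = zarr.open(store, mode="r")                     # read-only zarr array backed by the TIFF

darr = da.from_zarr(src)     # lazy dask array, chunked like the TIFF tiles/strips
darr.to_zarr("output.zarr")  # writes chunk by chunk; memory use stays bounded

store.close()
```

If the TIFF is striped rather than tiled, the natural chunks can be very thin; calling `darr = darr.rechunk((2048, 2048))` before `to_zarr` gives more reasonably sized chunks at the cost of some extra shuffling.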