I am building a webapp using django and I deal with large excel files (about a million rows) that I parse into a hash for faster calculations and manipulations. I want to cache the hash, but the hash is 251mb in size and I don't think memcache allows you to cache such large variables. Does anyone have any suggestions as to how I should deal with this? I'm open to ways other than caching, too.
- Parsing an Excel file into a hash for faster calculations and manipulations? Could you tell us a bit more about that? — Elmex80s, Mar 8, 2017 at 19:41
- @Elmex80s I have a bunch of functions to run on the data that require fast lookups. I have company_id and parent_id columns, and I might have to find all company_ids with the same parent_id and check whether certain columns match across all of those companies, etc., and I run such functions on every row of the Excel file. — Manan Mehta, Mar 8, 2017 at 19:55
- Sounds interesting, but why not load the Excel file into memory with the xlrd module or the pandas package? The xlrd module gives you plain lists of values you can reshape yourself. The pandas package gives you DataFrames, a very powerful data structure for working with (very large) tables like yours. — Elmex80s, Mar 8, 2017 at 20:25
- I don't know of a magic Python module that does all the work for you, so I think you have to program this yourself: store temporary data on disk or in memory, check at every request whether the data is already there, and cap how much temporary data you keep — say, a maximum of 1000 intermediate results. As for the hash: if your 100 MB file produces a hash of over 250 MB, you should change a few things. — Elmex80s, Mar 10, 2017 at 22:21
- Why your hash is twice the size of the xlsx file, I don't know, sorry. — Elmex80s, Mar 10, 2017 at 23:09
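The lookup structure described in these comments can be sketched as a plain dict index keyed by parent_id; the column names and sample rows here are hypothetical, and in practice the rows would come from xlrd, openpyxl, or pandas:

```python
from collections import defaultdict

# Hypothetical rows parsed from the spreadsheet; column names follow
# the comments above (company_id / parent_id plus one data column).
rows = [
    {"company_id": 1, "parent_id": 10, "country": "US"},
    {"company_id": 2, "parent_id": 10, "country": "US"},
    {"company_id": 3, "parent_id": 20, "country": "DE"},
]

# Index rows by parent_id once, so sibling lookups are O(1) instead of
# rescanning a million rows for every check.
by_parent = defaultdict(list)
for row in rows:
    by_parent[row["parent_id"]].append(row)

# Example check: do all companies under parent 10 share the same country?
siblings = by_parent[10]
all_match = all(r["country"] == siblings[0]["country"] for r in siblings)
```

Note that a dict of per-row dicts like this carries substantial per-object overhead, which is one plausible reason the in-memory hash can end up larger than the source file.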
1 Answer
If you're not married to memcached, Redis allows values of up to 512 MB per string key (versus memcached's default 1 MB item limit), so a 251 MB value fits comfortably. As long as you're using Django's built-in cache framework, it's a drop-in replacement once you set up the Redis server and reconfigure your cache settings.
See also: https://github.com/niwinz/django-redis
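A minimal sketch of that reconfiguration, assuming django-redis is installed and Redis is listening on localhost (the cache key, timeout, and database number are illustrative assumptions):

```python
# settings.py -- point Django's cache framework at Redis via django-redis.
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",
        "OPTIONS": {"CLIENT_CLASS": "django_redis.client.DefaultClient"},
    }
}

# Calling code stays the same as with any other Django cache backend;
# the value is pickled on set and unpickled on get:
#
#     from django.core.cache import cache
#     cache.set("company_hash", big_dict, timeout=60 * 60)  # keep for 1 hour
#     big_dict = cache.get("company_hash")
```

One caveat: serializing and transferring a ~250 MB pickle on every cache round-trip is not free, so it may be worth measuring whether this actually beats rebuilding the hash.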