
I want to serialize a trained scikit-learn pipeline object so I can reload it later for predictions. From what I've seen, pickle and joblib.dump are the two common methods for this, with joblib being the preferred approach.

In my case I want to store the serialized Python object in a database, later load it from there, deserialize it, and use it for predictions. Is it possible to serialize the object without any file system access?

2 Answers


Yes, with the pickle library you can get the serialized version of an object in memory by using pickle.dumps instead of pickle.dump:

serialized_object = pickle.dumps(obj)  # obj is your fitted pipeline

This returns a bytes object, which you should then be able to store in your database, either directly (in a binary column) or after converting it to base64.
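A minimal round trip might look like this (using a plain dict to stand in for a fitted pipeline; the base64 step is optional and only needed if your database column stores text rather than raw bytes):

```python
import base64
import pickle

model = {"coef": [0.1, 0.2]}  # stands in for a fitted pipeline

# Serialize entirely in memory, no file system involved.
serialized = pickle.dumps(model)

# Optionally base64-encode for storage in a text column.
encoded = base64.b64encode(serialized)

# To load back: decode (if encoded) and unpickle.
restored = pickle.loads(base64.b64decode(encoded))
assert restored == model
```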


2 Comments

thanks, does joblib have the same compatibility problems as pickle when loading a dump in a different Python version?
@HansHupe I don't really know, sorry

You can do this:

import joblib
from io import BytesIO
import base64

with BytesIO() as tmp_bytes:
    joblib.dump({"test": "test"}, tmp_bytes)
    bytes_obj = tmp_bytes.getvalue()
    base64_obj = base64.b64encode(bytes_obj)

Then bytes_obj is a bytes object and base64_obj is its base64 version. Pick whichever suits your database column type.
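To load the object back (answering the question in the comments below), the steps are simply reversed: base64-decode if needed, wrap the bytes in a BytesIO buffer, and call joblib.load. A sketch:

```python
import base64
import joblib
from io import BytesIO

obj = {"test": "test"}  # stands in for a fitted pipeline

# Dump to an in-memory buffer and base64-encode, as above.
with BytesIO() as tmp_bytes:
    joblib.dump(obj, tmp_bytes)
    base64_obj = base64.b64encode(tmp_bytes.getvalue())

# Loading reverses the steps: decode, wrap in BytesIO, joblib.load.
with BytesIO(base64.b64decode(base64_obj)) as tmp_bytes:
    restored = joblib.load(tmp_bytes)

assert restored == obj
```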

6 Comments

Nice, interesting, any advantages over pickle.dumps?
joblib is faster at storing NumPy arrays
Regarding Python version issues, is it the same as pickle?
btw: how can I load it back from the dump?
They have the same problem across different Python versions.
