733

I have looked through the information that the Python documentation for pickle gives, but I'm still a little confused. What would be some sample code that would write a new file and then use pickle to dump a dictionary into it?

2

10 Answers

1379

Try this:

    import pickle

    a = {'hello': 'world'}

    with open('filename.pickle', 'wb') as handle:
        pickle.dump(a, handle, protocol=pickle.HIGHEST_PROTOCOL)

    with open('filename.pickle', 'rb') as handle:
        b = pickle.load(handle)

    print(a == b)

Nothing about the above solution is specific to a dict object. The same approach will work for many Python objects, including instances of arbitrary classes and arbitrarily complex nestings of data structures. For example, replacing the second line with these lines:

    import datetime
    today = datetime.datetime.now()
    a = [{'hello': 'world'}, 1, 2.3333, 4, True, "x",
         ("y", [[["z"], "y"], "x"]), {'today', today}]

will produce a result of True as well.

Some objects can't be pickled due to their very nature. For example, it doesn't make sense to pickle a structure containing a handle to an open file.
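A minimal sketch of that failure, assuming Python 3 (the file name and dictionary keys are made up for illustration). Pickling a structure that holds an open file handle raises TypeError, while storing the path instead keeps it picklable:

```python
import os
import pickle
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "log.txt")  # hypothetical file, only for this demo
    with open(path, "w") as handle:
        data = {"name": "log", "file": handle}
        try:
            pickle.dumps(data)
            pickled_ok = True
        except TypeError as exc:
            # Python 3 refuses to pickle the open file object.
            pickled_ok = False
            print("Pickling failed:", exc)

# Keeping the path (a plain string) instead of the handle works fine.
restored = pickle.loads(pickle.dumps({"name": "log", "path": "log.txt"}))
print(pickled_ok, restored)
```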


7 Comments

What does pickle.HIGHEST_PROTOCOL actually do?
@BallpointBen: It picks the highest protocol version your version of Python supports: docs.python.org/3/library/pickle.html#data-stream-format
To make it more concise you can write protocol=-1 (similar to -1 indexing in a list).
If you are saving/loading a large object, please do use pickle.HIGHEST_PROTOCOL. Otherwise you may waste a lot of time and disk space.
@nurettin why would one need to figure out which protocol was used? The documentation for pickle.load reads "The protocol version of the pickle is detected automatically, so no protocol argument is needed.". In fact, pickle.load does not even have the option to specify the protocol.
183

Use:

    import pickle

    your_data = {'foo': 'bar'}

    # Store data (serialize)
    with open('filename.pickle', 'wb') as handle:
        pickle.dump(your_data, handle, protocol=pickle.HIGHEST_PROTOCOL)

    # Load data (deserialize)
    with open('filename.pickle', 'rb') as handle:
        unserialized_data = pickle.load(handle)

    print(your_data == unserialized_data)

The advantage of HIGHEST_PROTOCOL is that files get smaller, which sometimes makes unpickling much faster.
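To get a rough feel for the size difference, here is a small sketch that compares protocol 0 with the highest protocol on an arbitrary sample dictionary (the data is made up, and the exact byte counts depend on your Python version). It also shows that protocol=-1 selects the same protocol, and that loading needs no protocol argument:

```python
import pickle

# Arbitrary sample data, chosen only to make the size difference visible.
data = {"numbers": list(range(10_000)), "label": "example"}

oldest = pickle.dumps(data, protocol=0)
newest = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
shorthand = pickle.dumps(data, protocol=-1)  # negative value means "highest"

print(len(oldest), len(newest))

# Same protocol, so the serialized bytes match.
same_bytes = shorthand == newest

# pickle.loads detects the protocol on its own.
restored_old = pickle.loads(oldest)
restored_new = pickle.loads(newest)
```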

Important notice: The answer was written in 2015 (Python 3.4!). Back then, the maximum file size of pickle was about 2 GB.

Alternative way

    import mpu

    your_data = {'foo': 'bar'}
    mpu.io.write('filename.pickle', your_data)
    unserialized_data = mpu.io.read('filename.pickle')

Alternative Formats

For your application, the following might be important:

  • Support by other programming languages
  • Reading / writing performance
  • Compactness (file size)
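As one illustration of the first point, JSON can be read by most other languages. A minimal sketch, assuming the dictionary contains only JSON-compatible types such as strings, numbers, lists, and nested dicts (the file name is made up):

```python
import json
import os
import tempfile

your_data = {"foo": "bar"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "filename.json")  # hypothetical file name

    # Store data as human-readable text
    with open(path, "w") as handle:
        json.dump(your_data, handle)

    # Load it back
    with open(path) as handle:
        unserialized_data = json.load(handle)

print(your_data == unserialized_data)
```

Unlike pickle, this will not round-trip arbitrary Python objects, so it is a trade-off rather than a drop-in replacement.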

See also: Comparison of data serialization formats

In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python

4 Comments

How did you determine the maximum limit? I was not aware of any limit and have pickled and unpickled 7GB in the past, without encountering anything suspicious.
I don't remember exactly as this is more than 8 years ago. I think I just ran into an error message
Ah, ok, so it is probably memory-related.
I just pickled and unpickled a 4.7 GB file. You might want to remove the point.
50

Save a dictionary into a pickle file.

    import pickle

    favorite_color = {"lion": "yellow", "kitty": "red"}  # create a dictionary

    # Save it into a file named save.p
    with open("save.p", "wb") as f:
        pickle.dump(favorite_color, f)

    # -------------------------------------------------------------
    # Load the dictionary back from the pickle file.
    import pickle

    with open("save.p", "rb") as f:
        favorite_color = pickle.load(f)
    # favorite_color is now {"lion": "yellow", "kitty": "red"}


18

A simple way to dump Python data (e.g., a dictionary) to a pickle file:

    import pickle

    your_dictionary = {}

    # The with block closes the file once the dump is done
    with open('pickle_file_name.p', 'wb') as f:
        pickle.dump(your_dictionary, f)


17

In general, pickling a dict will fail unless you have only simple objects in it, like strings and integers.

    Python 2.7.9 (default, Dec 11 2014, 01:21:43)
    [GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from numpy import *
    >>> type(globals())
    <type 'dict'>
    >>> import pickle
    >>> pik = pickle.dumps(globals())
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1374, in dumps
        Pickler(file, protocol).dump(obj)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
        self.save(obj)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
        f(self, obj) # Call unbound method with explicit self
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 649, in save_dict
        self._batch_setitems(obj.iteritems())
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 663, in _batch_setitems
        save(v)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 306, in save
        rv = reduce(self.proto)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
        raise TypeError, "can't pickle %s objects" % base.__name__
    TypeError: can't pickle module objects
    >>>

Even a really simple dict will often fail. It just depends on the contents.

    >>> d = {'x': lambda x:x}
    >>> pik = pickle.dumps(d)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1374, in dumps
        Pickler(file, protocol).dump(obj)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
        self.save(obj)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
        f(self, obj) # Call unbound method with explicit self
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 649, in save_dict
        self._batch_setitems(obj.iteritems())
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 663, in _batch_setitems
        save(v)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
        f(self, obj) # Call unbound method with explicit self
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 748, in save_global
        (obj, module, name))
    pickle.PicklingError: Can't pickle <function <lambda> at 0x102178668>: it's not found as __main__.<lambda>

However, if you use a better serializer like dill or cloudpickle, then most dictionaries can be pickled:

    >>> import dill
    >>> pik = dill.dumps(d)

Or if you want to save your dict to a file...

    >>> with open('save.pik', 'wb') as f:  # binary mode; required on Python 3
    ...     dill.dump(globals(), f)
    ...

The latter example is identical to any of the other good answers posted here (which aside from neglecting the picklability of the contents of the dict are good).
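If you want to check up front whether the standard pickle module can handle a given dict, one option is simply to try it. A minimal sketch for Python 3, using only the standard library; is_picklable is a hypothetical helper name, not part of pickle or dill:

```python
import pickle


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        # Lambdas, modules, open files and similar objects end up here.
        return False
    return True


simple = {'x': 1, 'y': 'text'}
tricky = {'x': lambda x: x}

print(is_picklable(simple))
print(is_picklable(tricky))
```

This serializes the object once just to test it, so for very large objects it may be cheaper to attempt the real dump and handle the exception there.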


11

Use:

    >>> import pickle
    >>> with open("/tmp/picklefile", "wb") as f:
    ...     pickle.dump({}, f)
    ...

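On Python 2, it's normally preferable to use the cPickle implementation (on Python 3, the plain pickle module already uses the C implementation when it is available):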

    >>> import cPickle as pickle
    >>> help(pickle.dump)
    Help on built-in function dump in module cPickle:

    dump(...)
        dump(obj, file, protocol=0) -- Write an object in pickle format to the given file.

        See the Pickler docstring for the meaning of optional argument proto.
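If the same script has to run on both Python 2 and Python 3, a common pattern is to fall back to the plain module when cPickle is missing. A minimal sketch (the sample dictionary is made up; protocol 2 is chosen here only because both major versions can read it):

```python
try:
    import cPickle as pickle  # Python 2: the faster C implementation
except ImportError:
    import pickle  # Python 3: cPickle no longer exists as a separate module

# Protocol 2 is readable by both Python 2 and Python 3.
payload = pickle.dumps({'hello': 'world'}, protocol=2)
restored = pickle.loads(payload)
print(restored)
```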


9

If you just want to store the dict in a single file, use pickle like this:

    import pickle

    a = {'hello': 'world'}

    with open('filename.pickle', 'wb') as handle:
        pickle.dump(a, handle)

    with open('filename.pickle', 'rb') as handle:
        b = pickle.load(handle)

If you want to save and restore multiple dictionaries in multiple files for caching, or store more complex data, use anycache. It handles the bookkeeping you would otherwise need around pickle.

    from anycache import anycache

    @anycache(cachedir='path/to/files')
    def myfunc(hello):
        return {'hello', hello}

Anycache stores each myfunc result in a separate file in cachedir, chosen according to the arguments, and reloads it on later calls with the same arguments.

See the documentation for any further details.


3

FYI, Pandas has a method to save pickles now.

I find it easier.

    import pandas as pd

    pd.to_pickle(object_to_save, '/temp/saved_pkl.pickle')


2
    import pickle

    dictobj = {'Jack' : 123, 'John' : 456}
    filename = "/foldername/filestore"

    fileobj = open(filename, 'wb')
    pickle.dump(dictobj, fileobj)
    fileobj.close()


2

If you want to handle writing or reading in one line, without opening the file yourself:

    import joblib

    my_dict = {'hello': 'world'}

    joblib.dump(my_dict, "my_dict.pickle")  # write pickle file
    my_dict_loaded = joblib.load("my_dict.pickle")  # read pickle file

2 Comments

This is irrelevant, as OP did not ask about caching in this case.
Where is caching here? It is saving the dictionary content into a pickle file as asked in the question.
