joblib.dump() and joblib.load() provide a replacement for pickle to work efficiently on Python objects containing large data, in particular large numpy arrays.
First we create a temporary directory:
>>> from tempfile import mkdtemp
>>> savedir = mkdtemp()
>>> import os
>>> filename = os.path.join(savedir, 'test.pkl')
Then we create an object to be persisted:
>>> import numpy as np
>>> to_persist = [('a', [1, 2, 3]), ('b', np.arange(10))]
which we save into savedir:
>>> import joblib
>>> joblib.dump(to_persist, filename)
['...test.pkl', '...test.pkl_01.npy']
We can then load the object from the file:
>>> joblib.load(filename)
[('a', [1, 2, 3]), ('b', array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))]
Note
As the output shows, joblib pickles tend to be spread across multiple files. More precisely, in addition to the main joblib pickle file (the filename passed to the joblib.dump function), an auxiliary .npy file holding the array's binary data is created for each numpy array that the persisted object contains. When moving joblib pickle files around, remember to keep all these files together.
Setting the compress argument to True in joblib.dump() saves space on disk:
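As a minimal sketch of keeping the files together, the list returned by joblib.dump() can drive the copy, so no auxiliary file is missed. The directory names `src_dir` and `dst_dir` are hypothetical. How many files are written depends on the joblib version in use; some versions may write only a single file, in which case the loop copies just that one.

```python
import os
import shutil
from tempfile import mkdtemp

import joblib
import numpy as np

to_persist = [('a', [1, 2, 3]), ('b', np.arange(10))]

# Hypothetical source and destination directories for this illustration.
src_dir = mkdtemp()
dst_dir = mkdtemp()

# joblib.dump returns the list of every file it wrote (main pickle plus
# any auxiliary files), so we do not have to guess their names.
written = joblib.dump(to_persist, os.path.join(src_dir, 'test.pkl'))

# Copy all of the files into the same destination directory.
for path in written:
    shutil.copy(path, dst_dir)

# Loading from the new location works because the files stayed together.
restored = joblib.load(os.path.join(dst_dir, 'test.pkl'))
```

Copying by the returned list rather than by a filename pattern avoids picking up unrelated files that happen to share the directory.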
>>> joblib.dump(to_persist, filename, compress=True)
['...test.pkl']
Another advantage is that it creates a single-file joblib pickle.
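The following sketch compares the on-disk footprint with and without compression. The helper `total_size` and the file names `plain.pkl` and `packed.pkl` are made up for this illustration, and an all-zeros array is used only because it compresses well; the gain on real data will vary.

```python
import os
from tempfile import mkdtemp

import joblib
import numpy as np


def total_size(paths):
    """Sum the sizes in bytes of all files written by one joblib.dump call."""
    return sum(os.path.getsize(p) for p in paths)


savedir = mkdtemp()
data = [('a', [1, 2, 3]), ('zeros', np.zeros(100000))]

# Uncompressed dump: may produce one or several files depending on version.
plain_files = joblib.dump(data, os.path.join(savedir, 'plain.pkl'))

# Compressed dump: everything goes into a single file.
packed_files = joblib.dump(data, os.path.join(savedir, 'packed.pkl'),
                           compress=True)

plain_size = total_size(plain_files)
packed_size = total_size(packed_files)
print('files written with compress=True:', len(packed_files))
print('compressed is smaller:', packed_size < plain_size)

# The compressed pickle loads exactly like an uncompressed one.
reloaded = joblib.load(os.path.join(savedir, 'packed.pkl'))
```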
More details can be found in the joblib.dump() and joblib.load() documentation.
Compatibility of joblib pickles across python versions is not supported. Note that, for a very restricted set of objects, saving a pickle with python 2 and loading it with python 3 may appear to work, but relying on this is strongly discouraged.
If you are switching between python versions, you will need to save a different joblib pickle for each python version.
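One possible way to keep one pickle per interpreter is to embed the running python version in the file name. This is only a sketch: the helper `versioned_filename` and its naming scheme are assumptions made for this example and are not part of joblib.

```python
import os
import sys
from tempfile import mkdtemp

import joblib
import numpy as np


def versioned_filename(directory, stem):
    """Build a path such as '<directory>/<stem>_py3.4.pkl'.

    Hypothetical helper: tags the file with the major.minor version of
    the interpreter that is currently running.
    """
    tag = 'py{0}.{1}'.format(sys.version_info[0], sys.version_info[1])
    return os.path.join(directory, '{0}_{1}.pkl'.format(stem, tag))


savedir = mkdtemp()
to_persist = [('a', [1, 2, 3]), ('b', np.arange(10))]

# Each python version writes to, and reads from, its own file.
filename = versioned_filename(savedir, 'test')
joblib.dump(to_persist, filename, compress=True)
restored = joblib.load(filename)
```

Run under each interpreter, the same script then produces and reads back its own file without touching pickles written by another python version.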
Here are a few examples of exceptions:
Saving joblib pickle with python 2, trying to load it with python 3:
Traceback (most recent call last):
  File "/home/lesteve/dev/joblib/joblib/numpy_pickle.py", line 453, in load
    obj = unpickler.load()
  File "/home/lesteve/miniconda3/lib/python3.4/pickle.py", line 1038, in load
    dispatch[key[0]](self)
  File "/home/lesteve/miniconda3/lib/python3.4/pickle.py", line 1176, in load_binstring
    self.append(self._decode_string(data))
  File "/home/lesteve/miniconda3/lib/python3.4/pickle.py", line 1158, in _decode_string
    return value.decode(self.encoding, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 1024: ordinal not in range(128)

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/lesteve/dev/joblib/joblib/numpy_pickle.py", line 462, in load
    raise new_exc
ValueError: You may be trying to read with python 3 a joblib pickle generated with python 2. This is not feature supported by joblib.
Saving joblib pickle with python 3, trying to load it with python 2:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "joblib/numpy_pickle.py", line 453, in load
    obj = unpickler.load()
  File "/home/lesteve/miniconda3/envs/py27/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/home/lesteve/miniconda3/envs/py27/lib/python2.7/pickle.py", line 886, in load_proto
    raise ValueError, "unsupported pickle protocol: %d" % proto
ValueError: unsupported pickle protocol: 3