Delta¶
A DeepDiff Delta is a directed delta: when applied to t1 it yields t2, where the delta is the difference between t1 and t2. Delta objects are like git commits, but for structured data. You can convert diff results into Delta objects, store the deltas, and later apply them to other objects.
Note
If you plan to generate Delta objects from a DeepDiff result that was created with ignore_order=True, you also need to set report_repetition=True.
Parameters
- diff: Delta dictionary, Delta dump payload or a DeepDiff object, default=None.
diff is the content to be loaded.
- delta_path: String, default=None.
delta_path is the local path to the delta dump file to be loaded.
- delta_file: File object, default=None.
delta_file is the file object containing the delta data.
Note
You need to pass only one of the diff, delta_path, or delta_file parameters.
- deserializer: Deserializer function, default=pickle_load.
deserializer is the function used to deserialize the delta content. The default is the pickle_load function that comes with DeepDiff.
- serializer: Serializer function, default=pickle_dump.
serializer is the function used to serialize the delta content into a format that can be stored. The default is the pickle_dump function that comes with DeepDiff.
- log_errors: Boolean, default=True.
Whether to log errors when applying the delta object.
- raise_errors: Boolean, default=False.
Whether to raise errors when applying a delta object.
- mutate: Boolean, default=False.
mutate defines whether to mutate the original object when adding the delta to it. Note that mutation is not always possible. For example, if your original object is an immutable type such as a frozenset or a tuple, mutation will not succeed. Hence it is recommended to keep this parameter at its default value of False unless you are sure that you do not have immutable objects. There is a small overhead from doing a deepcopy of the original object when mutate=False. If performance is a concern and modifying the original object is acceptable, set mutate=True, but always reassign the output back to the original object.
- safe_to_import: Set, default=None.
safe_to_import is a set of modules that need to be explicitly whitelisted to be loaded. Example: {'mymodule.MyClass', 'decimal.Decimal'}. Note that this set is added to the basic set of modules that are already whitelisted, which can be found in deepdiff.serialization.SAFE_TO_IMPORT.
- verify_symmetry: Boolean, default=False.
verify_symmetry is used to verify that the original values of items are the same as when the delta was created. For this option to work, the delta object needs to store more data, so the size of the object increases. Let’s say the diff object says root[0] changed value from X to Y. If you create the delta with the default verify_symmetry=False, the delta only stores root[0] = Y, and applying it to an object with any root[0] value still sets root[0] to Y. With verify_symmetry=True, the delta also stores that the original value of root[0] was X, and if you apply the delta to an object whose root[0] is anything other than X, it will notify you.
Returns
A delta object that can be added to t1 to recreate t2.
Diff to load in Delta¶
- diff: Delta dictionary, Delta dump payload or a DeepDiff object, default=None.
diff is the content to be loaded.
>>> from deepdiff import DeepDiff, Delta
>>> from pprint import pprint
>>>
>>> t1 = [1, 2, 3]
>>> t2 = ['a', 2, 3, 4]
>>> diff = DeepDiff(t1, t2)
>>> diff
{'type_changes': {'root[0]': {'old_type': <class 'int'>, 'new_type': <class 'str'>, 'old_value': 1, 'new_value': 'a'}}, 'iterable_item_added': {'root[3]': 4}}
>>> delta = Delta(diff)
>>> delta
<Delta: {'type_changes': {'root[0]': {'old_type': <class 'int'>, 'new_type': <class 'str'>, 'new_value': ...}>
Applying the delta object to t1 will yield t2:
>>> t1 + delta
['a', 2, 3, 4]
>>> t1 + delta == t2
True
Now let’s dump the delta object so we can store it.
>>> dump = delta.dumps()
>>>
>>> dump
b'\x80\x04\x95\x8d\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x0ctype_changes\x94}\x94\x8c\x07root[0]\x94}\x94(\x8c\x08old_type\x94\x8c\x08builtins\x94\x8c\x03int\x94\x93\x94\x8c\x08new_type\x94h\x06\x8c\x03str\x94\x93\x94\x8c\tnew_value\x94\x8c\x01a\x94us\x8c\x13iterable_item_added\x94}\x94\x8c\x07root[3]\x94K\x04su.'
The dumps() function gives us the serialized content of the delta as bytes. We can store it however we want, or use dump(file_object) to write it to a file object instead. But before we try dump(file_object), let’s create a new Delta object from the dump, apply it to t1, and check that we still get t2:
>>> delta2 = Delta(dump)
>>> t1 + delta2 == t2
True
>>>
Delta Path parameter¶
OK, now we can try dump(file_object). It does what you expect:
>>> with open('/tmp/delta1', 'wb') as dump_file:
...     delta.dump(dump_file)
...
We then use the delta_path parameter to load the delta:
>>> delta3 = Delta(delta_path='/tmp/delta1')
It still gives us the same result when applied.
>>> t1 + delta3 == t2
True
Delta File parameter¶
You can also pass a file object containing the delta dump:
>>> with open('/tmp/delta1', 'rb') as dump_file:
...     delta4 = Delta(delta_file=dump_file)
...
>>> t1 + delta4 == t2
True
Delta Deserializer¶
DeepDiff by default uses a restricted Python pickle function to deserialize the Delta dumps. Read more about Delta Dump Safety.
Users of Delta can switch the serializer and deserializer to custom ones via the serializer and deserializer parameters. The best way to come up with your own serializer and deserializer is to take a look at the pickle_dump and pickle_load functions in the deepdiff.serialization module.
Json Deserializer for Delta¶
If all you deal with are JSON-serializable objects, you can use json for serialization.
>>> from deepdiff import DeepDiff, Delta
>>> import json
>>> t1 = {"a": 1}
>>> t2 = {"a": 2}
>>>
>>> diff = DeepDiff(t1, t2)
>>> delta = Delta(diff, serializer=json.dumps)
>>> dump = delta.dumps()
>>> dump
'{"values_changed": {"root[\'a\']": {"new_value": 2}}}'
>>> delta_reloaded = Delta(dump, deserializer=json.loads)
>>> t2 == delta_reloaded + t1
True
Note
JSON is very limited, and you can easily end up with deltas that are not JSON serializable. You will probably want to extend Python’s JSON serializer to support your needs.
>>> t1 = {"a": 1}
>>> t2 = {"a": None}
>>> diff = DeepDiff(t1, t2)
>>> diff
{'type_changes': {"root['a']": {'old_type': <class 'int'>, 'new_type': <class 'NoneType'>, 'old_value': 1, 'new_value': None}}}
>>> Delta(diff, serializer=json.dumps)
<Delta: {'type_changes': {"root['a']": {'old_type': <class 'int'>, 'new_type': <class 'NoneType'>, 'new_v...}>
>>> delta = Delta(diff, serializer=json.dumps)
>>> dump = delta.dumps()
Traceback (most recent call last):
  File "lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type type is not JSON serializable
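One possible way to extend the JSON serializer for the failing case above is sketched below. It uses only the standard library and a plain dict shaped like the delta payload. The names TypeAwareEncoder, type_aware_dumps, type_aware_loads and KNOWN_TYPES are hypothetical and not part of DeepDiff. Whether such a pair is sufficient as serializer and deserializer for your deltas is an assumption you would need to verify.

```python
import json

# Hypothetical allow-list: only these names may be turned back into classes.
KNOWN_TYPES = {
    "builtins.int": int,
    "builtins.str": str,
    "builtins.NoneType": type(None),
}


class TypeAwareEncoder(json.JSONEncoder):
    """Encode class objects as {"__type__": "module.qualname"}."""

    def default(self, o):
        if isinstance(o, type):
            return {"__type__": f"{o.__module__}.{o.__qualname__}"}
        return super().default(o)


def type_aware_dumps(obj):
    return json.dumps(obj, cls=TypeAwareEncoder)


def _restore_types(mapping):
    # json calls this hook for every decoded JSON object (dict).
    if set(mapping) == {"__type__"}:
        return KNOWN_TYPES[mapping["__type__"]]  # KeyError for unknown names
    return mapping


def type_aware_loads(text):
    return json.loads(text, object_hook=_restore_types)


# A plain dict shaped like the delta content shown above.
payload = {
    "type_changes": {
        "root['a']": {"old_type": int, "new_type": type(None), "new_value": None}
    }
}
text = type_aware_dumps(payload)
restored = type_aware_loads(text)
```

The decoder only restores names found in KNOWN_TYPES, which keeps the same allow-list idea that the default pickle-based deserializer relies on.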
Delta Serializer¶
DeepDiff uses pickle to serialize delta objects by default. Please take a look at the Delta Deserializer for more information.
Delta Dump Safety¶
Delta by default uses Python’s pickle to serialize and deserialize. While unrestricted use of pickle is not safe, as noted in pickle’s documentation, DeepDiff’s Delta is written with extra care to restrict the globals and hence mitigate this security risk.
In fact, only a few Python object types are allowed by default. Users of DeepDiff can allow additional object types by passing them via the safe_to_import parameter (see Delta Safe To Import parameter).
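The general technique of restricting globals is described in the pickle documentation. The following is a minimal standard-library sketch of that technique, not DeepDiff’s actual implementation. SAFE_GLOBALS, RestrictedUnpickler, restricted_loads and the extra_safe argument are hypothetical names chosen for illustration.

```python
import io
import pickle
from decimal import Decimal

# Hypothetical base allow-list of "module.name" strings.
SAFE_GLOBALS = {"builtins.int", "builtins.str", "builtins.list", "builtins.dict"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not on the allow-list."""

    def __init__(self, file, extra_safe=None):
        super().__init__(file)
        self._allowed = SAFE_GLOBALS | set(extra_safe or ())

    def find_class(self, module, name):
        full_name = f"{module}.{name}"
        if full_name not in self._allowed:
            raise pickle.UnpicklingError(f"Global '{full_name}' is forbidden")
        return super().find_class(module, name)


def restricted_loads(data, extra_safe=None):
    return RestrictedUnpickler(io.BytesIO(data), extra_safe).load()


# builtins.int is on the base list, so this payload loads.
allowed = restricted_loads(pickle.dumps({"old_type": int, "new_value": "a"}))

# decimal.Decimal is not on the base list, so it is rejected ...
decimal_dump = pickle.dumps(Decimal("1.5"))
try:
    restricted_loads(decimal_dump)
    was_blocked = False
except pickle.UnpicklingError:
    was_blocked = True

# ... unless the caller explicitly allows it.
loaded_decimal = restricted_loads(decimal_dump, extra_safe={"decimal.Decimal"})
```

The extra_safe argument plays the role that safe_to_import plays for Delta: it extends the base allow-list instead of replacing it.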
Delta Mutate parameter¶
- mutate: Boolean, default=False.
mutate defines whether to mutate the original object when adding the delta to it. Note that mutation is not always possible. For example, if your original object is an immutable type such as a frozenset or a tuple, mutation will not succeed. Hence it is recommended to keep this parameter at its default value of False unless you are sure that you do not have immutable objects. There is a small overhead from doing a deepcopy of the original object when mutate=False. If performance is a concern and modifying the original object is acceptable, set mutate=True, but always reassign the output back to the original object.
For example:
>>> t1 = [1, 2, [3, 5, 6]]
>>> t2 = [2, 3, [3, 6, 8]]
>>> diff = DeepDiff(t1, t2, ignore_order=True, report_repetition=True)
>>> diff
{'values_changed': {'root[0]': {'new_value': 3, 'old_value': 1}, 'root[2][1]': {'new_value': 8, 'old_value': 5}}}
>>> delta = Delta(diff)
>>> delta
<Delta: {'values_changed': {'root[0]': {'new_value': 3}, 'root[2][1]': {'new_value': 8}}}>
Note that we can apply a delta to objects other than the ones it was made from:
>>> t3 = ["a", 2, [3, "b", "c"]]
>>> t3 + delta
[3, 2, [3, 8, 'c']]
If we check t3, it is still the same as the original value of t3:
>>> t3
['a', 2, [3, 'b', 'c']]
Now let’s make the delta with mutate=True:
>>> delta2 = Delta(diff, mutate=True)
>>> t3 + delta2
[3, 2, [3, 8, 'c']]
>>> t3
[3, 2, [3, 8, 'c']]
Applying the delta to t3 mutated the t3 itself in this case!
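The difference between the two modes can be pictured with a toy function that copies its input unless told otherwise. This is only a sketch of the copy-versus-in-place idea using the standard library. apply_changes and its (path, new_value) format are hypothetical and do not reflect how Delta stores or applies changes internally.

```python
import copy


def apply_changes(obj, changes, mutate=False):
    """Set new values at the given paths; deep-copy first unless mutate=True."""
    target = obj if mutate else copy.deepcopy(obj)
    for path, new_value in changes:
        container = target
        for key in path[:-1]:
            container = container[key]
        container[path[-1]] = new_value
    return target


# Same changes as the delta above: root[0] -> 3 and root[2][1] -> 8.
changes = [((0,), 3), ((2, 1), 8)]

t3 = ["a", 2, [3, "b", "c"]]
result_copy = apply_changes(t3, changes)                   # t3 stays untouched
t3_snapshot = copy.deepcopy(t3)                            # state before mutating
result_inplace = apply_changes(t3, changes, mutate=True)   # t3 itself changes
```

With mutate=True the returned object is the very same list as t3, which is why skipping the deepcopy saves time but changes the caller’s data.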
Delta and Numpy¶
>>> from deepdiff import DeepDiff, Delta
>>> import numpy as np
>>> t1 = np.array([1, 2, 3, 5])
>>> t2 = np.array([2, 2, 7, 5])
>>> diff = DeepDiff(t1, t2)
>>> diff
{'values_changed': {'root[0]': {'new_value': 2, 'old_value': 1}, 'root[2]': {'new_value': 7, 'old_value': 3}}}
>>> delta = Delta(diff)
Note
When applying a delta to Numpy arrays, put the delta object first and the numpy array second. Numpy arrays override the + operator, so with the array first DeepDiff’s Delta cannot be applied.
>>> t1 + delta
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    raise DeltaNumpyOperatorOverrideError(DELTA_NUMPY_OPERATOR_OVERRIDE_MSG)
deepdiff.delta.DeltaNumpyOperatorOverrideError: A numpy ndarray is most likely being added to a delta. Due to Numpy override the + operator, you can only do: delta + ndarray and NOT ndarray + delta
Let’s put the delta first then:
>>> delta + t1
array([2, 2, 7, 5])
>>> delta + t1 == t2
array([ True,  True,  True,  True])
Note
You can apply a delta that was created from normal Python objects to Numpy arrays. But it is not recommended.
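The operand-order rule in this section follows from how Python dispatches the + operator: the left operand’s __add__ gets the first chance, and the right operand’s __radd__ is used only if the left side cannot handle the operation. The toy classes below illustrate this without Numpy. ToyDelta and GreedyArray are hypothetical stand-ins, and Numpy’s real behavior is more involved than this analogy.

```python
class ToyDelta:
    """Stand-in for a delta: 'applying' it appends a marker to a list."""

    def __add__(self, other):
        return list(other) + ["patched"]

    # Used for `other + delta` when `other` cannot handle the addition itself.
    __radd__ = __add__


class GreedyArray:
    """Stand-in for an array type whose + never defers to the right operand."""

    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        # Handles every operand itself instead of returning NotImplemented.
        return "GreedyArray handled the + itself"

    def __iter__(self):
        return iter(self.items)


delta = ToyDelta()
plain = [1, 2]
greedy = GreedyArray([1, 2])

list_first = plain + delta      # a list cannot add a ToyDelta, so __radd__ runs
greedy_first = greedy + delta   # GreedyArray.__add__ wins; the delta never runs
delta_first = delta + greedy    # ToyDelta.__add__ runs first
```

Putting the delta on the left guarantees that its own __add__ is the method that runs.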
Delta Raise Errors parameter¶
- raise_errors: Boolean, default=False.
Whether to raise errors or not when applying a delta object.
>>> from deepdiff import DeepDiff, Delta
>>> t1 = [1, 2, [3, 5, 6]]
>>> t2 = [2, 3, [3, 6, 8]]
>>> diff = DeepDiff(t1, t2, ignore_order=True, report_repetition=True)
>>> delta = Delta(diff, raise_errors=False)
Now let’s apply the delta to a very different object:
>>> t3 = [1, 2, 3, 5]
>>> t4 = t3 + delta
Unable to get the item at root[2][1]
We get the above log message saying that it was unable to get the item at root[2][1]. The message is logged because log_errors=True by default.
Let’s see what t4 is now:
>>> t4
[3, 2, 3, 5]
So the delta was partially applied on t3.
Now let’s set raise_errors=True:
>>> delta2 = Delta(diff, raise_errors=True)
>>>
>>> t3 + delta2
Unable to get the item at root[2][1]
Traceback (most recent call last):
    current_old_value = obj[elem]
TypeError: 'int' object is not subscriptable
During handling of the above exception, another exception occurred:
deepdiff.delta.DeltaError: Unable to get the item at root[2][1]
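The interplay of log_errors and raise_errors can be sketched with a toy apply function: a failing change is logged, then either skipped or turned into an exception. This is an illustration under assumptions, not DeepDiff’s code. ToyDeltaError, apply_with_error_policy and the (path, new_value) format are hypothetical.

```python
import copy
import logging

logger = logging.getLogger("toy_delta")


class ToyDeltaError(ValueError):
    """Hypothetical error type for this sketch."""


def apply_with_error_policy(obj, changes, log_errors=True, raise_errors=False):
    target = copy.deepcopy(obj)
    for path, new_value in changes:
        try:
            container = target
            for key in path[:-1]:
                container = container[key]
            container[path[-1]] = new_value
        except (KeyError, IndexError, TypeError) as exc:
            message = f"Unable to apply the change at {path}"
            if log_errors:
                logger.error(message)
            if raise_errors:
                raise ToyDeltaError(message) from exc
            # Otherwise skip this change and continue with the next one.
    return target


changes = [((0,), 3), ((2, 1), 8)]
t3 = [1, 2, 3, 5]

# The second change fails because t3[2] is an int; it is logged and skipped.
partial = apply_with_error_policy(t3, changes)

try:
    apply_with_error_policy(t3, changes, raise_errors=True)
    raised = False
except ToyDeltaError:
    raised = True
```

As in the example above, the lenient mode leaves you with a partially applied result, so raise_errors=True is the safer choice when a partial result would be misleading.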
Delta Safe To Import parameter¶
- safe_to_import: Set, default=None.
safe_to_import is a set of modules that need to be explicitly whitelisted to be loaded. Example: {'mymodule.MyClass', 'decimal.Decimal'}. Note that this set is added to the basic set of modules that are already whitelisted.
As noted in Delta Dump Safety and Delta Deserializer, DeepDiff’s Delta takes safety very seriously and thus limits the globals that can be deserialized when importing. When you need a specific type (class) to be usable in delta objects, pass it to Delta via the safe_to_import parameter.
The set of what is already whitelisted can be found in deepdiff.serialization.SAFE_TO_IMPORT. At the time of writing this document, this list consists of:
>>> from deepdiff.serialization import SAFE_TO_IMPORT
>>> from pprint import pprint
>>> pprint(SAFE_TO_IMPORT)
{'builtins.None',
 'builtins.bin',
 'builtins.bool',
 'builtins.bytes',
 'builtins.complex',
 'builtins.dict',
 'builtins.float',
 'builtins.frozenset',
 'builtins.int',
 'builtins.list',
 'builtins.range',
 'builtins.set',
 'builtins.slice',
 'builtins.str',
 'builtins.tuple',
 'collections.namedtuple',
 'datetime.datetime',
 'datetime.time',
 'datetime.timedelta',
 'decimal.Decimal',
 'ordered_set.OrderedSet',
 're.Pattern'}
If you want to pass any other argument to safe_to_import, you will need to give the full path to the type as it appears in sys.modules.
For example, let’s say you have a package called mypackage that has a module called mymodule. If you check sys.modules, the path to this module must be mypackage.mymodule. For Delta to be able to serialize an object from it via pickle, the object first of all has to be picklable.
>>> diff = DeepDiff(t1, t2)
>>> delta = Delta(diff)
>>> dump = delta.dumps()
The dump at this point is serialized via pickle and can be written to disk if needed.
Later, when you want to load this dump, Delta will by default block you from importing anything that is NOT in deepdiff.serialization.SAFE_TO_IMPORT. In fact, it will show you this error message when trying to load this dump:
deepdiff.serialization.ForbiddenModule: Module ‘builtins.type’ is forbidden. You need to explicitly pass it by passing a safe_to_import parameter
In order to let Delta know that this specific module is safe to import, you will need to pass it to Delta during loading of this dump:
>>> delta = Delta(dump, safe_to_import={'mypackage.mymodule'})
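The entries of SAFE_TO_IMPORT listed earlier follow a module.name pattern such as 'decimal.Decimal'. A small helper for building strings of that shape from a class might look like the sketch below. dotted_name is a hypothetical helper, not a DeepDiff function, and it assumes that the class’s __module__ matches the path under which the module is importable.

```python
from collections import OrderedDict
from decimal import Decimal


def dotted_name(cls):
    """Return 'module.qualname' for a class, e.g. 'decimal.Decimal'."""
    return f"{cls.__module__}.{cls.__qualname__}"


# A set in the same shape as the SAFE_TO_IMPORT entries.
extra_safe = {dotted_name(Decimal), dotted_name(OrderedDict)}
```

Building the names from the classes themselves avoids typos in hand-written strings.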
Note
If you pass a custom deserializer to Delta, DeepDiff will pass safe_to_import parameter to the custom deserializer if that deserializer takes safe_to_import as a parameter in its definition. For example if you just use json.loads as deserializer, the safe_to_import items won’t be passed to it since json.loads does not have such a parameter.
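One way such a check could be done is by inspecting the deserializer’s signature, as sketched below. This is an assumption about the general approach, not a description of how DeepDiff performs the check. call_deserializer and picky_loads are hypothetical names.

```python
import inspect
import json


def call_deserializer(deserializer, payload, safe_to_import=None):
    """Forward safe_to_import only if the callable names that parameter."""
    params = inspect.signature(deserializer).parameters
    if "safe_to_import" in params:
        return deserializer(payload, safe_to_import=safe_to_import)
    return deserializer(payload)


def picky_loads(payload, safe_to_import=None):
    # Hypothetical deserializer that records what it was given.
    return {"data": json.loads(payload), "safe_to_import": safe_to_import}


# json.loads has no parameter named safe_to_import, so it is called without it.
from_json = call_deserializer(json.loads, '{"a": 2}', {"decimal.Decimal"})

# picky_loads declares the parameter, so it receives the set.
from_picky = call_deserializer(picky_loads, '{"a": 2}', {"decimal.Decimal"})
```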
Delta Verify Symmetry parameter¶
- verify_symmetry: Boolean, default=False.
verify_symmetry is used to verify that the original values of items are the same as when the delta was created. For this option to work, the delta object needs to store more data, so the size of the object increases. Let’s say the diff object says root[0] changed value from X to Y. If you create the delta with the default verify_symmetry=False, the delta only stores root[0] = Y, and applying it to an object with any root[0] value still sets root[0] to Y. With verify_symmetry=True, the delta also stores that the original value of root[0] was X, and if you apply the delta to an object whose root[0] is anything other than X, it will notify you.
>>> from deepdiff import DeepDiff, Delta
>>> t1 = [1]
>>> t2 = [2]
>>> t3 = [3]
>>>
>>> diff = DeepDiff(t1, t2)
>>>
>>> delta2 = Delta(diff, raise_errors=False, verify_symmetry=True)
>>> t4 = delta2 + t3
Expected the old value for root[0] to be 1 but it is 3. Error found on: while checking the symmetry of the delta. You have applied the delta to an object that has different values than the original object the delta was made from
>>> t4
[2]
And if you had set raise_errors=True, then it would have raised the error in addition to logging it.
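The idea behind the symmetry check can be sketched by keeping the old value next to the new one and comparing it with the target before writing. The function below is a toy illustration restricted to flat lists. apply_value_changes and its change format are hypothetical and much simpler than what Delta stores.

```python
import copy
import logging

logger = logging.getLogger("toy_delta")


def apply_value_changes(obj, changes, verify_symmetry=False):
    """changes maps an index to {'new_value': ..., 'old_value': ...}."""
    target = copy.deepcopy(obj)
    mismatches = []
    for index, change in changes.items():
        if verify_symmetry and target[index] != change["old_value"]:
            mismatches.append(index)
            logger.error(
                "Expected the old value at index %s to be %r but it is %r",
                index, change["old_value"], target[index],
            )
        # The new value is still written, mirroring the example above.
        target[index] = change["new_value"]
    return target, mismatches


changes = {0: {"new_value": 2, "old_value": 1}}
same_origin, no_mismatch = apply_value_changes([1], changes, verify_symmetry=True)
other_origin, mismatch = apply_value_changes([3], changes, verify_symmetry=True)
```

Storing old_value is what makes the check possible, which is also why a delta created with verify_symmetry=True is larger.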