Ignore Order
By default, DeepDiff compares the items of iterables in the order it iterates through them. In other words, if you have 2 lists, the first items of the lists are compared to each other, then the 2nd items, and so on. That lets DeepDiff run in linear time.
However, there are often times when you don’t care about the order in which the items appear. In such cases DeepDiff needs to do much more work to find the differences.
DeepDiff provides a couple of parameters that give you full control over this behavior.
- List difference with ignore_order=False, which is the default:
>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint(ddiff, indent=2)
{ 'iterable_item_added': {"root[4]['b'][3]": 3},
  'values_changed': { "root[4]['b'][1]": {'new_value': 3, 'old_value': 2},
                      "root[4]['b'][2]": {'new_value': 2, 'old_value': 3}}}
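To make the linear-time, index-by-index behavior concrete, here is a minimal sketch of positional list comparison. `ordered_list_diff` is a hypothetical helper, not part of DeepDiff: it only imitates the shape of the report above for flat lists and does not recurse into nested objects.

```python
def ordered_list_diff(old, new, path="root"):
    """Compare two flat lists index by index in one linear pass.

    Hypothetical helper: it only imitates the shape of DeepDiff's report.
    """
    report = {}
    # Items sharing an index are compared directly with each other.
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b:
            report.setdefault("values_changed", {})[f"{path}[{i}]"] = {
                "new_value": b,
                "old_value": a,
            }
    # A longer new list contributes added items at the tail.
    for i in range(len(old), len(new)):
        report.setdefault("iterable_item_added", {})[f"{path}[{i}]"] = new[i]
    # A longer old list contributes removed items at the tail.
    for i in range(len(new), len(old)):
        report.setdefault("iterable_item_removed", {})[f"{path}[{i}]"] = old[i]
    return report


# The 'b' lists from the example above:
print(ordered_list_diff([1, 2, 3], [1, 3, 2, 3], path="root[4]['b']"))
```

Because each index is visited once, the cost grows linearly with the list length, but moving a single item shows up as several changed values.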
Ignore Order
- List difference ignoring order or duplicates: (with the same dictionaries as above)
>>> from deepdiff import DeepDiff
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2, ignore_order=True)
>>> print(ddiff)
{}
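For lists of hashable items, ignoring both order and duplicates amounts to comparing the sets of items, while still counting duplicates amounts to comparing multisets. The sketch below shows that idea with the standard library. `same_ignoring_order` and its `count_duplicates` flag are hypothetical names, and unlike DeepDiff this sketch handles neither unhashable items (such as dicts or lists) nor nested structures.

```python
from collections import Counter


def same_ignoring_order(old, new, count_duplicates=False):
    """Hypothetical helper: compare iterables of hashable items regardless of order.

    With count_duplicates=False only the distinct items matter, so
    [1, 2, 3] and [1, 3, 2, 3] are considered equal. With count_duplicates=True
    the number of occurrences of each item must match as well.
    """
    if count_duplicates:
        return Counter(old) == Counter(new)
    return set(old) == set(new)


print(same_ignoring_order([1, 2, 3], [1, 3, 2, 3]))                         # True
print(same_ignoring_order([1, 2, 3], [1, 3, 2, 3], count_duplicates=True))  # False
```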
Reporting Repetitions
- List difference ignoring order and reporting repetitions:
>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> t1 = [1, 3, 1, 4]
>>> t2 = [4, 4, 1]
>>> ddiff = DeepDiff(t1, t2, ignore_order=True, report_repetition=True)
>>> pprint(ddiff, indent=2)
{ 'iterable_item_removed': {'root[1]': 3},
  'repetition_change': { 'root[0]': { 'new_indexes': [2],
                                      'new_repeat': 1,
                                      'old_indexes': [0, 2],
                                      'old_repeat': 2,
                                      'value': 1},
                         'root[3]': { 'new_indexes': [0, 1],
                                      'new_repeat': 2,
                                      'old_indexes': [3],
                                      'old_repeat': 1,
                                      'value': 4}}}
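The repetition report above can be understood as "where does each value occur, before and after?". The following minimal sketch, assuming hashable items in flat lists, records those positions and reproduces the report for this particular example. `positions` and `repetition_report` are hypothetical helpers, not DeepDiff's implementation; for simplicity, a value that disappears or appears is reported once, at its first index.

```python
from collections import defaultdict


def positions(seq):
    """Map each hashable item to the list of indexes where it occurs."""
    found = defaultdict(list)
    for index, item in enumerate(seq):
        found[item].append(index)
    return found


def repetition_report(t1, t2):
    """Hypothetical helper imitating a repetition report for hashable items."""
    old, new = positions(t1), positions(t2)
    report = {}
    for value, old_idx in old.items():
        new_idx = new.get(value)  # .get() does not create a defaultdict entry
        if new_idx is None:
            # The value vanished entirely: report it at its first old index.
            report.setdefault("iterable_item_removed", {})[f"root[{old_idx[0]}]"] = value
        elif len(old_idx) != len(new_idx):
            # Same value, different number of occurrences.
            report.setdefault("repetition_change", {})[f"root[{old_idx[0]}]"] = {
                "new_indexes": new_idx,
                "new_repeat": len(new_idx),
                "old_indexes": old_idx,
                "old_repeat": len(old_idx),
                "value": value,
            }
    for value, new_idx in new.items():
        if value not in old:
            # A brand-new value: report it at its first new index.
            report.setdefault("iterable_item_added", {})[f"root[{new_idx[0]}]"] = value
    return report


print(repetition_report([1, 3, 1, 4], [4, 4, 1]))
```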
Max Passes
- max_passes: Integer, default = 10000000
Maximum number of passes to run on objects to pinpoint exactly what is different. This is only used when ignore_order=True.
If you have deeply nested objects, DeepDiff needs to run multiple passes to pinpoint the difference. That can dramatically increase the time spent finding the difference. You can control the maximum number of passes that can be run via the max_passes parameter.
Note
A pass is counted whenever 2 iterable objects are being compared with each other and DeepDiff decides to compare every single element of one iterable with every single element of the other iterable. Refer to Cutoff Distance For Pairs and Cutoff Intersection For Pairs for more info on how DeepDiff decides to start a new pass.
- Max Passes Example
>>> from pprint import pprint
>>> from deepdiff import DeepDiff
>>>
>>> t1 = [
...     {
...         'key3': [[[[[1, 2, 4, 5]]]]],
...         'key4': [7, 8],
...     },
...     {
...         'key5': 'val5',
...         'key6': 'val6',
...     },
... ]
>>>
>>> t2 = [
...     {
...         'key5': 'CHANGE',
...         'key6': 'val6',
...     },
...     {
...         'key3': [[[[[1, 3, 5, 4]]]]],
...         'key4': [7, 8],
...     },
... ]
>>>
>>> for max_passes in (1, 2, 62, 65):
...     diff = DeepDiff(t1, t2, ignore_order=True, max_passes=max_passes, verbose_level=2)
...     print('-\n----- Max Passes = {} -----'.format(max_passes))
...     pprint(diff)
...
DeepDiff has reached the max number of passes of 1. You can possibly get more accurate results by increasing the max_passes parameter.
-
----- Max Passes = 1 -----
{'values_changed': {'root[0]': {'new_value': {'key5': 'CHANGE', 'key6': 'val6'},
                                'old_value': {'key3': [[[[[1, 2, 4, 5]]]]],
                                              'key4': [7, 8]}},
                    'root[1]': {'new_value': {'key3': [[[[[1, 3, 5, 4]]]]],
                                              'key4': [7, 8]},
                                'old_value': {'key5': 'val5', 'key6': 'val6'}}}}
DeepDiff has reached the max number of passes of 2. You can possibly get more accurate results by increasing the max_passes parameter.
-
----- Max Passes = 2 -----
{'values_changed': {"root[0]['key3'][0]": {'new_value': [[[[1, 3, 5, 4]]]],
                                           'old_value': [[[[1, 2, 4, 5]]]]},
                    "root[1]['key5']": {'new_value': 'CHANGE',
                                        'old_value': 'val5'}}}
DeepDiff has reached the max number of passes of 62. You can possibly get more accurate results by increasing the max_passes parameter.
-
----- Max Passes = 62 -----
{'values_changed': {"root[0]['key3'][0][0][0][0]": {'new_value': [1, 3, 5, 4],
                                                    'old_value': [1, 2, 4, 5]},
                    "root[1]['key5']": {'new_value': 'CHANGE',
                                        'old_value': 'val5'}}}
DeepDiff has reached the max number of passes of 65. You can possibly get more accurate results by increasing the max_passes parameter.
-
----- Max Passes = 65 -----
{'values_changed': {"root[0]['key3'][0][0][0][0][1]": {'new_value': 3,
                                                       'old_value': 2},
                    "root[1]['key5']": {'new_value': 'CHANGE',
                                        'old_value': 'val5'}}}
Note
If there are potential passes left to run when the max_passes value is reached, DeepDiff issues a warning. However, the most accurate result might already have been found even though there are still potential passes left to run.
In the example above, DeepDiff finds the optimal result at max_passes=64, but it still has one more pass to go before it has run all the potential passes. Hence, purely for the sake of illustration, we use max_passes=65 as a number that doesn’t issue warnings.
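How a pass budget trades accuracy for time can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not DeepDiff's algorithm: `PassBudget` and `toy_diff` are hypothetical names, each all-pairs comparison of two lists costs one pass, leftover items are paired naively in their original order (DeepDiff instead looks for the closest pairs), and once the budget runs out the whole value is reported as changed. As in the max_passes example above, a larger budget yields a more granular report.

```python
class PassBudget:
    """Counts 'passes' (all-pairs comparisons of two lists) against a cap."""

    def __init__(self, max_passes):
        self.remaining = max_passes
        self.exhausted = False  # set when a pass was needed but none was left

    def take(self):
        if self.remaining <= 0:
            self.exhausted = True
            return False
        self.remaining -= 1
        return True


def toy_diff(t1, t2, budget, path="root", report=None):
    """Order-insensitive diff of nested lists; hypothetical, not DeepDiff's algorithm."""
    if report is None:
        report = {"values_changed": {}, "iterable_item_removed": {}, "iterable_item_added": {}}
    if t1 == t2:
        return report
    if not (isinstance(t1, list) and isinstance(t2, list)) or not budget.take():
        # Not a pair of lists, or no pass left: report the whole value as changed.
        report["values_changed"][path] = {"new_value": t2, "old_value": t1}
        return report
    # One pass: compare every old item with every new item to drop exact matches.
    unmatched_new = list(enumerate(t2))
    leftover_old = []
    for i, item in enumerate(t1):
        for k, (_, candidate) in enumerate(unmatched_new):
            if item == candidate:
                del unmatched_new[k]
                break
        else:
            leftover_old.append((i, item))
    # Naively pair the leftovers in order and recurse into each pair.
    for (i, old), (_, new) in zip(leftover_old, unmatched_new):
        toy_diff(old, new, budget, f"{path}[{i}]", report)
    # Leftovers without a partner are plain removals or additions.
    for i, old in leftover_old[len(unmatched_new):]:
        report["iterable_item_removed"][f"{path}[{i}]"] = old
    for j, new in unmatched_new[len(leftover_old):]:
        report["iterable_item_added"][f"{path}[{j}]"] = new
    return report


t1 = [[[1, 2, 4, 5]]]
t2 = [[[1, 3, 5, 4]]]
for max_passes in (1, 2, 3):
    budget = PassBudget(max_passes)
    result = toy_diff(t1, t2, budget)
    print(max_passes, result["values_changed"], "(budget exhausted)" if budget.exhausted else "")
```

With 1 or 2 passes the toy reports a whole sub-list as changed and flags the budget as exhausted; with 3 passes it narrows the change down to the single value 2 becoming 3.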
Note
Also take a look at Max Passes
Cutoff Distance For Pairs
- cutoff_distance_for_pairs: 1 >= float >= 0, default=0.3
The threshold for considering 2 items as potential pairs. Note that it is only used when ignore_order=True.
cutoff_distance_for_pairs, in combination with Cutoff Intersection For Pairs, decides whether 2 objects should be paired with each other during the ignore_order=True algorithm. Note that these parameters are mainly used for nested iterables.
For example, by going from the default of cutoff_distance_for_pairs=0.3 to 0.1, we essentially prevent 1.0 and 20.0 from being paired with each other. As you can see below, DeepDiff calculates the Deep Distance of 1.0 and 20.0 to be around 0.27. Since that is well above a cutoff_distance_for_pairs of 0.1, the 2 items are not paired. As a result, the lists containing the 2 numbers are directly compared with each other:
>>> from deepdiff import DeepDiff
>>> t1 = [[1.0]]
>>> t2 = [[20.0]]
>>> DeepDiff(t1, t2, ignore_order=True, cutoff_distance_for_pairs=0.3)
{'values_changed': {'root[0][0]': {'new_value': 20.0, 'old_value': 1.0}}}
>>> DeepDiff(t1, t2, ignore_order=True, cutoff_distance_for_pairs=0.1)
{'values_changed': {'root[0]': {'new_value': [20.0], 'old_value': [1.0]}}}
>>> DeepDiff(1.0, 20.0, get_deep_distance=True)
{'values_changed': {'root': {'new_value': 20.0, 'old_value': 1.0}}, 'deep_distance': 0.2714285714285714}
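The pairing decision itself can be sketched as "pair an old item with its closest new candidate only if their distance is below the cutoff". The sketch below assumes a toy relative distance, `abs(a - b) / (abs(a) + abs(b))`, which is NOT DeepDiff's Deep Distance: it gives about 0.905 for 1.0 and 20.0 rather than the 0.27 reported above, so the sketch uses different numbers. `relative_distance` and `pair_items` are hypothetical helpers.

```python
def relative_distance(a, b):
    """Toy distance between two numbers, in [0, 1]; NOT DeepDiff's Deep Distance."""
    if a == b:
        return 0.0
    return abs(a - b) / (abs(a) + abs(b))


def pair_items(removed, added, cutoff_distance_for_pairs=0.3, distance=relative_distance):
    """Pair each removed item with its closest added item if their distance is below the cutoff.

    Returns (pairs, still_removed, still_added). Hypothetical helper.
    """
    pairs = []
    still_removed = []
    candidates = list(added)
    for old in removed:
        best = min(candidates, key=lambda new: distance(old, new), default=None)
        if best is not None and distance(old, best) < cutoff_distance_for_pairs:
            pairs.append((old, best))
            candidates.remove(best)  # each new item can be paired only once
        else:
            still_removed.append(old)
    return pairs, still_removed, candidates


print(pair_items([1.0], [1.2]))                                  # paired: distance is about 0.09
print(pair_items([1.0], [1.2], cutoff_distance_for_pairs=0.05))  # not paired
```

Lowering the cutoff turns a would-be pair back into one removed and one added item, which is the same effect the 0.3 versus 0.1 example above shows one level up.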
Cutoff Intersection For Pairs
- cutoff_intersection_for_pairs: 1 >= float >= 0, default=0.7
The threshold for deciding whether to calculate pairs of items between 2 iterables. For example, 2 iterables that have nothing in common do not need their pairs to be calculated. Note that it is only used when ignore_order=True.
Behind the scenes, DeepDiff takes the Deep Distance of objects when running with ignore_order=True. The distance is between 0 and 1: a distance of 0 means the items are equal, and a distance of 1 means they are 100% different. When comparing iterables, cutoff_intersection_for_pairs is used to decide whether or not to compare every single item in each iterable with every single item in the other iterable. If the distance between the 2 iterables is equal to or greater than cutoff_intersection_for_pairs, the items of the 2 iterables are only reported as added or removed items and NOT as modified items. However, if the distance between the 2 iterables is below the cutoff, every single item from each iterable is compared to every single item from the other iterable to find the closest “pair” of each item.
Note
Comparing every item with every other item is very expensive, so Cutoff Intersection For Pairs, in combination with Cutoff Distance For Pairs, is used to give acceptable results at much higher speed.
With a low cutoff_intersection_for_pairs, the 2 iterables below are considered too far from each other to calculate the individual pairs of items. As a result, numbers that are related to each other only via their positions in the lists, and not their values, are paired together in the results:
>>> t1 = [1.0, 2.0, 3.0, 4.0, 5.0]
>>> t2 = [5.0, 3.01, 1.2, 2.01, 4.0]
>>>
>>> DeepDiff(t1, t2, ignore_order=True, cutoff_intersection_for_pairs=0.1)
{'values_changed': {'root[1]': {'new_value': 3.01, 'old_value': 2.0}, 'root[2]': {'new_value': 1.2, 'old_value': 3.0}}, 'iterable_item_added': {'root[3]': 2.01}, 'iterable_item_removed': {'root[0]': 1.0}}
With cutoff_intersection_for_pairs of 0.7 (the default value), the 2 iterables are considered close enough to calculate pairs of items between them. For example, 2.0 and 2.01 are paired together:
>>> t1 = [1.0, 2.0, 3.0, 4.0, 5.0]
>>> t2 = [5.0, 3.01, 1.2, 2.01, 4.0]
>>>
>>> DeepDiff(t1, t2, ignore_order=True, cutoff_intersection_for_pairs=0.7)
{'values_changed': {'root[2]': {'new_value': 3.01, 'old_value': 3.0}, 'root[0]': {'new_value': 1.2, 'old_value': 1.0}, 'root[1]': {'new_value': 2.01, 'old_value': 2.0}}}
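The gate described above (skip the expensive all-pairs step when the iterables are too far apart) can be sketched as follows. This follows the prose description rather than DeepDiff's code and rests on two stated assumptions: the iterable distance is a toy measure, the share of items with no exact counterpart (6 out of 10 for the lists above, so 0.6), and closeness between two numbers is their absolute difference. `iterable_distance` and `compare_ignoring_order` are hypothetical helpers, and the per-pair cutoff_distance_for_pairs check is left out for brevity.

```python
def iterable_distance(t1, t2):
    """Toy distance between two lists: share of items with no exact counterpart.

    Hypothetical stand-in for DeepDiff's Deep Distance, which is computed differently.
    """
    unmatched_new = list(t2)
    unmatched_old = []
    for item in t1:
        if item in unmatched_new:
            unmatched_new.remove(item)  # exact match: neither added nor removed
        else:
            unmatched_old.append(item)
    total = len(t1) + len(t2)
    distance = (len(unmatched_old) + len(unmatched_new)) / total if total else 0.0
    return distance, unmatched_old, unmatched_new


def compare_ignoring_order(t1, t2, cutoff_intersection_for_pairs=0.7):
    distance, removed, added = iterable_distance(t1, t2)
    if distance >= cutoff_intersection_for_pairs:
        # Too far apart: skip the all-pairs step, report only removals and additions.
        return {"removed": removed, "added": added, "changed": []}
    # Close enough: compare every removed item with every added item (O(n * m))
    # and greedily pair each one with its closest remaining candidate.
    changed = []
    for old in removed[:]:
        if not added:
            break
        best = min(added, key=lambda new: abs(old - new))
        changed.append((old, best))
        removed.remove(old)
        added.remove(best)
    return {"removed": removed, "added": added, "changed": changed}


t1 = [1.0, 2.0, 3.0, 4.0, 5.0]
t2 = [5.0, 3.01, 1.2, 2.01, 4.0]
print(compare_ignoring_order(t1, t2, cutoff_intersection_for_pairs=0.7))
print(compare_ignoring_order(t1, t2, cutoff_intersection_for_pairs=0.1))
```

With 0.7 the toy pairs 1.0 with 1.2, 2.0 with 2.01 and 3.0 with 3.01, matching the pairs in the DeepDiff output above; with 0.1 it reports only removed and added items, as the description says, whereas DeepDiff's own report for that case is laid out differently.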
As an example of how much this parameter can affect the results in deeply nested objects, please take a look at Distance And Diff Granularity.
Back to DeepDiff 5.0.0 documentation!