Diff It To Digg It
Anybody who has used git diff will know that your life is not the same once you start diffing. When you get to the habit, there is no going back. Now let’s look at diff for structured data!
Published on May 19, 2017 by Sep Dehpour
Update May 22: Here is the video of the talk:
This was my talk proposal for PyCon 2017, which got accepted. The proposal is slightly modified to match the final talk better. For example, originally I was going to talk about writing a Redis client too, but I ended up removing that from the final talk. I will be giving this talk tomorrow, on Saturday May 20th!
The code samples used in this talk can be found at: https://github.com/seperman/bad-ideas
Or simply: pip install bad-ideas
This talk is lightly inspired by my other talk: Harness the power of Python magic methods and lazy objects.
Magic methods are a very powerful feature of Python and can open a whole new door for you. However, with great power comes great responsibility.
In this talk we explore magic methods’ capabilities by first experimenting with recreating the echo, grep, and pipe bash command syntaxes as valid Python syntax. And finally we learn about reference counting and the garbage collector by creating undeletable objects.
Once you see what magic methods can bring to the table; the limit is only your imagination!
This talk is mainly geared towards novice Python developers and is about Python’s magic methods. However, the ideas brought up in the experiments can be very interesting for more experienced Python developers as well. The audience is expected to have limited or even no exposure to the magic methods.
The talk is expected to make the audience excited about what magic methods can bring to the table and to demystify certain syntaxes that they might have seen and used in certain libraries, for example Django or SQLAlchemy queries built by chaining filters and operators.
__fairest__ one of all? 1.5 min
What are Python’s magic methods?
Methods whose names look like __something__. Example: __init__
So what can we do with them?
Let’s do some experiments and learn by examples!
Magic methods are a very powerful feature of Python and can open a whole new door for you. However with great power comes great responsibility.
The following experiments are solely for educational purposes and NOT for production code.
4.5 min
I don’t know about you but I don’t like typing too much. Every keystroke is a stress on your fingers and over time it adds up.
There are times that you need to type something like:
a = 10
a = a + 20
or as you know the shortcut that is:
a += 20
Since we are all about typing less, what if we just do:
a+20
and that does the job for us? We save one keystroke of =.
>>> a = 10
>>> a + 20
>>> print(a)
30
Let’s look at the full list of magic methods.
__add__ and __sub__
We can do that using the __add__ and __sub__ magic methods.
class Num:
    def __init__(self, value):
        self.value = value
    def __add__(self, other):
        self.value += other
        return self.value
    def __sub__(self, other):
        self.value -= other
        return self.value
    def __repr__(self):
        return str(self.value)
    __str__ = __repr__
>>> a = Num(10)
>>> a
10
>>> a + 20
30
>>> a
30
>>> a - 5
25
>>> a
25
Yay, we removed the need to type =, which saves a couple million keystrokes a year.
What do you think is gonna happen if we do:
20 + a
Oops! We get:
TypeError: unsupported operand type(s) for +: 'int' and 'Num'
Let’s look at __add__ again:
def __add__(self, other):
    self.value += other
    return self.value
We are adding “other” to the current value.
When we run a + 20, Python calls Num.__add__(a, 20), which is the same as a.__add__(20). However, when we run 20 + a, it calls int.__add__(20, a), and int has no idea how to add a Num, so it freaks out!
__radd__ and __rsub__
That’s where the reversed add and sub come into play.
What happens is that int.__add__(20, a) returns NotImplemented, so Python then tries the reverse add, which is Num.__radd__(a, 20). Only if that is missing too do we get the TypeError.
def __rsub__(self, other):
    self.value = other - self.value
    return self.value
__radd__ = __add__
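As a rough sketch of that fallback, the snippet below shows int.__add__ handing back NotImplemented and Python then trying the reflected method. The Plain and Reflected classes are hypothetical names made up for this illustration; they are not part of the talk's code.

```python
class Plain:
    """Hypothetical class with no arithmetic magic methods at all."""


class Reflected:
    """Hypothetical class that only implements the reflected add."""

    def __init__(self, value):
        self.value = value

    def __radd__(self, other):
        # Called for `other + self` after type(other).__add__ gave up.
        return other + self.value


# int does not raise by itself; it signals "I can't handle this".
gave_up = int.__add__(20, Plain()) is NotImplemented

# With no __radd__ to fall back on, the operator raises TypeError.
try:
    20 + Plain()
except TypeError as exc:
    error = str(exc)

# With __radd__ defined, the same expression succeeds.
result = 20 + Reflected(5)
```

So the TypeError only shows up after both the normal and the reflected method have been tried.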
Here is the full implementation:
class Num:
    def __init__(self, value):
        self.value = value
    def __add__(self, other):
        self.value += other
        return self.value
    def __sub__(self, other):
        self.value -= other
        return self.value
    def __rsub__(self, other):
        self.value = other - self.value
        return self.value
    def __repr__(self):
        return str(self.value)
    __str__ = __repr__
    __radd__ = __add__
>>> a = Num(10)
>>> a + 20
30
>>> a
30
>>> a - 5
25
>>> a
25
>>> 40 + a
65
>>> a
65
>>> 20 - a
-45
>>> a
-45
Disclaimer: That’s a bad idea. Don’t try this at home. I mean at work.
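For contrast, here is a minimal sketch of how these methods are conventionally split up, assuming we want the usual semantics: __add__ returns a new object and leaves self alone, while __iadd__ is the hook behind += and is the one that mutates. This Num is an illustrative variant, not the class from the talk.

```python
class Num:
    """Illustrative variant: + builds a new Num, += mutates in place."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Leave self untouched and return a fresh object.
        return Num(self.value + other)

    __radd__ = __add__  # addition is commutative, so reuse it for 20 + a

    def __iadd__(self, other):
        # In-place add: mutate and return self so the name stays bound to it.
        self.value += other
        return self

    def __repr__(self):
        return str(self.value)


a = Num(10)
b = a + 20      # a is still 10, b is a new Num
c = 1 + a       # goes through __radd__
same = a
a += 5          # mutates the object that both a and same refer to
```

The price is typing the = again, which is the whole point of the disclaimer above.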
2 min
Filter is a built-in function that does filtering on iterables.
Here is an example in Python 3:
foo = [1, 2, 3, 5, 6, 7]
bar = filter(lambda x: x % 3 == 0, foo)
We are filtering the list to have only elements that are divisible by 3.
If you print bar, what do you think you are gonna get?
>>> print(bar)
<filter at 0x119151d18>
That’s right, bar is a lazy iterator (a filter object) in Python 3.
How do you get the filtered list printed?
One way is to convert the iterator to a list:
>>> print(list(bar))
[3, 6]
But sometimes that is too much work if you want to keep printing the object and you know you don’t want it as a lazy iterator when printing.
How can we modify the built-in filter so it converts itself into a list when printed?
Did you know that everything is an object in Python?
def func(x):
    previous_x = getattr(func, "_x", "Not set")
    print("new value: {}, previous value: {}".format(x, previous_x))
    func._x = x
>>> func(10)
new value: 10, previous value: Not set
>>> func(20)
new value: 20, previous value: 10
>>> func(30)
new value: 30, previous value: 20
Even built-in functions are objects! Yes, even built-in functions. In fact, in Python 3 filter is actually a class, which is why we can subclass it. Let’s subclass the filter builtin and add __str__ and __repr__ to it:
class Filter(filter):
    def __str__(self):
        return str(list(self))
    __repr__ = __str__
bar = Filter(lambda x: x % 3 == 0, foo)
print(bar) # prints [3, 6]
Now you can use Filter instead of filter and printing will give you the filtered results. No worries!
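One caveat worth sketching: the filter object is a one-shot iterator, so building the list inside __str__ uses it up. The snippet below shows that, plus a hypothetical CachedFilter variant (an assumption for illustration, not part of the talk's code) that remembers the items the first time it is printed.

```python
class Filter(filter):
    def __str__(self):
        return str(list(self))

    __repr__ = __str__


class CachedFilter(filter):
    """Hypothetical variant that keeps the items after the first print."""

    def __str__(self):
        if not hasattr(self, "_items"):
            # Consume the underlying iterator once and remember the result.
            self._items = list(self)
        return str(self._items)

    __repr__ = __str__


foo = [1, 2, 3, 5, 6, 7]

bar = Filter(lambda x: x % 3 == 0, foo)
first = str(bar)    # the items are consumed here
second = str(bar)   # nothing is left for the second print

cached = CachedFilter(lambda x: x % 3 == 0, foo)
cached_first = str(cached)
cached_second = str(cached)
```

Even with the cache, iterating over the object after printing it yields nothing, so this only helps with repeated printing.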
1.5 min
Here is one of my favorites in bash:
echo "hello" >> foo.txt
And sometimes that syntax is too good not to use in Python.
The trick is that Python 2 used to have something like this:
print >> myfile, "Hello World!\n"
print >> myfile, "I want a burrito."
But you can’t do that in Python 3 anymore since print is a function now.
Hmm, what have we got for the >> operator?
operator | method |
---|---|
>> | Binary operation of __rshift__ |
Awesome! Let’s get to work.
myfile = open("hello.txt", "w")
class Echo:
    def __init__(self, text):
        self.text = text
    def __rshift__(self, other):
        # Writes to the end of the file!
        other.seek(0, 2)
        other.write(self.text)
>>> Echo("Hello World!\n") >> myfile
>>> Echo("I want a burrito.") >> myfile
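Here is a self-contained sketch of the same idea that can be run as a script. It assumes a throwaway file inside a temporary directory and uses a with block so the buffered writes are flushed before reading the file back; the path and the contents variable are made up for this illustration.

```python
import os
import tempfile


class Echo:
    def __init__(self, text):
        self.text = text

    def __rshift__(self, other):
        # Jump to the end of the file, then append the text there.
        other.seek(0, 2)
        other.write(self.text)


# Hypothetical throwaway location so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "hello.txt")

with open(path, "w") as myfile:
    Echo("Hello World!\n") >> myfile
    Echo("I want a burrito.") >> myfile

# The file is closed (and flushed) here, so we can read it back.
with open(path) as f:
    contents = f.read()
```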
3.5 min
Here is another favorite from bash: pipe and grep. I use it all the time.
command | grep something
First of all, what can we use for the pipe | operator?
Let’s see.
operator | method |
---|---|
`|` | Binary operation of __or__ |
Aha!
Let’s say we define a grep that uses the | (binary or) operator. This is one way we can define it:
class Grep:
    def __or__(self, other):
        ...
Note the self and other in the arguments. It means that in order to use | with this grep, it first needs to __init__ the grep and then do the pipe. Which means the order we write things in is gonna be different from what we are used to seeing in bash: instead of text | grep something, it is gonna be grep(something) | text. But wait a second. There is a reverse or too: __ror__. That lets us write text | grep(something)!
class Grep:
    def __init__(self, item):
        self.item = item.lower()
    def thefilter(self, line):
        return self.item in line
    def __ror__(self, other):
        if isinstance(other, str):
            other = other.lower().split('\n')
        return Filter(self.thefilter, other)
lines = """
Whether you're new to programming or
an experienced developer, it's easy
to learn and use Python.
Checkout jobs.python.org
for Python jobs.
"""
>>> lines | Grep('Python')
['to learn and use python.',
'checkout jobs.python.org',
'for python jobs.']
Awesome! You can even chain the greps!
>>> found = lines | Grep('Python') | Grep('jobs')
>>> print(found)
['checkout jobs.python.org', 'for python jobs.']
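Since the text is lowercased before splitting, the matches come back in lowercase. If you would rather keep the original casing, closer to grep -i, one possible variant is to lowercase each line only while comparing. GrepI and its matches method are hypothetical names for this sketch, and Filter is repeated so the snippet runs on its own.

```python
class Filter(filter):
    def __str__(self):
        return str(list(self))

    __repr__ = __str__


class GrepI:
    """Hypothetical case-insensitive grep that keeps the original lines."""

    def __init__(self, item):
        self.item = item.lower()

    def matches(self, line):
        # Lowercase only for the comparison, not for the output.
        return self.item in line.lower()

    def __ror__(self, other):
        # str has no __or__, so `text | GrepI(...)` ends up here.
        if isinstance(other, str):
            other = other.split('\n')
        return Filter(self.matches, other)


lines = """
Whether you're new to programming or
an experienced developer, it's easy
to learn and use Python.
Checkout jobs.python.org
for Python jobs.
"""

found = list(lines | GrepI('Python') | GrepI('JOBS'))
```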
5.5 min
Let’s say you run del obj. Normally that would delete the object but we want to make it undeletable!
>>> del obj
<obj: I'm still here. You CAN NOT delete me!>
>>> obj
<Yes I'm still here!>
Do you know what happens when you delete an object?
del obj
Hmm, ok let’s review how deleting works in Python.
CPython specifically keeps track of the number of references to each object. This is called reference counting. When you do del obj, it removes the name obj and decrements the object’s reference count by one; if that was the last reference, the count drops to zero.
Then the object gets deallocated. (Objects that are stuck in reference cycles are handled separately by the cyclic garbage collector.) However, if your object has a finalizer, then things can get tricky.
You might ask, what is a finalizer? Objects with finalizers are objects with a __del__ method and generators with a finally block.
And if your object has a finalizer, the finalizer is run only at that moment, right before the object is deallocated.
The important thing to keep in mind is that it is not guaranteed that __del__ will run immediately after you run del, since it only runs once the last reference is gone or the garbage collector gets to the object. The __del__ might never run, so you can’t ever depend on it. But for the sake of this experiment, we will use __del__ to our advantage.
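To make the link between del and __del__ concrete, here is a small sketch, assuming CPython's reference counting: del only removes one name, and the finalizer runs when the last reference disappears. The Tracked class and the events list are made up for this illustration.

```python
events = []


class Tracked:
    """Hypothetical class that records when its finalizer runs."""

    def __del__(self):
        events.append("finalized")


obj = Tracked()
alias = obj      # a second reference to the same object

del obj          # removes one name; the object is still alive via alias
after_first_del = list(events)

del alias        # the last reference is gone, so CPython runs __del__ now
after_second_del = list(events)
```

Other Python implementations may run the finalizer later, or not at all, which is why the timing should never be relied on.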
Again, when __del__ is run, the reference count of the object has already dropped to zero and the name has literally been removed from the namespace it existed in before.
So how can we resurrect the object once __del__ is running? Maybe we can raise some exception in __del__ so it can’t be successfully run and the GC aborts deleting it?
The answer is no. CPython will abort running __del__, but it will still deallocate the object.
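A hedged sketch of that failed attempt, assuming CPython 3.8 or newer (where sys.unraisablehook exists): the exception raised inside __del__ is reported to the hook instead of propagating, and a weak reference confirms that the object is gone anyway. Stubborn and seen are names made up for this illustration.

```python
import sys
import weakref

seen = []


class Stubborn:
    """Hypothetical class that tries to block deletion by raising."""

    def __del__(self):
        raise RuntimeError("refusing to die")


# Collect the ignored exception instead of printing it to stderr.
sys.unraisablehook = lambda unraisable: seen.append(unraisable.exc_type)

obj = Stubborn()
ref = weakref.ref(obj)   # lets us check whether the object still exists

del obj                  # __del__ raises, but the error is only reported
gone = ref() is None

# Put the default hook back.
sys.unraisablehook = sys.__unraisablehook__
```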
Here is another idea: we still have access to self inside __del__. After all, it is
def __del__(self):
    ...
How can we use this to our advantage? Maybe we put the object back in the name-space it was deleted from?
class Obj:
    def __del__(self):
        global obj
        obj = self
        print("You can't delete me!")
    def __str__(self):
        return "<obj:{}>".format(id(self))
>>> obj = Obj()
>>> print(obj)
<obj:123123>
>>> del obj
You can't delete me!
>>> print(obj)
<obj:123123>
Del didn’t delete the object! It is the same object with the same id!
So what happened here again?
del obj removes the name obj from the globals namespace in this case, which drops the object’s reference count to zero.
Python finds the object’s __del__ finalizer method and runs it.
The __del__ method puts the object back into the globals namespace under the hard-coded name obj. There are ways to find the name but that would have made the code way longer. You can see the full version implemented here.
PEP 442 was introduced in Python 3.4 and made some backward-incompatible changes to how the finalizer is called by the garbage collector.
We were running the above code:
print(obj)
del obj
print(obj)
And in Python 3.4+ what we get is something like:
<obj:4482786192>
You can't delete me!
<obj:4482786192>
But in Python 2 to 3.3 you get:
<obj:4549435760>
You can't delete me!
<obj:4549435760>
You can't delete me!
You can't delete me!
Basically in Python 3.4+ __del__ methods will be executed at most once by the garbage collector and it will no longer matter whether an object with a finalizer is a part of cyclic trash.
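The at-most-once behaviour can be checked with a small sketch, assuming CPython 3.4 or newer. Instead of a global name, it resurrects the object by appending it to a list called survivors, a hypothetical helper used here only to keep the snippet self-contained.

```python
survivors = []


class Obj:
    def __del__(self):
        # Resurrect the object by storing a new reference to it.
        survivors.append(self)


obj = Obj()
first_id = id(obj)

del obj                                   # __del__ runs and resurrects it
count_after_first = len(survivors)
same_object = id(survivors[0]) == first_id

survivors.clear()                         # drop the last reference again
count_after_second = len(survivors)       # __del__ did not run a second time
```

On the second round the finalizer is skipped and the object is really deallocated, so the resurrection trick only works once per object.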
Now that you have made it this far, there are a couple of articles I would recommend taking a look at to learn more about the garbage collector and even other implementations of it:
1 min
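We explored magic methods’ capabilities by experimenting with shortcut arithmetic, a printable filter, the echo, grep, and pipe bash syntaxes in Python, and undeletable objects.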
As we saw, the magic methods can bring a lot to the table; the limit is only your imagination!
Hope you learnt something from these bad ideas! https://github.com/seperman/bad-ideas
Don’t forget to pip install bad-ideas and play around with the code!