Harness the power of Python magic methods and lazy objects.


Published on Sep 10, 2016 by Sep Dehpour

This is based on a talk I gave at SoCal Python meetup. The PDF of this talk is available here.

All the code is run in Python 3.5 but it works with Python 2 too.

Overview 

When it comes to performance, lazy loading can help a lot. I remember when I first started using Django and Django’s lazy queries were magic to me.

In this article I will try to demystify lazy objects in Python, focusing mainly on lazy loading rather than lazy writing. However, in order to do lazy loading, we need a good understanding of some of Python’s magic methods. After that we will learn to write lazy objects through an example: writing a Redis client.

In the end, the complete version of that Redis client is presented in case you want to study its code and/or find it useful!


Lazy Loading 

Defer initialization of an object until the point at which it is needed.
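
As a minimal sketch of that definition (the Report class and expensive_load function are made-up names for illustration), the heavy work below is deferred until the data attribute is first read, and then cached:

```python
class Report:
    """Holds a heavy payload that is only built on first access."""

    def __init__(self, loader):
        self._loader = loader      # callable that does the expensive work
        self._data = None
        self._loaded = False

    @property
    def data(self):
        if not self._loaded:       # first access: do the work once
            self._data = self._loader()
            self._loaded = True
        return self._data


calls = []

def expensive_load():
    calls.append("load")           # record every time the work is done
    return [1, 2, 3]

report = Report(expensive_load)    # nothing is loaded yet
created_without_loading = (calls == [])
first = report.data                # triggers the load
second = report.data               # served from the cached value
```

Creating the object costs nothing; only the first read pays for the load.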

There are arguments both pro and against lazy design:

Why lazy? 

  • Better performance (depending on your design)
  • Better illusion of performance (when dealing with heavy objects)
  • Fewer hits to your database (depending on your design)

Why not lazy? 

  • Inconsistent state.
  • Code complexity.
  • More hits to your database (depending on your design).

Why Lazy? 

Better performance, fewer hits to the database 

Example: Django Queries

The act of creating a QuerySet does not involve any database activity. You can stack filters together all day long, and Django will not actually run the query until the QuerySet is evaluated

>>> q = Entry.objects.filter(headline__startswith="What")
>>> q = q.filter(pub_date__lte=datetime.date.today())
>>> q = q.exclude(body_text__icontains="food")
>>> print(q) # <-- evaluated here
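
To make the idea concrete without Django, here is a rough sketch of the same pattern. LazyQuery is a hypothetical class, not Django’s QuerySet: filter() only records a predicate, and the rows are scanned when the query is iterated.

```python
class LazyQuery:
    """Stacks filters and runs them only when the results are needed."""

    def __init__(self, rows, predicates=()):
        self._rows = rows
        self._predicates = tuple(predicates)
        self.evaluations = 0            # how many times we actually ran

    def filter(self, predicate):
        # Return a new query; nothing is evaluated here.
        return LazyQuery(self._rows, self._predicates + (predicate,))

    def __iter__(self):
        self.evaluations += 1
        for row in self._rows:
            if all(p(row) for p in self._predicates):
                yield row


rows = [{"headline": "What now", "year": 2015},
        {"headline": "What next", "year": 2017},
        {"headline": "Hello", "year": 2015}]

q = LazyQuery(rows)
q = q.filter(lambda r: r["headline"].startswith("What"))
q = q.filter(lambda r: r["year"] <= 2016)
evaluations_before = q.evaluations      # only filters were stacked so far
result = list(q)                        # evaluated here
```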

Example: Redis get vs mget

Let’s benchmark hitting redis 100 times with get requests or combining 100 queries into one query and then getting all 100 items at once.

Basically the idea is that just like Django queries, you combine all queries into one big query and hit the db.

https://github.com/seperman/benchmark/blob/master/pyredis_benchmark.py

This is what I get on my machine hitting a local Redis instance.

Fetching 100 keys from Redis
100 x Get one key: 4.84 milliseconds
1 x Mget 100 keys*: 0.93 milliseconds

In other words, mget was about 5 times faster.

*: mget is the Redis command for getting a bunch of string keys at once, as opposed to get, which gets one key.
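
The timing difference comes mostly from the number of round trips. The sketch below does not measure time; it only counts requests against CountingStore, a hypothetical in-memory stand-in for a Redis client, to show where the saving comes from.

```python
class CountingStore:
    """Hypothetical in-memory stand-in for a Redis client; counts round trips."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self._data.get(key)

    def mget(self, *keys):
        self.round_trips += 1            # one request, many keys
        return [self._data.get(k) for k in keys]


keys = ["key{}".format(i) for i in range(100)]
store = CountingStore({k: b"v" for k in keys})

one_by_one = [store.get(k) for k in keys]
trips_get = store.round_trips            # one round trip per key

store.round_trips = 0
all_at_once = store.mget(*keys)
trips_mget = store.round_trips           # a single round trip
```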

Better illusion of performance 

Example:

  • Unlimited scroll. Faster page load time.
  • Load chunks of big graph as you need.

Less memory usage 

  • Load chunks of big graph as you need.

Why not lazy? 

Inconsistent state 

  • Adds another layer of abstraction thus more difficult to keep the states consistent.

More hits to your database 

(with a bad design)

for obj in all_lazy_objects:
    print(obj)  # evaluates one by one in a bad design.
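
One possible way around this, sketched with made-up names (CountingStore, Lazy and load_all are all hypothetical), is to fill a whole group of lazy objects with a single mget before looping over them:

```python
class CountingStore:
    """Hypothetical in-memory stand-in for a Redis client; counts round trips."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self._data.get(key)

    def mget(self, *keys):
        self.round_trips += 1
        return [self._data.get(k) for k in keys]


class Lazy:
    def __init__(self, store, key):
        self.store = store
        self.key = key
        self.loaded = False
        self.cached = None

    @property
    def value(self):
        if not self.loaded:               # falls back to one get per object
            self.cached = self.store.get(self.key)
            self.loaded = True
        return self.cached


def load_all(lazy_objects):
    """Fill every unloaded object using a single mget."""
    pending = [obj for obj in lazy_objects if not obj.loaded]
    if not pending:
        return
    store = pending[0].store              # assumes all objects share one store
    values = store.mget(*(obj.key for obj in pending))
    for obj, value in zip(pending, values):
        obj.cached, obj.loaded = value, True


store = CountingStore({"a": b"1", "b": b"2", "c": b"3"})

naive = [Lazy(store, k) for k in "abc"]
naive_values = [obj.value for obj in naive]       # one round trip each
naive_trips = store.round_trips

store.round_trips = 0
batched = [Lazy(store, k) for k in "abc"]
load_all(batched)                                 # one round trip in total
batched_values = [obj.value for obj in batched]   # no further round trips
batched_trips = store.round_trips
```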

Code maintainability 

  • It can make the code slower to write or harder to maintain.

Lazy loading is a design pattern. It fits certain use cases. Let’s start with an example:

Case Study: Redis Client 

Learn by doing: Let’s create a Redis client that can simply fill up the values in a html template.

What we will learn:

  1. Why lazy loading is a good choice for writing a Redis client.
  2. How to write lazy objects in Python, which requires a good understanding of Python magic methods.

option 1: multiple get 

The first option is to have multiple get requests to Redis and then pass them through the context to the html template.

context = dict(
    title = redis.get('root.homepage.title'),
    x = redis.get('root.things.x'),
    y = redis.get('root.things.y'),
    z = redis.get('root.things.z'),
    footer = redis.get('root.footer')
)
<html>
<head>
<title>{{ title }}</title>
</head>
<body>
Hello,
We do {{ x }}, {{ y }} and {{ z }}.
{{ footer }}
</body>
</html>

We can do it better!

option 2: mget 

Mget is the redis command for multi get. It makes one request to Redis to get multiple strings. It can run way faster than making several get requests as we saw.

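names = ('title', 'x', 'y', 'z', 'footer')
# mget returns the values in the same order as the keys
values = redis.mget('root.homepage.title',
                    'root.things.x',
                    'root.things.y',
                    'root.things.z',
                    'root.footer')
context = dict(zip(names, values))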
<html>
<head>
<title>{{ title }}</title>
</head>
<body>
Hello,
We do {{ x }}, {{ y }} and {{ z }}.
{{ footer }}
</body>
</html>

Note that get and mget only fetch strings from Redis. For other types of data we need to use other commands. And in order to combine all these commands into one round trip, we can use Redis’s pipeline feature.

option 3: lazy 

We will just pass a root object in the context to the template renderer. Then frontend guys decide which parts of the root children they need. Think about root as a huge dictionary that has every content you wanted to be cached in Redis in the first place. By passing the root to the html, the frontend can choose keys in the root to be used and only those keys will be loaded. This way you don’t have to think about what variables to pass into your context beforehand. All you need is passing a root object and later filling it up as you go.

context = {'root': root}
<html>
<head>
<title>{{ root.homepage.title }}</title>
</head>
<body>
Hello,
We do {{ root.things.x }}, {{ root.things.y }} and {{ root.things.z }}.

{{ root.footer }}
</body>
</html>
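
Here is a rough sketch of why this only loads what the template asks for. It uses Python’s str.format as a stand-in for a real template engine, and FakeStore and Node are hypothetical names; the point is that a key is fetched only when a placeholder is rendered.

```python
class FakeStore:
    """Hypothetical in-memory stand-in for Redis; remembers what was requested."""

    def __init__(self, data):
        self._data = dict(data)
        self.requested = []

    def get(self, key):
        self.requested.append(key)
        return self._data.get(key)


class Node:
    """Builds a dotted key on attribute access; fetches only when rendered."""

    def __init__(self, store, key):
        self._store = store
        self._key = key

    def __getattr__(self, name):
        # Only called for names not found normally, i.e. our "children".
        return Node(self._store, "{}.{}".format(self._key, name))

    def __str__(self):
        return str(self._store.get(self._key))


store = FakeStore({
    "root.homepage.title": "Home",
    "root.footer": "Bye",
    "root.unused": "never read",
})

template = "<title>{root.homepage.title}</title> {root.footer}"
html = template.format(root=Node(store, "root"))
```

The key root.unused is in the store but is never requested, because the template never mentions it.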

Redis data types overview 

If you are familiar with Redis data types and commands, just jump to the next section. Otherwise here is an overview of Redis data types and specific Redis commands for those types.

  • String get, mget, set, mset (m = multi)
  • List lrange (get a range), rpush (append), rpop (pop)
  • Hash (Dictionary) hset (set a key), hmset (set a whole dict), hgetall (get a whole dict)
  • Set sadd, smembers

Our goal is to write a Redis client that automatically uses the correct Redis data type and command for the data we are assigning in Python.

In other words, typing root.var1="some text" should automatically use set, typing root.var2=[1,2,3] should automatically use rpush, and so on.
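
A minimal sketch of that dispatch, assuming a hypothetical helper called redis_command_for that only returns the name of the command (it does not talk to Redis):

```python
def redis_command_for(value):
    """Return the name of the Redis write command suited to a Python value."""
    # Scalars first: str is iterable too, so it must be checked before lists.
    if isinstance(value, (str, bytes, int, float)):
        return "set"
    if isinstance(value, dict):
        return "hmset"
    if isinstance(value, (set, frozenset)):
        return "sadd"
    if isinstance(value, (list, tuple)):
        return "rpush"
    raise TypeError("unsupported type: {}".format(type(value).__name__))


chosen = [redis_command_for(v) for v in ("some text", [1, 2, 3], {"a": 1}, {1, 2})]
```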


Create a Redis client 

Step 1: Load strings 

Our goals:

  • Create a root object so that print(root.something) in Python gets the value of root.something from Redis.
  • something can be any valid attribute name in Python.
>>> print(root.something)
value of root.something in Redis

But before we can write this Redis client we need to have a better understanding of Python magic methods:

Python magic methods 

What are Python’s magic methods?

  • Special methods that you can define to add “magic” to your classes.
  • They all look like: __something__ and are pronounced dunder something.

__init__ 

(pronounced dunder init)

The first method to get called in an object’s instantiation? Wrong!

__new__ 

The first method to get called in an object’s instantiation. It takes the class, then any other arguments that it will pass along to __init__.
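
A small sketch to see the order for yourself (Widget and the calls list are just illustration names):

```python
calls = []

class Widget:
    def __new__(cls, *args, **kwargs):
        calls.append("__new__")
        # object.__new__ only wants the class; the arguments go on to __init__.
        return super().__new__(cls)

    def __init__(self, name):
        calls.append("__init__")
        self.name = name

w = Widget("demo")
```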

__del__ 

Now you have created an object. What runs when you delete it? You might think it is __del__, but that’s wrong. The del statement only removes a reference; __del__ runs when the object is garbage collected.
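
The sketch below shows the difference. It relies on CPython’s reference counting, which finalizes an object as soon as its last reference goes away; other Python implementations may run __del__ later. Resource and events are illustration names.

```python
events = []

class Resource:
    def __del__(self):
        events.append("finalized")

a = Resource()
b = a                    # second reference to the same object
del a                    # removes the name only; the object is still alive
after_first_del = list(events)
del b                    # last reference gone; CPython finalizes right away
after_second_del = list(events)
```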

Here is a table of the more important ones:

Magic methods at a glance 

Protocol for containers          Descriptor                      Attribute access                   Garbage collection
__getitem__(self, key)           __get__(self, obj, cls=None)    __getattr__(self, name)            __del__(self)
__setitem__(self, key, value)    __set__(self, obj, val)         __setattr__(self, name, value)
__delitem__(self, key)           __delete__(self, obj)           __delattr__(self, name)

  • Protocol for containers: to define containers like lists, ...
  • Descriptor: custom class attributes that we fully control.
  • __getattr__: called when no attribute is found.
  • __setattr__: called whether the attribute is found or not.
  • __del__: called when garbage collecting. Not recommended. Instead use __exit__ (context manager).

Create Redis Client 

Now that we have had an overview of Python magic methods, let’s get back to writing our client:

Step 1: load 

Goal:

>>> print(root.something)
value of root.something in Redis

__getattr__ to rescue 

Assuming we have the following 2 keys of root.something and root.anotherthing already set in Redis:

class Root:
    def __getattr__(self, key):
        class_name = self.__class__.__name__.lower() 
        return redis.get("{}.{}".format(class_name, key))

>>> print(root.something)
value of root.something in Redis

>>> print(root.anotherthing)
value of root.anotherthing in Redis

And it works!


Step 2: save 

Goal:

>>> root.something = "value"

And then we can verify it saved to Redis by using redis-cli:

$ redis-cli
127.0.0.1:6379> get root.something
"value"

First try:

class Root:
    def __init__(self):
        self.class_name = self.__class__.__name__.lower()

    def __getattr__(self, key):
        return redis.get("{}.{}".format(self.class_name, key))

    def __setattr__(self, key, value):
        redis.set(key, value)
        
root = Root()
root.something = "value"
print(root.something)

When we run print(root.something) we get:

    return redis.get("{}.{}".format(self.class_name, key))
RecursionError: maximum recursion depth exceeded while calling a Python object

Why do you think we are running into this error?

Hints:

  • When the maximum recursion depth error happens we have key = "class_name", and there are 2 keys in Redis: class_name and something
  • __getattr__ Called when no attribute found
  • __setattr__ Called whether the attribute is found or not

Ok let’s go over what happens step by step:

class Root:
    def __init__(self):
        self.class_name = self.__class__.__name__.lower()

    def __getattr__(self, key):
        return redis.get("{}.{}".format(self.class_name, key))

    def __setattr__(self, key, value):
        redis.set(key, value)
        
root = Root()
root.something = "value"
print(root.something)
1. __init__ runs.
2. self.class_name needs to be set.
3. __setattr__ is run to set self.class_name.
4. The key `class_name` is saved into Redis.
5. root.something = "value" sets key `something` into redis too.
6. print(root.something)
7. __getattr__ for `something` is run.
8. in order to get the key from Redis, it needs to get self.class_name.
9. But self.class_name was never set on the object. It was saved into Redis.
10. So self.class_name falls through to __getattr__, which itself needs self.class_name, and the lookup recurses until Python gives up.
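
The steps above can be reproduced without a Redis server. In this sketch a plain dict named store stands in for Redis (an assumption made only to keep the example self-contained); the class is otherwise the same as the first try.

```python
store = {}   # stands in for Redis

class Root:
    def __init__(self):
        # This assignment goes through __setattr__ below.
        self.class_name = self.__class__.__name__.lower()

    def __getattr__(self, key):
        return store.get("{}.{}".format(self.class_name, key))

    def __setattr__(self, key, value):
        store[key] = value        # never touches the instance __dict__

root = Root()
root.something = "value"

try:
    root.something
    outcome = "no error"
except RecursionError:
    outcome = "RecursionError"

instance_attributes = dict(vars(root))   # what actually lives on the object
```

The instance dictionary stays empty, which is exactly why looking up class_name keeps falling back to __getattr__.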

Save: The Solution 

  • Keep track of native attributes.
  • Save to and read from __dict__

You might ask: what are native attributes? NATIVE_ATTRIBUTES is just a set of names that keeps track of what belongs to our library itself versus what should be read from or written to Redis. For example, root.class_name is native, so it is stored on the object and never goes to Redis, while root.somethingelse is not native, so it is saved to and read from Redis!

And here is an implementation that works:

NATIVE_ATTRIBUTES = {'class_name'}

class Root:
    def __init__(self):
        self.class_name = self.__class__.__name__.lower()

    def get_redis_key_path(self, key):
        return "{}.{}".format(self.class_name, key)

    def __getattr__(self, key):
        if key in NATIVE_ATTRIBUTES:
            return self.__dict__[key]
        else:
            key = self.get_redis_key_path(key)
            return redis.get(key)

    def __setattr__(self, key, value):
        if key in NATIVE_ATTRIBUTES:
            self.__dict__[key] = value
        else:
            key = self.get_redis_key_path(key)
            redis.set(key, value)

root = Root()
root.something = 10
print(root.something)  # b"10"
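
For comparison, here is a variation on the same idea, not the implementation above: native attributes are stored through the default super().__setattr__ instead of writing to __dict__ directly. FakeRedis is a hypothetical in-memory stand-in so the sketch runs without a server.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis client."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = str(value).encode("utf-8")   # Redis keeps strings as bytes

    def get(self, key):
        return self._data.get(key)


redis = FakeRedis()
NATIVE_ATTRIBUTES = {"class_name"}

class Root:
    def __init__(self):
        self.class_name = self.__class__.__name__.lower()

    def __getattr__(self, key):
        # Reached only when normal lookup fails, so native attributes
        # stored on the instance never arrive here.
        return redis.get("{}.{}".format(self.class_name, key))

    def __setattr__(self, key, value):
        if key in NATIVE_ATTRIBUTES:
            super().__setattr__(key, value)   # default behaviour: store on the instance
        else:
            redis.set("{}.{}".format(self.class_name, key), value)


root = Root()
root.something = 10
loaded = root.something
```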

Step 3: Make it lazy 

Now that we have something that reads from and writes to Redis, our goal is to evaluate the object only when needed, a.k.a. lazy loading. In this case we want to evaluate when printing, which means running the __str__ method of our object.

>>> obj = root.something
>>> obj  # runs __repr__
<Lazy root.something>
>>> print(root.something)  # runs __str__
10

Implementation with lazy object:

NATIVE_ATTRIBUTES = {'class_name'}

class Root:
    def __init__(self):
        self.class_name = self.__class__.__name__.lower()

    def get_redis_key_path(self, key):
        return "{}.{}".format(self.class_name, key)

    def __getattr__(self, key):
        if key in NATIVE_ATTRIBUTES:
            return self.__dict__[key]
        else:
            key = self.get_redis_key_path(key)
            return Lazy(key)

    def __setattr__(self, key, value):
        if key in NATIVE_ATTRIBUTES:
            self.__dict__[key] = value
        else:
            key = self.get_redis_key_path(key)
            redis.set(key, value)

class Lazy:
    def __init__(self, key):
        self.key = key

    @property
    def value(self):
        # Redis keeps strings as bytes
        return redis.get(self.key).decode('utf-8')

    def __repr__(self):
        return "<Lazy {}>".format(self.key)

    def __str__(self):
        return self.value

And when we run it:

>>> root = Root()
>>> root.something = 10

>>> root.something  # does not evaluate the object yet since it runs __repr__
<Lazy root.something>
>>> print(root.something)  # evaluates since it runs __str__
10
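
To check that nothing is fetched until the value is needed, the sketch below counts reads on FakeRedis, a hypothetical in-memory stand-in for the Redis client. The Lazy class is the same as above.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis client; counts reads."""

    def __init__(self):
        self._data = {}
        self.reads = 0

    def set(self, key, value):
        self._data[key] = str(value).encode("utf-8")   # Redis keeps strings as bytes

    def get(self, key):
        self.reads += 1
        return self._data.get(key)


redis = FakeRedis()
redis.set("root.something", 10)

class Lazy:
    def __init__(self, key):
        self.key = key

    @property
    def value(self):
        return redis.get(self.key).decode("utf-8")

    def __repr__(self):
        return "<Lazy {}>".format(self.key)

    def __str__(self):
        return self.value


obj = Lazy("root.something")
shown = repr(obj)                  # no read: __repr__ only uses the key
reads_after_repr = redis.reads
text = str(obj)                    # one read: __str__ asks for the value
reads_after_str = redis.reads
```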

Now that we can set root.something to 10 what do you think happens when we run this?

>>> root = Root()
>>> root.something = 10
>>> root.something > 8

This is what we get:

>>> root = Root()
>>> root.something = 10
>>> root.something > 8
TypeError: unorderable types: Lazy() > int()

What magic method can help us make this lazy object comparable to numbers? Look at the full list here.

Yes, you guessed right: __gt__ (greater than) and __lt__ (less than):

class Lazy:
    def __init__(self, key):
        self.key = key

    @property
    def value(self):
        # Redis keeps strings as bytes
        return redis.get(self.key).decode('utf-8')

    def __repr__(self):
        return "<Lazy {}>".format(self.key)

    def __str__(self):
        return self.value

    def __gt__(self, other):
        return float(self.value) > other

    def __lt__(self, other):
        return float(self.value) < other
>>> root.something > 8
True
>>> root.something < 11
True

Great! Now what do you think happens if we run this? Is root.something equal to 10?

>>> root.something = 10
>>> root.something == 10

This is what we get:

>>> root.something = 10
>>> root.something == 10
False

In order to make the lazy object handle equality, we need to use __eq__. Using __lt__ and __gt__ is NOT enough to make Python do equality comparison!

class Lazy:
    def __init__(self, key):
        self.key = key

    @property
    def value(self):
        return redis.get(self.key).decode('utf-8')

    def __repr__(self):
        return "<Lazy {}>".format(self.key)

    def __str__(self):
        return self.value

    def __gt__(self, other):
        return float(self.value) > other

    def __lt__(self, other):
        return float(self.value) < other

    def __eq__(self, other):
        return float(self.value) == other
>>> root.something = 10
>>> root.something == 10
True
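
If you also want >=, <= and != without writing each method by hand, the standard library’s functools.total_ordering can derive the rest from __eq__ and one ordering method. This is a sketch of that option, not what the client above does; LazyNumber and its fetch argument are hypothetical, with fetch standing in for the redis.get call.

```python
from functools import total_ordering

@total_ordering
class LazyNumber:
    """Lazy-style wrapper; `fetch` is a hypothetical stand-in for redis.get."""

    def __init__(self, fetch):
        self._fetch = fetch

    @property
    def value(self):
        return float(self._fetch().decode("utf-8"))

    def __eq__(self, other):
        return self.value == other

    def __lt__(self, other):
        return self.value < other


n = LazyNumber(lambda: b"10")
results = (n > 8, n < 11, n == 10, n >= 10, n <= 10, n != 10)
```

Keep in mind that a class that defines __eq__ without __hash__ becomes unhashable, so such objects cannot be used as dictionary keys or set members.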

Deep thoughts: Equality vs. Identity 

Ok, so what do you think about this now? root.something is equal to 10. Is it 10?

>>> root.something = 10
>>> root.something == 10
True
>>> root.something is 10

And this is what we get:

>>> root.something = 10
>>> root.something == 10
True
>>> root.something is 10
False

The reason is that the is keyword deals with identity, NOT equality. In Python, two objects are identical only if their IDs are the same, in other words if they are actually the same object in memory.

So root.something is 10 is equivalent to id(root.something) == id(10)


Setting attributes of attributes 

Here is another interesting case. What if the key name in Redis is going to have more than one dot in it?

>>> root.something.another = 10
# No error!
>>> print(root.something.another)
# What do you think it is gonna print here?

>>> root.something.another = 10
# No error!
>>> print(root.something.another)
AttributeError: 'Lazy' object has no attribute 'another'

Solution:

  1. root.something is the lazy object
  2. root.something should return self for any non-native attribute
  3. Then update self.key from root.something to root.something.another

So basically if you ask for attribute of attribute of Root, we keep returning the same lazy object to you but internally change the key of that lazy object to have the new attribute in it.

Example:

  1. root.something will return a Lazy object. The lazy object’s internal key value is root.something at this point.
  2. root.something.another will return the same lazy object. The lazy object’s internal key value is root.something.another at this point.
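
Here is a simplified sketch of chained keys. Note that it differs from the approach described above: instead of returning the same lazy object with an updated key, each attribute access returns a new Lazy with a longer key, which keeps the example short. A plain dict named store stands in for Redis.

```python
store = {}   # stand-in for Redis: dotted key -> value

class Lazy:
    def __init__(self, key):
        # Bypass our own __setattr__ for the one native attribute.
        object.__setattr__(self, "key", key)

    def __getattr__(self, name):
        # root.something.another -> Lazy("root.something.another")
        return Lazy("{}.{}".format(self.key, name))

    def __setattr__(self, name, value):
        store["{}.{}".format(self.key, name)] = value

    def __str__(self):
        return str(store[self.key])


class Root:
    def __getattr__(self, name):
        return Lazy("root.{}".format(name))


root = Root()
root.something.another = 10
printed = str(root.something.another)
```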

If you are interested to actually see how setting/getting attributes of attributes works, I invite you to take a look at the dot module: https://github.com/seperman/dotobject.


Step 4: save lists 

Now that we can read and save strings to Redis, let’s try another data type: lists.

Redis has specific commands for specific data types. We need to use the right command to save lists.

Our goal is to simply set a list in Python and have it saved as a list in Redis.

root.sides = ["fries", "salad"]

This is our current implementation that doesn’t handle lists:

class Root:
    def __init__(self):
        self.class_name = self.__class__.__name__.lower()

    def get_redis_key_path(self, key):
        return "{}.{}".format(self.class_name, key)

    def __getattr__(self, key):
        if key in NATIVE_ATTRIBUTES:
            return self.__dict__[key]
        else:
            key = self.get_redis_key_path(key)
            return Lazy(key)

    def __setattr__(self, key, value):
        if key in NATIVE_ATTRIBUTES:
            self.__dict__[key] = value
        else:
            key = self.get_redis_key_path(key)
            redis.set(key, value)

And we add a couple of lines in __setattr__ to handle Iterables (lists are Iterables):

from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10

class Root:
    def __init__(self):
        self.class_name = self.__class__.__name__.lower()

    def get_redis_key_path(self, key):
        return "{}.{}".format(self.class_name, key)

    def __getattr__(self, key):
        if key in NATIVE_ATTRIBUTES:
            return self.__dict__[key]
        else:
            key = self.get_redis_key_path(key)
            return Lazy(key)

    def __setattr__(self, key, value):
        if key in NATIVE_ATTRIBUTES:
            self.__dict__[key] = value
        else:
            key = self.get_redis_key_path(key)
            if isinstance(value, (str, bytes, int, float)):  # scalars are stored with SET
                redis.set(key, value)
            elif isinstance(value, Iterable):
                redis.delete(key)
                redis.rpush(key, *value)

And it works!

>>> root.sides = ["fries", "salad"]
$ redis-cli
127.0.0.1:6379> lrange root.sides 0 -1
1) "fries"
2) "salad"

Step 5: load lists 

Now that we can save lists to Redis, how about reading lists from Redis?

root.sides = ["fries", "salad"]
print(root.sides)
["fries", "salad"]

Our current implementation of lazy objects:

class Lazy:
    def __init__(self, key):
        self.key = key

    @property
    def value(self):
        return redis.get(self.key).decode('utf-8')

In order to read lists, we get the datatype from Redis and if it is list, then we use the proper command:

class Lazy:
    def __init__(self, key):
        self.key = key

    @property
    def value(self):
        thetype = redis.type(self.key)
        if thetype == b'string':
            result = redis.get(self.key).decode('utf-8')
        elif thetype == b'list':
            result = redis.lrange(self.key, 0, -1)
            result = [i.decode('utf-8') for i in result]
        return result

Yay! It works!

>>> root.sides = ["fries", "salad"]
>>> print(root.sides)
["fries", "salad"]

Let’s get some item from the list then.

>>> root.sides = ["fries", "salad"]
>>> print(root.sides)
["fries", "salad"]
>>> print(root.sides[0])

Ooops!

>>> root.sides = ["fries", "salad"]
>>> print(root.sides)
["fries", "salad"]
>>> print(root.sides[0])
TypeError: 'Lazy' object does not support indexing

Let’s look at the magic methods table to see how we can solve this problem:

Magic methods: setters and getters 

Protocol for containers          Descriptor                      Attribute access
__getitem__(self, key)           __get__(self, obj, cls=None)    __getattr__(self, name)
__setitem__(self, key, value)    __set__(self, obj, val)
__delitem__(self, key)           __delete__(self, obj)

  • Protocol for containers: to define containers like lists, ...
  • Descriptor: custom class attributes that we fully control.
  • __getattr__: called when no attribute is found.

Our current implementation:

class Lazy:
    def __init__(self, key):
        self.key = key

    @property
    def value(self):
        thetype = redis.type(self.key)
        if thetype == b'string':
            result = redis.get(self.key).decode('utf-8')
        elif thetype == b'list':
            result = redis.lrange(self.key, 0, -1)
            result = [i.decode('utf-8') for i in result]
        return result

Seems like what we need is using __getitem__:

class Lazy:
    def __init__(self, key):
        self.key = key

    @property
    def value(self):
        thetype = redis.type(self.key)
        if thetype == b'string':
            result = redis.get(self.key).decode('utf-8')
        elif thetype == b'list':
            result = redis.lrange(self.key, 0, -1)
            result = [i.decode('utf-8') for i in result]
        return result

    def __getitem__(self, key):
        return self.value[key]

Let’s run this:

>>> root.sides = ["fries", "salad"]
>>> print(root.sides)
["fries", "salad"]
>>> print(root.sides[0])
fries
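
To try the list handling without a Redis server, here is a self-contained sketch. FakeRedis is a hypothetical in-memory stand-in that covers only the calls used here, and it assumes the key exists (a real server reports the type of a missing key as none). The __str__ method wraps the value in str() because __str__ has to return a string, even when the value is a list.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in covering only the calls used below."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = str(value).encode("utf-8")

    def get(self, key):
        return self._data[key]

    def rpush(self, key, *values):
        items = self._data.setdefault(key, [])
        items.extend(str(v).encode("utf-8") for v in values)

    def lrange(self, key, start, end):
        items = self._data[key]
        # Only end == -1 ("to the last item") is treated specially here.
        return items[start:] if end == -1 else items[start:end + 1]

    def type(self, key):
        return b"list" if isinstance(self._data.get(key), list) else b"string"


redis = FakeRedis()

class Lazy:
    def __init__(self, key):
        self.key = key

    @property
    def value(self):
        if redis.type(self.key) == b"list":
            return [i.decode("utf-8") for i in redis.lrange(self.key, 0, -1)]
        return redis.get(self.key).decode("utf-8")

    def __getitem__(self, index):
        return self.value[index]

    def __str__(self):
        return str(self.value)      # __str__ must return a str, even for lists


redis.rpush("root.sides", "fries", "salad")
redis.set("root.something", 10)
sides = Lazy("root.sides")
something = Lazy("root.something")
```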

Yes!

RedisWorks 

As you can see, it can get complicated, and I don’t want to bother you with every edge case and data type that is out there. That’s why I wrote Redisworks, and it is ready to use. Redisworks is basically what we created through this article, plus it handles other data types and edge cases. Now that you understand what it does, its source code should be relatively easy to read!

Based on DotObject and PyRedis.

pip install redisworks

https://github.com/seperman/redisworks

  • Lazy Redis Queries
  • Multi Query evaluation
  • Dynamic Typing
  • Ease of use

Here are some examples of how dynamic typing makes life easier:

Strings 

PyRedis

>>> from redis import StrictRedis
>>> redis = StrictRedis()
>>> redis.set("root.something", "value")

RedisWorks

>>> from redisworks import Root
>>> root=Root()
>>> root.something = "value"

Lists with different data types 

PyRedis

In PyRedis you get everything back as bytes:

>>> redis.rpush("root.sides", 10, "root.something", "value")
>>> values = redis.lrange("root.sides", 0, -1)
>>> values
[b'10', b'root.something', b'value']

RedisWorks

>>> root.sides = [10, "fries", "coke"]
>>> root.sides[1]
'fries'
>>> "fries" in root.sides
True
>>> type(root.sides[0])
int

Nested lists 

PyRedis

In PyRedis the nested list is returned serialized.

>>> values = [10, [1, 2]]
>>> redis.rpush("root.sides", *values)
2
>>> redis.lrange("root.sides", 0, -1)
[b'10', b'[1, 2]']

RedisWorks

>>> root.sides = [10, [1, 2]]
>>> root.sides
[10, [1, 2]]
>>> type(root.sides[1])
<class 'list'>

Dictionaries 

PyRedis

Do you really want to remember hmset, hgetall?

>>> redis.hmset("root.something", {1:"a", "b": {2: 2}})
>>> val = redis.hgetall("root.something")
>>> val
{b'b': b'{2: 2}', b'1': b'a'}
>>> val[b'b']
b'{2: 2}'

RedisWorks

>>> root.something = {1:"a", "b": {2: 2}}
>>> root.something
{'b': {2: 2}, 1: 'a'}
>>> root.something['b'][2]
2

Bottom-line 

Lazy loading is a design pattern that fits certain use cases. We studied how a Redis client makes a good use case for lazy objects, and we learned how to write lazy objects. In order to write lazy objects, you need to understand Python magic methods. These methods might be magic to the users of your code, but they should not be magic to you.

Why did I write this article?

This is basically a talk I gave at SoCal Python. I used the opportunity to write a Redis client that I wanted to use myself, and gave a talk about what I learnt writing it.

pip install redisworks

https://github.com/seperman/redisworks

Hope you enjoyed reading this.

The PDF of this talk is available here.
