RedisWorks, the Pythonic Redis Client.
Introducing RedisWorks. How to get more from Redis with less coding.
Published on Sep 10, 2016 by Sep Dehpour
This is based on a talk I gave at SoCal Python meetup. The PDF of this talk is available here.
All the code is run in Python 3.5 but it works with Python 2 too.
When it comes to performance, lazy loading can help a lot. I remember when I first started using Django and Django’s lazy queries were magic to me.
In this article I will try to demystify lazy objects in Python, focusing mainly on lazy loading rather than lazy writing. To do lazy loading, however, we first need a good understanding of some of Python’s magic methods. After that we will learn to write lazy objects by doing an example: writing a Redis client.
In the end, the complete version of that Redis client is presented in case you want to study its code and/or find it useful!
Defer initialization of an object until the point at which it is needed.
There are arguments both for and against lazy design:
The act of creating a QuerySet does not involve any database activity. You can stack filters together all day long, and Django will not actually run the query until the QuerySet is evaluated
>>> q = Entry.objects.filter(headline__startswith="What")
>>> q = q.filter(pub_date__lte=datetime.date.today())
>>> q = q.exclude(body_text__icontains="food")
>>> print(q) # <-- evaluated here
Let’s benchmark hitting redis 100 times with get requests or combining 100 queries into one query and then getting all 100 items at once.
Basically the idea is that just like Django queries, you combine all queries into one big query and hit the db.
https://github.com/seperman/benchmark/blob/master/pyredis_benchmark.py
This is what I get on my machine hitting a local Redis instance.
Fetching 100 keys from Redis |
---|
100 x Get one key: 4.84 milliseconds |
1 x Mget 100 keys*: 0.93 milliseconds |
In other words, mget was about 5 times faster.
*: mget is the Redis command for getting several string keys at once, as opposed to get, which gets one key.
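To make the difference concrete without needing a running Redis server, here is a minimal sketch that only counts round trips. CountingStore is a hypothetical in-memory stand-in I made up for illustration, not part of any Redis library; it just shows why 100 get calls cost 100 requests while one mget costs one.

```python
class CountingStore:
    """Hypothetical in-memory stand-in for a Redis client that counts round trips."""

    def __init__(self, data):
        self.data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1  # one request per key
        return self.data.get(key)

    def mget(self, keys):
        self.round_trips += 1  # one request for all keys
        return [self.data.get(key) for key in keys]


keys = ["key{}".format(i) for i in range(100)]
store = CountingStore({key: b"value" for key in keys})

# 100 separate requests
one_by_one = [store.get(key) for key in keys]
trips_with_get = store.round_trips

# 1 combined request
store.round_trips = 0
all_at_once = store.mget(keys)
trips_with_mget = store.round_trips

print(trips_with_get, trips_with_mget)  # 100 1
```

Both approaches return the same values; only the number of trips to the server changes, and that is where the time goes in the benchmark.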
Example (with a bad design):
for obj in all_lazy_objects:
    print(obj)  # evaluates one by one in a bad design.
Lazy loading is a design pattern. It fits certain use cases. Let’s start with an example:
Learn by doing: Let’s create a Redis client that can simply fill up the values in a html template.
What we will learn:
The first option is to have multiple get requests to Redis and then pass them through the context to the html template.
context = dict(
    title=redis.get('root.homepage.title'),
    x=redis.get('root.things.x'),
    y=redis.get('root.things.y'),
    z=redis.get('root.things.z'),
    footer=redis.get('root.footer')
)
<html>
<head>
<title>{{ title }}</title>
</head>
<body>
Hello,
We do {{ x }}, {{ y }} and {{ z }}.
{{ footer }}
</body>
</html>
We can do it better!
Mget is the redis command for multi get. It makes one request to Redis to get multiple strings. It can run way faster than making several get requests as we saw.
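# One request to Redis returns all five values, in the order the keys were given
title, x, y, z, footer = redis.mget('root.homepage.title',
                                    'root.things.x',
                                    'root.things.y',
                                    'root.things.z',
                                    'root.footer')
context = dict(title=title, x=x, y=y, z=z, footer=footer)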
<html>
<head>
<title>{{ title }}</title>
</head>
<body>
Hello,
We do {{ x }}, {{ y }} and {{ z }}.
{{ footer }}
</body>
</html>
Note that get is for getting strings from Redis. For other types of data we need to use other commands. And in order to combine all these commands into one query, we can use Redis’s pipeline feature.
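As a rough sketch of what a pipeline looks like with PyRedis (redis-py): the function below queues a get, an lrange and an hgetall, then sends them in one go. The function name fetch_page_data and the key root.settings are made up for illustration, and client is assumed to be a redis-py client such as StrictRedis().

```python
def fetch_page_data(client):
    """Queue commands for different data types and send them in one round trip.

    `client` is assumed to be a redis-py client, e.g. redis.StrictRedis().
    """
    pipe = client.pipeline()
    pipe.get('root.homepage.title')    # string
    pipe.lrange('root.sides', 0, -1)   # list
    pipe.hgetall('root.settings')      # hash (hypothetical key)
    # execute() sends everything queued and returns the results in queue order
    title, sides, settings = pipe.execute()
    return title, sides, settings
```

Nothing is sent to Redis until execute() is called, which is the same "build it up, then hit the db once" idea as the Django QuerySet.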
We will just pass a root object in the context to the template renderer. The frontend developers then decide which children of root they need. Think of root as a huge dictionary that holds all the content you wanted cached in Redis in the first place. By passing root to the html, the frontend can choose which keys of root to use, and only those keys will be loaded. This way you don’t have to decide beforehand which variables to pass into your context. All you need is to pass a root object and fill it up as you go.
context = {'root': root}
<html>
<head>
<title>{{ root.homepage.title }}</title>
</head>
<body>
Hello,
We do {{ root.things.x }}, {{ root.things.y }} and {{ root.things.z }}.
{{ root.footer }}
</body>
</html>
If you are familiar with Redis data types and commands, just jump to the next section. Otherwise here is an overview of Redis data types and specific Redis commands for those types.
Strings: get, mget, set, mset (m = multi)
Lists: lrange (get a range), rpush (append), rpop (pop)
Hashes: hset (set a key), hmset (set a whole dict), hgetall (get a whole dict)
Sets: sadd, smembers
Our goal is to write a Redis client that automatically uses the correct Redis data type and command for the data we are assigning in Python.
In other words, typing root.var1="some text" should automatically use set, typing root.var2=[1,2,3] should automatically use rpush, and so on.
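Here is a minimal sketch of that idea: picking the Redis write command from the Python type of the value. The function name redis_command_for is hypothetical, and the mapping is only my assumption of a reasonable default, not necessarily what the finished client does.

```python
def redis_command_for(value):
    """Return the name of the Redis write command suited to a Python value."""
    if isinstance(value, (str, bytes, int, float)):
        return 'set'      # scalars are stored as Redis strings
    if isinstance(value, dict):
        return 'hmset'    # dicts map to Redis hashes
    if isinstance(value, (set, frozenset)):
        return 'sadd'     # sets map to Redis sets
    if isinstance(value, (list, tuple)):
        return 'rpush'    # sequences map to Redis lists
    raise TypeError("unsupported type: {}".format(type(value).__name__))


print(redis_command_for("some text"))  # set
print(redis_command_for([1, 2, 3]))    # rpush
```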
Our goals:
print(root.something) in Python gets the value of root.something from Redis.
something could be any valid attribute name in Python.
>>> print(root.something)
value of root.something in Redis
But before we can write this Redis client we need to have a better understanding of Python magic methods:
What are Python’s magic methods?
They are written as __something__ and are pronounced “dunder something”.
__init__ (pronounced dunder init)
The first method to get called in an object’s instantiation? Wrong!
__new__
The first method to get called in an object’s instantiation. It takes the class, then any other arguments, which it passes along to __init__.
__del__
Now you have created an object. What is run when you delete the object?
You might think it is __del__, but that’s wrong. __del__ is run when the object is garbage collected.
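A small sketch to see the order for yourself. The class Tracked and the list calls are made up for this example. Note how del only removes a name: as long as another reference exists, the object is not garbage collected and __del__ does not run.

```python
calls = []


class Tracked:
    def __new__(cls, *args, **kwargs):
        calls.append('__new__')    # runs first and creates the instance
        return super().__new__(cls)

    def __init__(self, name):
        calls.append('__init__')   # runs second, on the instance __new__ returned
        self.name = name

    def __del__(self):
        calls.append('__del__')    # runs when the object is garbage collected


obj = Tracked('demo')
alias = obj
del obj             # removes the name `obj`; the object is still alive via `alias`
print(calls)        # ['__new__', '__init__']
print(alias.name)   # demo
```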
Here is a table of the more important ones:
Protocol for containers (to define containers like lists,…) | Descriptor (custom class attributes that we fully control) | Called when no attribute found | Called whether the attribute is found or not | Called when garbage collecting |
---|---|---|---|---|
__getitem__(self, key) | __get__(self, obj, cls=None) | __getattr__(self, name) | | |
__setitem__(self, key, value) | __set__(self, obj, val) | | __setattr__(self, name, value) | |
__delitem__(self, key) | __delete__(self, obj) | | __delattr__(self, name) | __del__(self) Not recommended. Instead use __exit__ (context manager) |
Now that we have had an overview of Python magic methods, let’s get back to writing our client:
Goal:
>>> print(root.something)
value of root.something in Redis
__getattr__ to the rescue
Assuming we have the following 2 keys, root.something and root.anotherthing, already set in Redis:
class Root:
    def __getattr__(self, key):
        # Build the Redis key from the lowercased class name, e.g. "root.something"
        class_name = self.__class__.__name__.lower()
        return redis.get("{}.{}".format(class_name, key))

root = Root()
>>> print(root.something)
value of root.something in Redis
>>> print(root.anotherthing)
value of root.anotherthing in Redis
And it works!
Goal:
>>> root.something = "value"
And then we can verify it saved to Redis by using redis-cli:
$ redis-cli
127.0.0.1:6379> get root.something
"value"
First try:
class Root:
    def __init__(self):
        self.class_name = self.__class__.__name__.lower()
    def __getattr__(self, key):
        return redis.get("{}.{}".format(self.class_name, key))
    def __setattr__(self, key, value):
        redis.set(key, value)

root = Root()
root.something = "value"
print(root.something)
When we run print(root.something) we get:
return redis.get("{}.{}".format(self.class_name, key))
RecursionError: maximum recursion depth exceeded while calling a Python object
Why do you think we are running into this error?
Hints:
key = "class_name", and there are 2 keys in Redis: class_name and something
__getattr__: Called when no attribute found
__setattr__: Called whether the attribute is found or not
Ok, let’s go over what happens step by step:
class Root:
    def __init__(self):
        self.class_name = self.__class__.__name__.lower()
    def __getattr__(self, key):
        return redis.get("{}.{}".format(self.class_name, key))
    def __setattr__(self, key, value):
        redis.set(key, value)

root = Root()
root.something = "value"
print(root.something)
1. __init__ runs.
2. self.class_name needs to be set.
3. __setattr__ is run to set self.class_name.
4. The key `class_name` is saved into Redis.
5. root.something = "value" sets key `something` into redis too.
6. print(root.something)
7. __getattr__ for `something` is run.
8. in order to get the key from Redis, it needs to get self.class_name.
9. But self.class_name was never set on the object. It was saved into Redis.
10. Tries to get self.class_name which needs self.class_name itself.
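If you want to reproduce these steps without a Redis server, here is a minimal sketch where a plain dictionary (fake_redis, a stand-in I made up) plays the role of Redis. It shows that class_name ends up in "Redis", that nothing is stored on the object itself, and that reading any attribute then recurses.

```python
fake_redis = {}  # hypothetical stand-in for the Redis server


class Root:
    def __init__(self):
        # This assignment goes through __setattr__ below
        self.class_name = self.__class__.__name__.lower()

    def __getattr__(self, key):
        # self.class_name is not on the instance, so this calls __getattr__ again
        return fake_redis.get("{}.{}".format(self.class_name, key))

    def __setattr__(self, key, value):
        fake_redis[key] = value  # nothing is ever stored on the instance


root = Root()
print(fake_redis)   # {'class_name': 'root'}
print(vars(root))   # {}

try:
    root.something
    recursed = False
except RecursionError:
    recursed = True
print(recursed)     # True
```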
The solution is to keep native attributes in the object’s __dict__ instead of sending them to Redis.
You might ask: what are native attributes? NATIVE_ATTRIBUTES is just a variable that keeps track of which names are native to our library and which should be read from or written to Redis. For example, root.class_name is native, so it should not be saved to or read from Redis, but root.somethingelse is not one of the native attributes, so it can be saved to or read from Redis!
And here is an implementation that works:
NATIVE_ATTRIBUTES = {'class_name'}

class Root:
    def __init__(self):
        self.class_name = self.__class__.__name__.lower()
    def get_redis_key_path(self, key):
        return "{}.{}".format(self.class_name, key)
    def __getattr__(self, key):
        if key in NATIVE_ATTRIBUTES:
            return self.__dict__[key]
        else:
            key = self.get_redis_key_path(key)
            return redis.get(key)
    def __setattr__(self, key, value):
        if key in NATIVE_ATTRIBUTES:
            self.__dict__[key] = value
        else:
            key = self.get_redis_key_path(key)
            redis.set(key, value)

root = Root()
root.something = 10
print(root.something)  # b"10"
Now that we have something that reads and writes to Redis, our goal is to evaluate the object only when needed, a.k.a. lazy loading. In this case we want to evaluate when printing, which means running the __str__ method of our object.
>>> obj = root.something
>>> obj # runs __repr__
<Lazy root.something>
>>> print(root.something) # runs __str__
10
Implementation with lazy object:
NATIVE_ATTRIBUTES = {'class_name'}

class Root:
    def __init__(self):
        self.class_name = self.__class__.__name__.lower()
    def get_redis_key_path(self, key):
        return "{}.{}".format(self.class_name, key)
    def __getattr__(self, key):
        if key in NATIVE_ATTRIBUTES:
            return self.__dict__[key]
        else:
            key = self.get_redis_key_path(key)
            return Lazy(key)  # no Redis call yet
    def __setattr__(self, key, value):
        if key in NATIVE_ATTRIBUTES:
            self.__dict__[key] = value
        else:
            key = self.get_redis_key_path(key)
            redis.set(key, value)

class Lazy:
    def __init__(self, key):
        self.key = key
    @property
    def value(self):
        # Redis keeps strings as bytes
        return redis.get(self.key).decode('utf-8')
    def __repr__(self):
        return "<Lazy {}>".format(self.key)
    def __str__(self):
        # __str__ must return a str, whatever the value turns out to be
        return str(self.value)
And when we run it:
>>> root = Root()
>>> root.something = 10
>>> root.something # does not evaluate the object yet since it runs __repr__
<Lazy root.something>
>>> print(root.something) # evaluates since it runs __str__
10
Now that we can set root.something to 10 what do you think happens when we run this?
>>> root = Root()
>>> root.something = 10
>>> root.something > 8
This is what we get:
>>> root = Root()
>>> root.something = 10
>>> root.something > 8
TypeError: unorderable types: Lazy() > int()
What magic method can help us make this lazy object comparable to numbers? Look at the full list here.
…
Yes, you guessed right: __gt__ (greater than) and __lt__ (less than):
class Lazy:
    def __init__(self, key):
        self.key = key
    @property
    def value(self):
        # Redis keeps strings as bytes
        return redis.get(self.key).decode('utf-8')
    def __repr__(self):
        return "<Lazy {}>".format(self.key)
    def __str__(self):
        return str(self.value)
    def __gt__(self, other):
        return float(self.value) > other
    def __lt__(self, other):
        return float(self.value) < other
>>> root.something > 8
True
>>> root.something < 11
True
Great! Now what do you think happens if we run this? Is root.something equal to 10?
>>> root.something = 10
>>> root.something == 10
This is what we get:
>>> root.something = 10
>>> root.something == 10
False
In order to make the lazy object handle equality, we need to use __eq__. Using __lt__ and __gt__ is NOT enough to make Python do equality comparison!
class Lazy:
    def __init__(self, key):
        self.key = key
    @property
    def value(self):
        return redis.get(self.key).decode('utf-8')
    def __repr__(self):
        return "<Lazy {}>".format(self.key)
    def __str__(self):
        return str(self.value)
    def __gt__(self, other):
        return float(self.value) > other
    def __lt__(self, other):
        return float(self.value) < other
    def __eq__(self, other):
        return float(self.value) == other
>>> root.something = 10
>>> root.something == 10
True
Ok, so what do you think about this now? root.something is equal to 10. Is it 10?
>>> root.something = 10
>>> root.something == 10
True
>>> root.something is 10
And this is what we get:
>>> root.something = 10
>>> root.something == 10
True
>>> root.something is 10
False
The reason is that the is keyword deals with identity, NOT equality. In Python, 2 objects are identical only if their IDs are the same, in other words if they are actually the same object in memory.
So `root.something is 10` is equivalent to `id(root.something) == id(10)`.
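Here is a tiny example of identity versus equality with plain lists. There is no magic method that changes what is does, which is why our lazy object can be equal to 10 but can never be 10.

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)           # True: same contents
print(a is b)           # False: two separate list objects
print(a is c)           # True: two names for one object
print(id(a) == id(c))   # True
```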
Here is another interesting case. What if the key name in Redis is going to have more than one dot in it?
>>> root.something.another = 10
# No error!
>>> print(root.something.another)
# What do you think it is gonna print here?
>>> root.something.another = 10
# No error!
>>> print(root.something.another)
AttributeError: 'Lazy' object has no attribute 'another'
Solution:
Change the lazy object’s key from root.something to root.something.another.
So basically, if you ask for an attribute of an attribute of Root, we keep returning the same lazy object to you but internally change the key of that lazy object to have the new attribute in it.
Example:
root.something: the lazy object’s key is root.something at this point.
root.something.another: the lazy object’s key is root.something.another at this point.
If you are interested in seeing how setting/getting attributes of attributes actually works, I invite you to take a look at the dot module: https://github.com/seperman/dotobject.
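A minimal sketch of the nested key idea, under two assumptions of mine: a plain dictionary (fake_redis) stands in for Redis, and for simplicity each attribute access returns a new Lazy object with a longer key instead of mutating the same object. This is not how the dot module does it, just the shortest version I could write that shows the key growing.

```python
fake_redis = {}  # hypothetical in-memory stand-in for Redis


class Lazy:
    def __init__(self, key):
        # Write through __dict__ so our own __setattr__ is bypassed
        self.__dict__['key'] = key

    def __getattr__(self, name):
        # Only called for attributes that are not found: extend the key path
        if name.startswith('__'):
            raise AttributeError(name)
        return Lazy("{}.{}".format(self.key, name))

    def __setattr__(self, name, value):
        # Setting an attribute saves it under the extended key
        fake_redis["{}.{}".format(self.key, name)] = value

    @property
    def value(self):
        return fake_redis.get(self.key)

    def __repr__(self):
        return "<Lazy {}>".format(self.key)

    def __str__(self):
        return str(self.value)


something = Lazy('root.something')
something.another = 10
print(repr(something.another))   # <Lazy root.something.another>
print(something.another)         # 10
```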
Now that we can read and save strings to Redis, let’s try another data type: lists.
Redis has specific commands for specific data types. We need to use the right command to save lists.
Our goal is to simply set a list in Python and have it saved as a list in Redis.
root.sides = ["fries", "salad"]
This is our current implementation that doesn’t handle lists:
class Root:
    def __init__(self):
        self.class_name = self.__class__.__name__.lower()
    def get_redis_key_path(self, key):
        return "{}.{}".format(self.class_name, key)
    def __getattr__(self, key):
        if key in NATIVE_ATTRIBUTES:
            return self.__dict__[key]
        else:
            key = self.get_redis_key_path(key)
            return Lazy(key)
    def __setattr__(self, key, value):
        if key in NATIVE_ATTRIBUTES:
            self.__dict__[key] = value
        else:
            key = self.get_redis_key_path(key)
            redis.set(key, value)
And we add a couple of lines in __setattr__ to handle Iterables (lists are Iterables):
from collections.abc import Iterable

strings = (str, bytes)  # types stored with a plain Redis set

class Root:
    def __init__(self):
        self.class_name = self.__class__.__name__.lower()
    def get_redis_key_path(self, key):
        return "{}.{}".format(self.class_name, key)
    def __getattr__(self, key):
        if key in NATIVE_ATTRIBUTES:
            return self.__dict__[key]
        else:
            key = self.get_redis_key_path(key)
            return Lazy(key)
    def __setattr__(self, key, value):
        if key in NATIVE_ATTRIBUTES:
            self.__dict__[key] = value
        else:
            key = self.get_redis_key_path(key)
            if isinstance(value, strings):
                redis.set(key, value)
            elif isinstance(value, Iterable):
                # Replace any existing list instead of appending to it
                redis.delete(key)
                redis.rpush(key, *value)
            else:
                # Numbers and other scalars are still saved with set
                redis.set(key, value)
And it works!
>>> root.sides = ["fries", "salad"]
$ redis-cli
127.0.0.1:6379> lrange root.sides 0 -1
1) "fries"
2) "salad"
Now that we can save lists to Redis, how about reading lists from Redis?
root.sides = ["fries", "salad"]
print(root.sides)
["fries", "salad"]
Our current implementation of lazy objects:
class Lazy:
    def __init__(self, key):
        self.key = key
    @property
    def value(self):
        return redis.get(self.key).decode('utf-8')
In order to read lists, we get the datatype from Redis and if it is list, then we use the proper command:
class Lazy:
    def __init__(self, key):
        self.key = key
    @property
    def value(self):
        thetype = redis.type(self.key)
        if thetype == b'string':
            result = redis.get(self.key).decode('utf-8')
        elif thetype == b'list':
            result = redis.lrange(self.key, 0, -1)
            result = [i.decode('utf-8') for i in result]
        return result
Yay! It works!
>>> root.sides = ["fries", "salad"]
>>> print(root.sides)
['fries', 'salad']
Let’s get some item from the list then.
>>> root.sides = ["fries", "salad"]
>>> print(root.sides)
['fries', 'salad']
>>> print(root.sides[0])
Ooops!
>>> root.sides = ["fries", "salad"]
>>> print(root.sides)
['fries', 'salad']
>>> print(root.sides[0])
TypeError: 'Lazy' object does not support indexing
Let’s look at the magic methods table to see how we can solve this problem:
Protocol for containers (to define containers like lists,…) | Descriptor (custom class attributes that we fully control) | Called when no attribute found |
---|---|---|
__getitem__(self, key) | __get__(self, obj, cls=None) | __getattr__(self, name) |
__setitem__(self, key, value) | __set__(self, obj, val) | |
__delitem__(self, key) | __delete__(self, obj) | |
Our current implementation:
class Lazy:
    def __init__(self, key):
        self.key = key
    @property
    def value(self):
        thetype = redis.type(self.key)
        if thetype == b'string':
            result = redis.get(self.key).decode('utf-8')
        elif thetype == b'list':
            result = redis.lrange(self.key, 0, -1)
            result = [i.decode('utf-8') for i in result]
        return result
Seems like what we need is __getitem__:
class Lazy:
    def __init__(self, key):
        self.key = key
    @property
    def value(self):
        thetype = redis.type(self.key)
        if thetype == b'string':
            result = redis.get(self.key).decode('utf-8')
        elif thetype == b'list':
            result = redis.lrange(self.key, 0, -1)
            result = [i.decode('utf-8') for i in result]
        return result
    def __getitem__(self, key):
        # Indexing evaluates the lazy object, then indexes the loaded value
        return self.value[key]
Let’s run this:
>>> root.sides = ["fries", "salad"]
>>> print(root.sides)
['fries', 'salad']
>>> print(root.sides[0])
fries
Yes!
As you can see, it can get complicated, and I don’t want to bother you with every edge case and data type that is out there. That’s why I wrote Redisworks, and it is ready to use. Redisworks is basically what we created in this article, plus it handles other data types and edge cases. Its source code should be relatively easy to read now that you understand basically what it does!
Based on DotObject and PyRedis.
pip install redisworks
https://github.com/seperman/redisworks
Here are some examples of how dynamic typing makes life easier:
PyRedis
>>> from redis import StrictRedis
>>> redis = StrictRedis()
>>> redis.set("root.something", "value")
RedisWorks
>>> from redisworks import Root
>>> root=Root()
>>> root.something = "value"
PyRedis
In PyRedis you get back everything as string:
>>> redis.rpush("root.sides", 10, "root.something", "value")
>>> values = redis.lrange("root.sides", 0, -1)
>>> values
[b'10', b'root.something', b'value']
RedisWorks
>>> root.sides = [10, "fries", "coke"]
>>> root.sides[1]
'fries'
>>> "fries" in root.sides
True
>>> type(root.sides[0])
int
PyRedis
In PyRedis the nested list is returned serialized.
>>> values = [10, [1, 2]]
>>> redis.rpush("root.sides", *values)
2
>>> redis.lrange("root.sides", 0, -1)
[b'10', b'[1, 2]']
RedisWorks
>>> root.sides = [10, [1, 2]]
>>> root.sides
[10, [1, 2]]
>>> type(root.sides[1])
<class 'list'>
PyRedis
Do you really want to remember hmset, hgetall?
>>> redis.hmset("root.something", {1:"a", "b": {2: 2}})
>>> val = redis.hgetall("root.something")
>>> val
{b'b': b'{2: 2}', b'1': b'a'}
>>> val[b'b']
b'{2: 2}'
RedisWorks
>>> root.something = {1:"a", "b": {2: 2}}
>>> root.something
{'b': {2: 2}, 1: 'a'}
>>> root.something['b'][2]
2
Lazy loading is a design pattern that fits certain use cases. We studied how a Redis client makes a good use case for lazy objects, and we learned how to write lazy objects. In order to write lazy objects, you need to understand Python magic methods. These methods might be magic to the users of your code, but they should not be magic to you.
Why did I write this article?
This is basically a talk I gave at SoCal Python. I used the opportunity to write a Redis client that I wanted to use myself and gave a talk about what I learnt writing it.
pip install redisworks
https://github.com/seperman/redisworks
Hope you enjoyed reading this.
The PDF of this talk is available here.