Why Distributed Task Queues?
- Offload long-running jobs to background processes (example: video conversion)
- Offload large numbers of small jobs to background processes (example: a commenting system)
- Keep track of jobs: monitor them and restart them automatically on failure
- Schedule jobs
- Replace cron jobs: cron can run a job at most once a minute and doesn't have the best interface, while a task-queue scheduler has no such floor (see the sketch below)
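A minimal sketch of that last point, assuming Celery's periodic-task scheduler (the beat_schedule setting in recent Celery; older releases spell it CELERYBEAT_SCHEDULE). The broker URL, module name, and task body are illustrative assumptions:

from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')

# Run poll_feed every 15 seconds, below cron's one-minute floor.
app.conf.beat_schedule = {
    'poll-every-15-seconds': {
        'task': 'tasks.poll_feed',   # assumes this module is tasks.py
        'schedule': 15.0,            # interval in seconds
    },
}

@app.task
def poll_feed():
    # Placeholder body; the real work would go here.
    print('checking for new items')

A beat process (e.g. celery -A tasks beat) then dispatches the task to the workers on that interval.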
Message Queue vs. Task Queue
- A message queue provides the basic functionality of passing, holding, and delivering messages (examples: Redis, RabbitMQ)
- A task queue manages work to be done and is built on top of a message queue (example: Celery); see the sketch below for the difference
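A minimal sketch of the difference, assuming a local Redis and the redis-py client: a message queue only moves opaque payloads, while a task queue such as Celery or RQ adds the "call this function with these arguments" layer on top.

import json
from redis import Redis

r = Redis()

# Producer: push a raw message; Redis neither knows nor cares what it means.
r.rpush('my-queue', json.dumps({'func': 'add', 'args': [4, 4]}))

# Consumer: pop the message and interpret it ourselves.
_, raw = r.blpop('my-queue')
message = json.loads(raw)
print(message['func'], message['args'])   # => add [4, 4]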
Distributed Task Queues in Python
Based on Popularity
Solution | GitHub stars | Downloads/month
---|---|---
Celery | 4,600 | 400,000
RQ (Redis Queue) | 2,600 | 40,000
⤿ Django-RQ | 428 | 13,000
Huey | 824 | 3,000
MRQ (Mr. Queue) | 340 | 5,000
Taskmaster | 346 | 1,000
Who uses them? Everyone.
Solution | Notable users
---|---
Celery | Instagram, Mozilla, TrueCar
RQ (Redis Queue) | ?
⤿ Django-RQ | ?
Huey | ?
MRQ (Mr. Queue) | Pricing Assistant (creator)
Taskmaster | Disqus (creator)
Celery Architecture
[Image omitted: Celery architecture diagram, from the Parallel Programming with Python book]
RQ Architecture
Client ⇄ Redis ⇄ Worker
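A minimal sketch of the worker side of that diagram, assuming a local Redis: the client and the worker never talk to each other directly, they only share the Redis instance. In practice most deployments run RQ's worker script from the command line (often under supervisord, as the comparison table later notes) rather than embedding it like this.

from redis import Redis
from rq import Queue, Worker

redis_conn = Redis()
queue = Queue(connection=redis_conn)

# Blocks, pulling jobs from Redis and running each one in a forked child process.
worker = Worker([queue], connection=redis_conn)
worker.work()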
Celery Task Example
from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')

@app.task
def add(x, y):
    # Simulate a long-running job (5 minutes) before returning.
    import time
    time.sleep(5 * 60)
    return x + y
------------------
>>> from tasks import add
>>> result = add.delay(4, 4)
>>> result.ready()
False
5 minutes later:
>>> result.ready()
True
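If the caller actually needs the value rather than polling ready(), it can block on the result; get() is part of Celery's AsyncResult API, and the timeout here is just an illustrative number:

>>> result.get(timeout=600)   # blocks until the worker finishes (or raises on timeout)
8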
RQ Task Example
import time

from redis import Redis
from rq import Queue

from somewhere import count_words_at_url

# Tell RQ what Redis connection to use
redis_conn = Redis()

# No args implies the default queue
q = Queue(connection=redis_conn)

# Delay execution of count_words_at_url('http://nvie.com')
job = q.enqueue(count_words_at_url, 'http://nvie.com')
print(job.result)   # => None

# Now, wait a while, until the worker is finished
time.sleep(2)
print(job.result)   # => 889
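For completeness, a plausible body for the imported count_words_at_url; this is a sketch assuming the requests library, since the example above only imports it "from somewhere":

import requests

def count_words_at_url(url):
    # Fetch the page and count whitespace-separated words.
    resp = requests.get(url)
    return len(resp.text.split())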
Monitoring
- Celery: Flower
- RQ: RQ Dashboard
Celery Monitoring: Flower
[Screenshots omitted: Flower's views of workers, tasks, and CPU usage]
RQ Monitoring: RQ Dashboard
MRQ Monitoring: MRQ Dashboard
Celery vs. RQ - overview
Feature | Celery | RQ
---|---|---
Complexity of code | Very complicated | Easy to understand
Documentation | Takes a while to read | Simple
Monitoring | Flower | RQ Dashboard
Message brokers | RabbitMQ, Redis, MongoDB, … | Redis only
Result backends | RabbitMQ, Redis, Memcached, MongoDB, Cassandra, … | Redis only
Concurrency | Master process + forked workers (prefork) | supervisord + fork
Scheduler | Celery beat (built-in) | Third party
Language | Can send tasks from one language to another | Python only
Subtasks | Can create tasks within tasks (see example below) | No
Django support | Built-in | django-rq
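One concrete example of that extra flexibility, for the Subtasks row above: Celery's canvas primitives can compose tasks, e.g. chaining the add task from earlier. The numbers are arbitrary; chain and the .s() signature syntax are standard Celery:

from celery import chain
from tasks import add   # the add task from the earlier example

# Run add(2, 2) on a worker, then feed its result into add(<result>, 16): (2 + 2) + 16
result = chain(add.s(2, 2), add.s(16))()
print(result.get())   # => 20

(With the five-minute sleep in add, that get() would of course take about ten minutes to return.)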
Why use RQ?
It all comes down to simplicity… and to memory leaks.
Memory Leaks
- Celery has known memory-leak issues.
- Some leaks come from older broker libraries, e.g. librabbitmq.
- The Celery monitor, Flower, has a huge memory leak.
- RQ offers less, but its memory leaks should be much smaller than Celery's (I have not verified this myself).
Why use Celery?
Use Celery when:
- You need a message broker or result backend other than Redis; RQ limits you to Redis for both (see the configuration sketch after this list).
- Redis can drop messages (it will pick them up again later); if that bothers you, you can't use RQ.
- You need the extra features: Celery is far more feature-rich and flexible than RQ.
- You don't ever really need to understand the magic behind the scenes in Celery.
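A minimal sketch of the first point, assuming RabbitMQ as the broker and Redis as the result backend, a split that RQ cannot express; both URLs are placeholders:

from celery import Celery

app = Celery(
    'tasks',
    broker='amqp://guest@localhost//',       # RabbitMQ carries the task messages
    backend='redis://localhost:6379/0',      # Redis stores the results
)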
Complexity
How complex is RQ? [image omitted]
How complex is Celery? [image omitted]
And in case you were wondering…
How complex is Django? [image omitted]
So if you can understand Django source code, you should be able to understand Celery’s too.