Why Distributed Task Queues?
- Offload long-running jobs to background processes. Example: video conversion.
- Offload large numbers of small jobs to background processes. Example: a commenting system.
- Keep track of jobs: monitor them and restart them automatically.
- Schedule jobs.
- Replace cron jobs. Cron can run a job at most once a minute, which is a limitation, and it doesn't have the best interface.
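To make the cron limitation concrete, here is a minimal sketch of sub-minute scheduling using only the Python standard library. The helper name `run_every` and the 0.1 second interval are assumptions chosen for illustration; a real task queue would use its own scheduler rather than this loop.

```python
import sched
import time

def run_every(interval_s, job, repeats):
    """Run job() `repeats` times, `interval_s` seconds apart."""
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    start = time.monotonic()
    for i in range(repeats):
        # Schedule each run at an absolute time so delays do not accumulate.
        scheduler.enterabs(start + i * interval_s, 1, job)
    scheduler.run()  # blocks until every scheduled call has run

ticks = []
# Three runs, 0.1 s apart: far more often than cron's one-minute floor.
run_every(0.1, lambda: ticks.append(time.monotonic()), repeats=3)
```

The point is only that an in-process scheduler is not bound to one-minute granularity.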
Message Queue vs. Task Queue
- A message queue provides the basic functionality of passing, holding, and delivering messages. Examples: Redis, RabbitMQ
- A task queue manages work to be done and is considered a type of message queue. Example: Celery
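The distinction can be sketched in a few lines. In this illustration an in-memory `queue.Queue` stands in for the message queue, and a thin layer on top turns messages into work. The names `TASKS`, `task`, `enqueue`, and `work_one` are hypothetical and are not part of Celery or RQ.

```python
import json
import queue

# Hypothetical task registry: maps a task name to the function that runs it.
TASKS = {}

def task(func):
    """Register func so a worker can look it up by name."""
    TASKS[func.__name__] = func
    return func

@task
def add(x, y):
    return x + y

# Message-queue layer: it only holds and delivers opaque strings.
mq = queue.Queue()

def enqueue(name, *args):
    """Task-queue layer: describe the work as a message and send it."""
    mq.put(json.dumps({"task": name, "args": args}))

def work_one():
    """Task-queue layer: receive one message, decode it, and run the task."""
    message = json.loads(mq.get())
    return TASKS[message["task"]](*message["args"])

enqueue("add", 4, 4)
result = work_one()
print(result)  # 8
```

The message queue never knows what `add` means. Only the task layer on either side of it does.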
Distributed Task Queues
Based on Popularity
|Solution||GitHub stars||Downloads/month|
|RQ (Redis Queue)||2600||40,000|
|⤿ Django RQ||428||13,000|
|MrQ (Mr. Q)||340||5,000|
Who uses them? Everyone.
|Celery||Instagram, Mozilla, Truecar|
|RQ (Redis Queue)||?|
|⤿ Django RQ||?|
|MrQ (Mr. Q)||Pricing Assistant (creator)|
Image from parallel programming in Python book
Client ⥂ Redis ⥄ Worker
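The three roles in the diagram can be imitated in one process. This is a rough sketch, not how RQ or Celery are implemented: an in-memory queue and a dict stand in for Redis, a thread stands in for the worker process, and functions are passed directly instead of being serialized. All names here are assumptions for illustration.

```python
import queue
import threading
import uuid

jobs = queue.Queue()  # stands in for the Redis list of pending jobs
results = {}          # stands in for the Redis keys holding finished results

def worker():
    """Worker: pull jobs until a None sentinel arrives, storing each result."""
    while True:
        job = jobs.get()
        if job is None:
            break
        job_id, func, args = job
        results[job_id] = func(*args)

def submit(func, *args):
    """Client: push a job and return its id immediately, without waiting."""
    job_id = uuid.uuid4().hex
    jobs.put((job_id, func, args))
    return job_id

t = threading.Thread(target=worker)
t.start()
job_id = submit(pow, 2, 10)
jobs.put(None)  # tell the worker to stop once the queue is drained
t.join()
print(results[job_id])  # 1024
```

The client and the worker never talk to each other directly. Both only read from and write to the store in the middle.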
Celery Task Example
    # tasks.py
    import time

    from celery import Celery

    # A result backend is needed for result.ready(); 'rpc://' is one option.
    app = Celery('tasks', broker='amqp://guest@localhost//', backend='rpc://')

    @app.task
    def add(x, y):
        time.sleep(5 * 60)  # simulate a long-running job
        return x + y

    ------------------

    >>> from tasks import add
    >>> result = add.delay(4, 4)
    >>> result.ready()
    False

    5 minutes later:

    >>> result.ready()
    True
RQ Task example
    import time

    from redis import Redis
    from rq import Queue

    from somewhere import count_words_at_url

    # Tell RQ what Redis connection to use
    redis_conn = Redis()
    q = Queue(connection=redis_conn)  # no args implies the default queue

    # Delay execution of count_words_at_url('http://nvie.com')
    job = q.enqueue(count_words_at_url, 'http://nvie.com')
    print(job.result)  # => None

    # Now, wait a while, until the worker is finished
    time.sleep(2)
    print(job.result)  # => 889
- Celery: Flower
- RQ: RQ Dashboard
Celery Monitoring: Flower
RQ Monitoring: RQ Dashboard
MRQ Monitoring: MRQ Dashboard
Celery vs. RQ - overview
|Feature||Celery||RQ|
|Complexity of code||Very complicated||Easy to understand|
|Documentation||Takes a while to read||Simple|
|Message brokers||RabbitMQ, Redis, MongoDB||Redis|
|Result backends||RabbitMQ, Redis, Memcached, MongoDB, Cassandra, …||Redis|
|Concurrency||Master-slave processes||supervisord + fork|
|Language||Can send tasks from one language to another||Python only|
|Subtasks||Can create tasks within tasks||No|
Why use RQ?
It all comes down to simplicity. And… memory leaks.
Memory Leaks
- Celery has memory leak issues.
- Some memory leaks can happen with older broker libraries, e.g. librabbitmq.
- The Celery monitor (Flower) has a huge memory leak.
- RQ offers less, but its memory leaks should be much smaller than Celery's (I have not verified this myself).
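One reason a fork-per-job worker can keep leaks small is that whatever a job allocates disappears when the child process exits. The sketch below illustrates that idea with plain `os.fork` on a Unix-like system. It is an assumption-laden illustration, not RQ's actual worker code; `leaky_job` and `run_in_child` are hypothetical names.

```python
import os
import pickle

leaky_cache = []  # grows every time the job runs, simulating a leak

def leaky_job(n):
    leaky_cache.extend(range(n))
    return sum(leaky_cache)

def run_in_child(func, *args):
    """Run func(*args) in a forked child and return its result to the parent."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "wb") as out:
                pickle.dump(func(*args), out)
        finally:
            os._exit(0)  # always exit the child; never fall through to parent code
    os.close(write_fd)  # parent process
    with os.fdopen(read_fd, "rb") as inp:
        result = pickle.load(inp)
    os.waitpid(pid, 0)  # reap the child
    return result

first = run_in_child(leaky_job, 1000)
second = run_in_child(leaky_job, 1000)
print(first, second, len(leaky_cache))  # 499500 499500 0
```

Both runs return the same value and the parent's `leaky_cache` stays empty, because each child only ever modified its own copy of the list.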
Why use Celery?
Use Celery when:
- You need a broker or result backend other than Redis. RQ limits you to Redis for both.
- You cannot tolerate dropped messages. Redis can drop messages, though it will pick them up later. If that bothers you, you can't use RQ.
- You need more features and flexibility. Celery is far more feature-rich and flexible than RQ.
- You never really need to know the magic behind the scenes in Celery.
How complex is RQ:
How complex is Celery:
And in case you were wondering…
How complex is Django:
So if you can understand Django source code, you should be able to understand Celery’s too.